DravidianLangTech-2021

Overview

As technology develops, internet use grows, and most widely spoken languages have adapted to the digital era. However, many regional, under-resourced languages still lack language technology [1]. One such group is the Dravidian family of languages. Dravidian languages are primarily spoken in south India and Sri Lanka, with pockets of speakers in Nepal, Pakistan, Malaysia, other parts of India and elsewhere in the world. Although the Dravidian languages are about 4,500 years old [2] and spoken by millions of people, they remain under-resourced in speech and natural language processing [1].

The Dravidian languages are divided into four groups: South, South-Central, Central, and North. Dravidian morphology is agglutinating and exclusively suffixal. Syntactically, Dravidian languages are head-final and left-branching, with free constituent order. Speech and language technologies are necessary to improve access to and production of information for monolingual speakers of Dravidian languages. The aim of these workshops is to save the Dravidian languages from extinction in the digital world. This is the first workshop on speech and language technologies for Dravidian languages.

The broader objectives of DravidianLangTech-2021 are:

To investigate challenges related to speech and language resource creation for Dravidian languages.
To promote research in speech and language technology for Dravidian languages.
To adopt appropriate language technology models which suit Dravidian languages.
To provide opportunities for researchers from the Dravidian language community from around the world to collaborate with other researchers.

1. Shared Task on Machine Translation in Dravidian languages

Organizers:

Bharathi Raja Chakravarthi, National University of Ireland Galway
Ruba Priyadharshini, Madurai Kamaraj University
Anand Kumar M, National Institute of Technology Karnataka Surathkal
Parameshwari, University of Hyderabad
Melvin Johnson, Google Research, USA
John P. McCrae, National University of Ireland Galway
Elizabeth Sherly, Indian Institute Of Information Technology and Management-Kerala

Student Volunteers:

Richard Saldanha, National Institute of Technology Karnataka Surathkal
Shubankar Banarjee, National University of Ireland Galway

2. Shared Task on Offensive Language Identification in Dravidian languages

Organizers:

Bharathi Raja Chakravarthi, National University of Ireland Galway
Ruba Priyadharshini, Madurai Kamaraj University
Anand Kumar M, National Institute of Technology Karnataka Surathkal
John P. McCrae, National University of Ireland Galway
Thomas Mandl, University of Hildesheim
Elizabeth Sherly, Indian Institute Of Information Technology and Management-Kerala

Student Volunteers:

Navya Jose, Indian Institute Of Information Technology and Management-Kerala
Prasanna Kumar Kumaresan, Indian Institute Of Information Technology and Management-Kerala
Rahul Ponnsamy, Indian Institute Of Information Technology and Management-Kerala
Hariharan R. L., National Institute of Technology Karnataka Surathkal

3. Meme classification for Dravidian languages

Organizers: Shardul Suryawanshi, Bharathi Raja Chakravarthi, Mihael Arcan, and Paul Buitelaar.

These shared tasks target social media content in the South Dravidian languages. In the future, they will be extended to other Dravidian languages.

Workshop contact:

dravidianlangtech@gmail.com and bharathiraja.akr@gmail.com

References

[1] Chakravarthi, B.R., 2020. Leveraging orthographic information to improve machine translation of under-resourced languages (Doctoral dissertation, NUI Galway).

[2] Kolipakam, V., Jordan, F.M., Dunn, M., Greenhill, S.J., Bouckaert, R., Gray, R.D. and Verkerk, A., 2018. A Bayesian phylogenetic study of the Dravidian language family. Royal Society open science, 5(3), p.171504.

[3] Suryawanshi, S., Chakravarthi, B.R., Varma, P., Arcan, M., McCrae, J.P. and Buitelaar, P., 2020. A dataset for troll classification of Tamil memes. In Proceedings of the 5th Workshop on Indian Language Data Resource and Evaluation (WILDRE-5). European Language Resources Association (ELRA).