As technology develops, internet use grows, and most of the widely spoken global languages have adapted to the digital era. However, many regional, under-resourced languages still lack language technology [1]. One such group is the Dravidian language family. Dravidian languages are primarily spoken in south India and Sri Lanka, with pockets of speakers in Nepal, Pakistan, Malaysia, other parts of India, and elsewhere in the world. The Dravidian language family, which is estimated to be about 4,500 years old [2] and has millions of speakers, remains under-resourced in speech and natural language processing [1].

The Dravidian languages are divided into four groups: South, South-Central, Central, and North. Dravidian morphology is agglutinating and exclusively suffixal. Syntactically, Dravidian languages are head-final and left-branching, with free constituent order. To improve access to and production of information for monolingual speakers of Dravidian languages, speech and language technologies are necessary. The aim of these workshops is to prevent the Dravidian languages from becoming extinct in technology. This is the first workshop on speech and language technologies for Dravidian languages.

The broader objectives of DravidianLangTech-2021 are:

  • To investigate challenges related to speech and language resource creation for Dravidian languages.
  • To promote research in speech and language technology for Dravidian languages.
  • To adopt appropriate language technology models that suit Dravidian languages.
  • To provide opportunities for researchers from the Dravidian language community from around the world to collaborate with other researchers.

1. Shared Task on Machine Translation in Dravidian languages

Organizers:

  • Bharathi Raja Chakravarthi, National University of Ireland Galway

  • Ruba Priyadharshini, Madurai Kamaraj University

  • Anand Kumar M, National Institute of Technology Karnataka Surathkal

  • Parameshwari, University of Hyderabad

  • Melvin Johnson, Google Research, USA

  • John P. McCrae, National University of Ireland Galway 

  • Elizabeth Sherly, Indian Institute Of Information Technology and Management-Kerala 

Student Volunteers:

  • Richard Saldanha, National Institute of Technology Karnataka Surathkal 

  • Shubankar Banarjee, National University of Ireland Galway

2. Shared Task on Offensive Language Identification in Dravidian languages

Organizers: 

  • Bharathi Raja Chakravarthi, National University of Ireland Galway

  • Ruba Priyadharshini, Madurai Kamaraj University

  • Anand Kumar M, National Institute of Technology Karnataka Surathkal 

  • John P. McCrae, National University of Ireland Galway

  • Thomas Mandl, University of Hildesheim

  • Elizabeth Sherly, Indian Institute Of Information Technology and Management-Kerala 

Student Volunteers:

  • Navya Jose, Indian Institute Of Information Technology and Management-Kerala

  • Prasanna Kumar Kumaresan, Indian Institute Of Information Technology and Management-Kerala

  • Rahul Ponnsamy, Indian Institute Of Information Technology and Management-Kerala

  • Hariharan R. L., National Institute of Technology Karnataka Surathkal

3. Meme classification for Dravidian languages

Organizers: Shardul Suryawanshi, Bharathi Raja Chakravarthi, Mihael Arcan, and Paul Buitelaar

These shared tasks target South Dravidian languages in social media content. In the future, they will be extended to other Dravidian languages.

Workshop contact:

dravidianlangtech@gmail.com and bharathiraja.akr@gmail.com

References

[1] Chakravarthi, B.R., 2020. Leveraging orthographic information to improve machine translation of under-resourced languages (Doctoral dissertation, NUI Galway).

[2] Kolipakam, V., Jordan, F.M., Dunn, M., Greenhill, S.J., Bouckaert, R., Gray, R.D. and Verkerk, A., 2018. A Bayesian phylogenetic study of the Dravidian language family. Royal Society Open Science, 5(3), p.171504.

[3] Suryawanshi, S., Chakravarthi, B.R., Varma, P., Arcan, M., McCrae, J.P. and Buitelaar, P., 2020, May. A dataset for troll classification of Tamil memes. In Proceedings of the 5th Workshop on Indian Language Data Resource and Evaluation (WILDRE-5). European Language Resources Association (ELRA).