DravidianLangTech-2021

Overview

As technology develops, internet use grows, and most global languages have adapted to the digital era. However, many regional, under-resourced languages still lack language technology [1]. One such group is the Dravidian family of languages. Dravidian languages are spoken primarily in south India and Sri Lanka, with pockets of speakers in Nepal, Pakistan, Malaysia, other parts of India, and elsewhere in the world. Although the family is about 4,500 years old [2] and is spoken by millions of people, it remains under-resourced in speech and natural language processing [1].

The Dravidian languages are divided into four groups: South, South-Central, Central, and North. Dravidian morphology is agglutinating and exclusively suffixal. Syntactically, Dravidian languages are head-final and left-branching, with free constituent order. To improve access to and production of information for monolingual speakers of Dravidian languages, speech and language technologies are necessary. The aim of these workshops is to save the Dravidian languages from extinction in technology. This is the first workshop on speech and language technologies for Dravidian languages.

The broader objectives of DravidianLangTech-2021 are:

To investigate challenges related to speech and language resource creation for Dravidian languages.
To promote research in speech and language technology for Dravidian languages.
To adopt appropriate language technology models that suit Dravidian languages.
To provide opportunities for researchers from the Dravidian language community from around the world to collaborate with other researchers.

Workshop contact:

dravidianlangtech@gmail.com and bharathiraja.akr@gmail.com

References

[1] Chakravarthi, B.R., 2020. Leveraging orthographic information to improve machine translation of under-resourced languages (Doctoral dissertation, NUI Galway).

[2] Kolipakam, V., Jordan, F.M., Dunn, M., Greenhill, S.J., Bouckaert, R., Gray, R.D. and Verkerk, A., 2018. A Bayesian phylogenetic study of the Dravidian language family. Royal Society open science, 5(3), p.171504.

[3] Suryawanshi, S., Chakravarthi, B.R., Verma, P., Arcan, M., McCrae, J.P. and Buitelaar, P., 2020. A dataset for troll classification of TamilMemes. In Proceedings of the 5th Workshop on Indian Language Data Resource and Evaluation (WILDRE-5). European Language Resources Association (ELRA).