ACL 2022 : DravidianLangTech : Program

Keynote - Shashirekha HL, Mangalore University

Abstract: Tulu is a Dravidian language spoken predominantly in the southern part of India, mainly by the people of Dakshina Kannada, Udupi and parts of Kasaragod. More than 2.5 million people speak Tulu as their mother tongue. The Tulu-speaking community, with its distinct sociocultural traits, religious practices, artistic traditions and theatrical forms, has made a significant contribution to the cultural heritage of Karnataka and, through it, to Indian culture and civilization as a whole. Even though Tulu has its own script, called ‘Tigalari’, most people write Tulu in the Kannada script. Tulu is a free word order language with a high level of agglutination and a rich morphological structure, and its phonology follows patterns similar to those of other Dravidian languages. As in other Dravidian languages, a word is formed by attaching a series of prefixes or suffixes to the root word, and word complexity increases with the number of affixes; the suffixes carry information about number, tense, case and gender. Verbs have both affirmative and negative forms, and with its verb-final inflectional patterns, Tulu is an inflectional language like Kannada. Despite several literary works in Tulu, its digital presence is almost nil, making it an under-resourced language. Tulu Wikipedia and the available Tulu text corpora are very small, which makes it difficult to construct datasets for any application. BPEmb (pre-trained subword embeddings with a vocabulary size of 10,000) and fastText (pre-trained word vectors) are the only digital resources available for Tulu natural language processing. Due to this lack of resources, computational tools such as a morphological generator and analyser, POS tagger and NER, and applications such as sentiment analysis, offensive language identification and fake news detection, are not available for Tulu.
This talk addresses the needs of, and possible solutions for, the development of resources, tools and applications for the Tulu language.