Tholpadi, Goutham and Bhattacharyya, Chiranjib and Shevade, Shirish (2015) Translation Induction on Indian Language Corpora Using Translingual Themes from Other Languages. In: 16th Annual Conference on Intelligent Text Processing and Computational Linguistics (CICLing), April 14-20, 2015, Nile University, Cairo, Egypt, pp. 505-519.
Full text not available from this repository.

Abstract
Identifying translations from comparable corpora is a well-known problem with several applications, e.g., dictionary creation in resource-scarce languages. The scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, but a mere 0.187 for Telugu-Kannada. However, comparable corpora exist for many Indian languages paired with other "auxiliary" languages. We observe that translations share many topically related words in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary-language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora, using English and French as auxiliary languages, show that this approach can yield dramatic improvements in performance (e.g., MRR improves by 124%, from 0.187 to 0.419, for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of the suggested titles.
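Mean reciprocal rank, the metric quoted in the abstract, averages 1/r over the test words, where r is the rank of the first correct translation in a system's candidate list (contributing 0 if no correct translation appears). The minimal sketch below assumes that standard definition; the paper's exact evaluation protocol (for example, any rank cutoff) is not stated here. It uses made-up toy word lists and hypothetical function names, not code from the paper, and also checks the abstract's arithmetic: going from 0.187 to 0.419 is a relative gain of (0.419 - 0.187) / 0.187 ≈ 1.24, i.e. about 124%.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """Standard MRR (assumed definition): average over queries of 1/rank
    of the first correct candidate, or 0 if none is correct.
    `ranked_lists` maps each query word to its ranked candidate list;
    `gold` maps each query word to its set of correct translations."""
    total = 0.0
    for query, candidates in ranked_lists.items():
        reciprocal_rank = 0.0
        for rank, cand in enumerate(candidates, start=1):
            if cand in gold[query]:
                reciprocal_rank = 1.0 / rank
                break  # only the first correct candidate counts
        total += reciprocal_rank
    return total / len(ranked_lists)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline` as a fraction."""
    return (improved - baseline) / baseline


# Hypothetical toy data: placeholder tokens, not real Telugu/Kannada words.
ranked = {
    "q1": ["a", "b", "c"],  # correct answer at rank 1 -> 1/1
    "q2": ["x", "y", "z"],  # correct answer at rank 2 -> 1/2
    "q3": ["m", "n", "o"],  # no correct answer -> 0
}
gold = {"q1": {"a"}, "q2": {"y"}, "q3": {"p"}}

print(mean_reciprocal_rank(ranked, gold))                # 0.5
print(round(100 * relative_improvement(0.187, 0.419)))  # 124
```

Because only the first correct candidate counts, MRR rewards systems that place a correct translation near the top of the list, which matters for uses like dictionary creation or title suggestion where users see only a few candidates.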
Item Type: | Conference Proceedings |
---|---|
Series: | Lecture Notes in Computer Science |
Publisher: | Springer-Verlag Berlin |
Additional Information: | Copyright for this article belongs to Springer-Verlag Berlin, Heidelberger Platz 3, D-14197 Berlin, Germany |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 05 Nov 2015 08:57 |
Last Modified: | 05 Nov 2015 08:57 |
URI: | http://eprints.iisc.ac.in/id/eprint/52698 |