Kumar, A and Bhattamishra, S and Bhandari, M and Talukdar, P (2019) Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation. In: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019, 2-7 June 2019, Minneapolis; United States, pp. 3609-3619.
Abstract
Inducing diversity in the task of paraphrasing is an important problem in NLP with applications in data augmentation and conversational agents. Previous paraphrasing approaches have mainly focused on the issue of generating semantically similar paraphrases, while paying little attention to diversity. In fact, most of the methods rely solely on top-k beam search sequences to obtain a set of paraphrases. The resulting set, however, contains many structurally similar sentences. In this work, we focus on the task of obtaining highly diverse paraphrases while not compromising on paraphrasing quality. We provide a novel formulation of the problem in terms of monotone submodular function maximization, specifically targeted towards the task of paraphrasing. Additionally, we demonstrate the effectiveness of our method for data augmentation on multiple tasks such as intent classification and paraphrase recognition. In order to drive further research, we have made the source code available. © 2019 Association for Computational Linguistics
Item Type: | Conference Paper |
---|---|
Publication: | NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference |
Publisher: | Association for Computational Linguistics (ACL) |
Additional Information: | cited By 0; Conference of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 ; Conference Date: 2 June 2019 Through 7 June 2019; Conference Code:159851 |
Keywords: | Classification (of information); Computational linguistics, Beam search; Conversational agents; Data augmentation; Multiple tasks; Source codes; Submodular functions; Submodular optimizations, Digital storage |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation; Division of Interdisciplinary Sciences > Computational and Data Sciences |
Date Deposited: | 04 Nov 2020 09:27 |
Last Modified: | 04 Nov 2020 09:27 |
URI: | http://eprints.iisc.ac.in/id/eprint/65675 |