Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation

Kumar, A and Bhattamishra, S and Bhandari, M and Talukdar, P (2019) Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation. In: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019, 2-7 June 2019, Minneapolis; United States, pp. 3609-3619.

Official URL: https://dx.doi.org/10.18653/v1/N19-1363

Abstract

Inducing diversity in the task of paraphrasing is an important problem in NLP with applications in data augmentation and conversational agents. Previous paraphrasing approaches have mainly focused on generating semantically similar paraphrases, while paying little attention to diversity. In fact, most methods rely solely on top-k beam search sequences to obtain a set of paraphrases. The resulting set, however, contains many structurally similar sentences. In this work, we focus on the task of obtaining highly diverse paraphrases without compromising paraphrasing quality. We provide a novel formulation of the problem in terms of monotone submodular function maximization, specifically targeted towards the task of paraphrasing. Additionally, we demonstrate the effectiveness of our method for data augmentation on multiple tasks such as intent classification and paraphrase recognition. To drive further research, we have made the source code available. © 2019 Association for Computational Linguistics
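
The abstract does not state the paper's objective function, so the following is only an illustrative sketch of the general idea of selecting a diverse subset by greedy maximization of a monotone submodular function under a size limit. The n-gram coverage objective, the function names `ngrams` and `greedy_select`, and the whitespace tokenization are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(sentence: str, n_max: int = 2) -> Set[NGram]:
    """Return the set of word n-grams (n = 1..n_max) of a sentence."""
    tokens = sentence.lower().split()  # assumption: whitespace tokenization
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def greedy_select(candidates: List[str], k: int) -> List[str]:
    """Greedily pick up to k candidates that maximize n-gram coverage.

    Coverage (the number of distinct n-grams in the selected set) is a
    monotone submodular set function. It is a hypothetical stand-in
    objective, not the one used in the paper.
    """
    grams = [ngrams(c) for c in candidates]
    covered: Set[NGram] = set()
    chosen: List[int] = []
    for _ in range(min(k, len(candidates))):
        best, best_gain = None, 0
        for i, g in enumerate(grams):
            if i in chosen:
                continue
            gain = len(g - covered)  # marginal gain of adding candidate i
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # every remaining candidate adds nothing new
            break
        chosen.append(best)
        covered |= grams[best]
    return [candidates[i] for i in chosen]


if __name__ == "__main__":
    beam = [
        "how do i reset my password",
        "how do i reset my password please",
        "what is the way to change my login credentials",
    ]
    print(greedy_select(beam, k=2))
```

In this sketch the candidate list plays the role of the top-k beam search outputs mentioned in the abstract. Because each step picks the candidate with the largest marginal gain, a sentence that largely repeats an already selected one contributes little and tends to be passed over.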

Item Type: Conference Paper
Publication: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Additional Information: Cited by 0; Conference: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019; Conference Date: 2 June 2019 through 7 June 2019; Conference Code: 159851
Keywords: Classification (of information); Computational linguistics; Beam search; Conversational agents; Data augmentation; Multiple tasks; Source codes; Submodular functions; Submodular optimizations; Digital storage
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 04 Nov 2020 09:27
Last Modified: 04 Nov 2020 09:27
URI: http://eprints.iisc.ac.in/id/eprint/65675
