
Neural machine translation with recurrent highway networks

Parmar, M and Devi, VS (2018) Neural machine translation with recurrent highway networks. In: 6th International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2018, 20 - 22 December 2018, Cluj-Napoca, pp. 299-308.

PDF (Published Version): MIKE 2018_11308_299-308_2018.pdf (896kB)
Official URL: https://doi.org/10.1007/978-3-030-05918-7_27

Abstract

Recurrent Neural Networks have lately gained a lot of popularity in language modelling tasks, especially in neural machine translation (NMT). Recent NMT models are based on the encoder-decoder architecture, in which a deep LSTM-based encoder projects the source sentence to a fixed-dimensional vector and another deep LSTM decodes the target sentence from that vector. However, there has been very little work exploring architectures with more than one layer in space (i.e., within each time step). This paper examines the effectiveness of simple Recurrent Highway Networks (RHN) for NMT tasks. Our model uses RHNs in both the encoder and the decoder, with attention. We also explore a reconstructor model to improve adequacy. We demonstrate the effectiveness of all three approaches on the IWSLT English-Vietnamese dataset. RHN performs on par with LSTM-based models, and even better in some cases. We also find that deep RHN models are easier to train than deep LSTM-based models because of the highway connections. The paper further investigates the effect of increasing the recurrent depth within each time step. © Springer Nature Switzerland AG 2018.
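For readers unfamiliar with the RHN cell underlying this model, the sketch below shows how a single time step with recurrent depth greater than one can be computed. It follows the standard RHN formulation (Zilly et al., 2017) with a coupled carry gate c = 1 - t; the class name RHNCell, the PyTorch framework, and the example depth are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class RHNCell(nn.Module):
        """Minimal Recurrent Highway Network cell (coupled carry gate).
        `depth` is the number of highway micro-layers per time step."""
        def __init__(self, input_size, hidden_size, depth=3):
            super().__init__()
            self.depth = depth
            # The input is fed only to the first micro-layer.
            self.W_h = nn.Linear(input_size, hidden_size)
            self.W_t = nn.Linear(input_size, hidden_size)
            # State-to-state transforms, one pair per micro-layer.
            self.R_h = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(depth))
            self.R_t = nn.ModuleList(nn.Linear(hidden_size, hidden_size) for _ in range(depth))

        def forward(self, x, s):
            # x: (batch, input_size) input at this time step
            # s: (batch, hidden_size) recurrent state from the previous step
            for l in range(self.depth):
                h = torch.tanh(self.R_h[l](s) + (self.W_h(x) if l == 0 else 0))
                t = torch.sigmoid(self.R_t[l](s) + (self.W_t(x) if l == 0 else 0))
                s = h * t + s * (1 - t)  # highway update: transform + carry
            return s

    # Hypothetical usage: one step of an encoder with recurrence depth 3.
    cell = RHNCell(input_size=512, hidden_size=512, depth=3)
    s = torch.zeros(32, 512)
    s = cell(torch.randn(32, 512), s)

The highway update means each micro-layer can pass the previous state through unchanged when the transform gate is near zero, which is why stacking several micro-layers per time step remains trainable where a comparably deep plain LSTM stack would not be.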

Item Type: Conference Paper
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Additional Information: The copyright for this article belongs to the Authors.
Keywords: Computational linguistics; Computer aided language translation; Decoding; Modeling languages; Signal encoding; Attention; Dimensional vectors; Encoder-decoder; Highway networks; Language modelling; Machine translations; Reconstructor; Vietnamese; Long short-term memory
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Division of Interdisciplinary Sciences > Computational and Data Sciences
Date Deposited: 01 Sep 2022 10:21
Last Modified: 01 Sep 2022 10:21
URI: https://eprints.iisc.ac.in/id/eprint/76346
