<Addressing the Rare Word Problem in Neural Machine Translation>
1. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OOV) word.
2. We train an NMT system on data that is augmented by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates every OOV word using a dictionary.
3. Motivated by the strengths of standard phrase-based systems, we propose and implement a novel approach to address the rare word problem of NMT. Our approach annotates the training corpus with explicit alignment information that enables the NMT system to emit, for each OOV word, a “pointer” to its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates the OOV words using a dictionary, or with the identity translation if no dictionary entry is found.
4. We propose to address the rare word problem by training the NMT system to track the origins of the unknown words in the target sentences. If we knew the source word responsible for each unknown target word, we could introduce a post-processing step that would replace each unk in the system’s output with a translation of its source word, using either a dictionary or the identity translation.
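A minimal sketch of that post-processing step, assuming we already know, for each unk in the output, the index of its aligned source word; the function name, the "<unk>" spelling, and the unk_to_src_pos mapping are illustrative, not from the paper:

    def postprocess(output_tokens, src_tokens, unk_to_src_pos, dictionary):
        # Replace each unk in the NMT output with a translation of its
        # aligned source word: dictionary lookup first, identity fallback.
        result = []
        for j, tok in enumerate(output_tokens):
            if tok == "<unk>":
                src_word = src_tokens[unk_to_src_pos[j]]
                # Dictionary translation if available, else copy the source
                # word verbatim (the identity translation).
                result.append(dictionary.get(src_word, src_word))
            else:
                result.append(tok)
        return result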
5. We treat the NMT system as a black box and train it on a corpus annotated by one of the models below. First, the alignments are produced with an unsupervised aligner (the Berkeley aligner). Next, we use the alignment links to construct a word dictionary that will be used for the word translations in the post-processing step.
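One plausible way to derive that dictionary from the alignment links, sketched under the assumption that each sentence pair comes with (source_index, target_index) links; the names and the most-frequent-translation heuristic are my own choices, not necessarily the paper's exact procedure:

    from collections import Counter, defaultdict

    def build_dictionary(parallel_corpus, alignments):
        # For each source word, count the target words it aligns to across
        # the corpus and keep the most frequent one as its translation.
        counts = defaultdict(Counter)
        for (src, tgt), links in zip(parallel_corpus, alignments):
            for s, t in links:          # links are (source_idx, target_idx) pairs
                counts[src[s]][tgt[t]] += 1
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}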
6. There are three annotation models: the Copyable Model, the Positional All Model (PosAll), and the Positional Unknown Model (PosUnk); rough code sketches of all three follow this list.
Copyable Model: unknown words in the source sentence S are numbered unk1, unk2, ..., unkn (repeated unknowns share a token), and each unknown word in the target sentence T copies the token of its aligned unknown source word; target unknowns that cannot be matched this way receive unk∅. The model is therefore limited by its inability to translate unknown target words that are aligned to known words in the source sentence.
Positional All Model (PosAll): uses a single universal unk token and inserts a positional token p_d after every word in T, where d is the relative distance to the aligned source word (a null token p_n marks unaligned words), so the complete alignment is captured. The drawback is that this doubles the length of T and makes learning more difficult.
Positional Unknown Model (PosUnk): annotates only the unknown words in T, replacing each with unkpos_d, where d gives the relative position of its aligned source word (unkpos∅ if unaligned), so the pointer information is preserved without lengthening the sentence.
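A rough sketch of the copyable annotation, assuming precomputed alignment links and known vocabularies; the function name is illustrative, and the token spellings follow the description above:

    def annotate_copyable(src, tgt, links, vocab_src, vocab_tgt):
        # Number unknown source words unk1..unkn; repeats share a token.
        src_unk, src_out = {}, []
        for w in src:
            if w in vocab_src:
                src_out.append(w)
            else:
                src_unk.setdefault(w, "unk%d" % (len(src_unk) + 1))
                src_out.append(src_unk[w])
        t2s = {t: s for s, t in links}   # target index -> source index
        tgt_out = []
        for j, w in enumerate(tgt):
            if w in vocab_tgt:
                tgt_out.append(w)
            elif j in t2s and src[t2s[j]] in src_unk:
                # Copy the token of the aligned unknown source word.
                tgt_out.append(src_unk[src[t2s[j]]])
            else:
                # Unaligned, or aligned to a known source word.
                tgt_out.append("unk∅")
        return src_out, tgt_out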
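A sketch of the PosAll annotation under the same assumptions; the ±7 window matches the relative positions described in the paper, while the function name and exact token spellings are mine:

    def annotate_posall(tgt, links, vocab_tgt, window=7):
        # Emit each target word (unknowns become a universal <unk>),
        # followed by a positional token p_d, where d = j - i is the offset
        # to the aligned source position i; p_n marks unaligned words or
        # offsets outside the window. The output is twice as long as tgt.
        t2s = {t: s for s, t in links}
        out = []
        for j, w in enumerate(tgt):
            out.append(w if w in vocab_tgt else "<unk>")
            if j in t2s and abs(j - t2s[j]) <= window:
                out.append("p%d" % (j - t2s[j]))
            else:
                out.append("p_n")
        return out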
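A matching sketch for PosUnk with the same assumed inputs; only unknown target words receive a token, so the sentence length is unchanged:

    def annotate_posunk(tgt, links, vocab_tgt, window=7):
        # Replace each unknown target word with unkpos_d, where d = j - i
        # is the offset to its aligned source position; unkpos∅ marks
        # unaligned unknowns or offsets outside the window.
        t2s = {t: s for s, t in links}
        out = []
        for j, w in enumerate(tgt):
            if w in vocab_tgt:
                out.append(w)
            elif j in t2s and abs(j - t2s[j]) <= window:
                out.append("unkpos%d" % (j - t2s[j]))
            else:
                out.append("unkpos∅")
        return out

At test time, an unkpos_d emitted at position j points back to source position j - d, which is exactly the index the post-processing step in point 4 needs for its dictionary lookup.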