Because language itself is ambiguous, researchers look for additional context to feed into the translation system. For example, if the source sentence happens to describe an image, that image may carry latent information that helps the translation.
Multimodal MT therefore aims to translate the target sentence from the source sentence and the image jointly.
Most current multimodal MT systems are still essentially text-to-text, with the input augmented by global image features obtained from an external computer vision model or via visual attention:
Multilingual image description with neural sequence models
Multimodal pivots for image caption translation
Attention-based multimodal neural machine translation
Using images (visual clues) to aid translation has been shown to be effective:
An error analysis for image-based multi-modal neural machine translation
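A minimal PyTorch sketch of this idea, assuming a pooled CNN feature vector (e.g., 2048-d, as from a ResNet) is projected to initialize the decoder state; all names and sizes here are illustrative assumptions, not the architecture of any cited paper:

```python
import torch
import torch.nn as nn

class MultimodalNMT(nn.Module):
    """Toy text encoder-decoder whose decoder is initialized from a
    global image feature (no attention, to keep the sketch short)."""
    def __init__(self, vocab=8000, emb=256, hid=512, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hid)  # project image feature into decoder space
        self.tgt_emb = nn.Embedding(vocab, emb)
        self.decoder = nn.GRU(emb + hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, src_ids, img_feat, tgt_ids):
        enc_out, _ = self.encoder(self.src_emb(src_ids))        # (B, S, hid)
        # The global image feature becomes the decoder's initial hidden state.
        h0 = torch.tanh(self.img_proj(img_feat)).unsqueeze(0)   # (1, B, hid)
        # Crude source context: mean-pooled encoder states, tiled over target steps.
        ctx = enc_out.mean(dim=1, keepdim=True).expand(-1, tgt_ids.size(1), -1)
        dec_in = torch.cat([self.tgt_emb(tgt_ids), ctx], dim=-1)
        dec_out, _ = self.decoder(dec_in, h0)
        return self.out(dec_out)                                # (B, T, vocab)

model = MultimodalNMT()
logits = model(torch.randint(0, 8000, (2, 7)),   # source token ids
               torch.randn(2, 2048),             # pooled image features
               torch.randint(0, 8000, (2, 5)))   # shifted target token ids
print(logits.shape)  # torch.Size([2, 5, 8000])
```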
The units most NMT systems operate on today are characters and, predominantly, subwords. This design is not linguistically motivated; it exists because extending NMT to an open vocabulary is otherwise very difficult.
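As a concrete illustration of subword units, here is a toy byte-pair encoding (BPE) loop in Python that repeatedly merges the most frequent adjacent symbol pair; the corpus and the number of merges are toy values:

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs over a (word -> frequency) vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its merged symbol."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Toy corpus: each word is split into characters plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for step in range(10):
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)   # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(step, best)
```

Each learned merge turns a frequent character sequence into a single unit, so rare words decompose into known subwords instead of falling out of the vocabulary.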
From a linguistic point of view, translation should instead be treated as a transformation between larger units, such as the words, phrases, and syntactic structures within a sentence.
Many studies therefore try to inject syntactic constituency trees or dependency trees on the source/target side, or to keep the sequence-to-sequence architecture and linearize the tree structure into a token sequence. Approaches include (a bracket-linearization sketch follows this list):
- Bracket expressions
Multi-source syntactic neural machine translation
- Sequence of rules
Multi-representation ensembles and delayed SGD updates improve syntax-based NMT
- CCG supertags
Predicting target language CCG supertags improves neural machine translation
- Packed forests representing multiple parses of the source sentence
Forest-based neural machine translation
- Recurrent neural network grammars
Recurrent neural network grammars
- Tree-LSTMs
Improved semantic representations from tree-structured long short-term memory networks
- Convolutional encoders to represent a dependency graph
Graph convolutional encoders for syntax-aware neural machine translation
- Biased encoder-decoder attention weights with syntactic clues
Syntax-directed attention for neural machine translation
- Unsupervised tree-based methods
Unsupervised recurrent neural network grammars
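A minimal sketch of the bracket-expression idea: a constituency tree (here a toy nested-tuple format, an assumption of this sketch) is flattened depth-first into a token sequence that an ordinary sequence-to-sequence model can consume:

```python
def linearize(node):
    """Emit '(LABEL' on entering a subtree, its children in order, then ')'."""
    if isinstance(node, str):          # leaf: a plain word token
        return [node]
    label, *children = node            # internal node: (label, child, ...)
    tokens = [f"({label}"]
    for child in children:
        tokens += linearize(child)
    return tokens + [")"]

tree = ("S",
        ("NP", ("PRP", "She")),
        ("VP", ("VBZ", "reads"), ("NP", ("NNS", "books"))))
print(" ".join(linearize(tree)))
# (S (NP (PRP She ) ) (VP (VBZ reads ) (NP (NNS books ) ) ) )
```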
Further research has improved the source-sentence input beyond tree-based NMT:
- Lattice-based NMT: lattices represent the uncertainty of upstream components (e.g., a speech recognizer or tokenizer)
Neural lattice-to-sequence models for uncertain inputs
Lattice-based recurrent neural network encoders for neural machine translation
Lattice-to-sequence attentional neural machine translation models
- Factored NMT: factors represent each word as a tuple carrying additional information (e.g., lemma, prefix, suffix, POS); see the sketch after this list
Factored translation models
Linguistic input features improve neural machine translation
Factored neural machine translation architectures
Neural machine translation by generating multiple linguistic factors
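A minimal PyTorch sketch of factored input in the spirit of "Linguistic input features improve neural machine translation": each factor gets its own embedding table and the vectors are concatenated before entering the encoder; all vocabulary and embedding sizes are made-up toy values:

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Each token arrives as a tuple (surface form, lemma, POS); the three
    embeddings are concatenated into one input vector per token."""
    def __init__(self):
        super().__init__()
        self.surface = nn.Embedding(10000, 220)  # subword / surface form
        self.lemma   = nn.Embedding(8000, 20)
        self.pos     = nn.Embedding(50, 10)      # total width: 220+20+10 = 250

    def forward(self, surface_ids, lemma_ids, pos_ids):
        return torch.cat([self.surface(surface_ids),
                          self.lemma(lemma_ids),
                          self.pos(pos_ids)], dim=-1)

emb = FactoredEmbedding()
vec = emb(torch.randint(0, 10000, (2, 6)),  # surface ids: batch 2, length 6
          torch.randint(0, 8000, (2, 6)),   # lemma ids
          torch.randint(0, 50, (2, 6)))     # POS ids
print(vec.shape)  # torch.Size([2, 6, 250])
```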
One study (Has machine translation achieved human parity?) notes that when humans are given more context, e.g., the full document, they translate better than SOTA sentence-level NMT, because humans are better at finding context across sentences. Many techniques have therefore been proposed to capture this intersentential context:
- Initializing encoder or decoder states
Exploiting cross-sentence context for neural machine translation
- Multisource encoders as additional decoder input
Does neural machine translation benefit from larger context?
- Memory-augmented neural networks
Learning to remember translation history with a continuous cache
Document context neural machine translation with memory networks
Cache-based document-level neural machine translation
- Document-level LM
CUED@WMT19: EWC&LMs
- Hierarchical attention
Document-level neural machine translation with hierarchical attention networks
Selective attention for context-aware neural machine translation
- Deliberation networks
Modeling coherence for discourse neural machine translation
- Simply concatenating multiple source and/or target sentences (see the sketch after this list)
Neural machine translation with extended context
- Context-aware extensions to Transformer encoders
Context-aware neural machine translation learns anaphora resolution
Improving the Transformer translation model with document-level context
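A minimal sketch of the concatenation approach (the "2-to-1" setting of "Neural machine translation with extended context"): the previous source sentence is prepended to the current one with a separator token, while the target side stays unchanged; the separator name and helper function are assumptions of this sketch:

```python
def make_2to1_pairs(src_sents, tgt_sents, sep="<BRK>"):
    """Extend each source sentence with its predecessor; targets unchanged."""
    pairs = []
    for i, (src, tgt) in enumerate(zip(src_sents, tgt_sents)):
        extended = f"{src_sents[i - 1]} {sep} {src}" if i > 0 else src
        pairs.append((extended, tgt))
    return pairs

doc_src = ["He picked it up .", "It was heavy ."]
doc_tgt = ["Er hob es auf .", "Es war schwer ."]
for s, t in make_2to1_pairs(doc_src, doc_tgt):
    print(s, "=>", t)
# He picked it up . => Er hob es auf .
# He picked it up . <BRK> It was heavy . => Es war schwer .
```

This gives the encoder access to pronoun antecedents and other cross-sentence cues at essentially no architectural cost.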