HW5: Neural machine translation with the sequence-to-sequence model

Due April 17th, 2018 11:59pm Eastern time

In this assignment I provide an implementation of the sequence-to-sequence model for neural machine translation, as described in section 5.1 of the Koehn NMT textbook and in Sutskever et al., Sequence to Sequence Learning with Neural Networks, 2014. It is also described in the TensorFlow Neural Machine Translation (seq2seq) Tutorial. Your task will be to train a model and evaluate it with the BLEU score. Then you will improve the quality of the model by changing the training configuration and by improving the algorithm.
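
To orient yourself: the core of such a model is an LSTM encoder whose final hidden and cell states initialize an LSTM decoder. Below is a minimal Keras sketch of that architecture, assuming integer-encoded tokens; the vocabulary sizes and dimensions are illustrative, and this is not the exact code from the repository.

```python
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Embedding

# Illustrative sizes; the repository code derives these from the data.
num_encoder_tokens = 5000   # source vocabulary size (assumed)
num_decoder_tokens = 5000   # target vocabulary size (assumed)
latent_dim = 256            # embedding and LSTM dimension

# Encoder: embed the source tokens and keep only the final LSTM states.
encoder_inputs = Input(shape=(None,))
encoder_embedded = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_embedded)

# Decoder: embed the target tokens, start from the encoder's final states,
# and predict a distribution over the target vocabulary at every step.
decoder_inputs = Input(shape=(None,))
decoder_embedded = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_embedded,
                                                initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

# Trained with teacher forcing: the decoder input is the target sentence
# shifted by one position relative to the prediction target.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
```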

Please submit a report answering the analysis questions below and the modified code.

Required training time

While the homework assignment does not require a lot of coding, training neural MT models on a desktop/laptop CPU can take several hours. Because the assignment asks you to try different experiment configurations, start the homework early to allow the different experiments to finish.

Training the model and evaluating the output with BLEU (20 points)

  1. Set up an Anaconda Python 3.5 environment for this HW as described in Computing Environment (same as HW4)
  2. Clone the knmt GitHub repository https://github.com/achimr/knmt to your computer (same as HW4)
  3. Open an Anaconda 3.5 prompt and navigate to the knmt/seq2seq folder
  4. Download the training and test data from the HW5 file folder and unzip it to a folder called fra-eng below the knmt/seq2seq folder
  5. Run "python lstm_seq2seq_wordbased.py" to train a baseline NMT system. This will\
    • Train a French-English seq2seq NMT model on the first 8000 sentence pairs in fr-en.train.txt (for 100 epochs taking about 2 hours on a Core i5 laptop)
    • Write the trained model to a file s2sw.h5
    • Run inference on the source contained in fr-en.test_small.txt
    • Calculate the BLEU score on the inference output with the reference sentences contained in fr-en.test_small.txt - the baseline BLEU score should be around 0.03179288690658304
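
The BLEU calculation in the script is based on NLTK. Here is a minimal sketch of a corpus-level BLEU computation with hypothetical tokenized sentences (the actual script reads the MT output and reference sentences from the test file):

```python
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical tokenized MT output and references; corpus_bleu expects
# a list of reference lists (one list of references per hypothesis).
hypotheses = [['the', 'cat', 'is', 'on', 'the', 'mat'],
              ['he', 'left', 'the', 'house', 'this', 'morning']]
references = [[['the', 'cat', 'is', 'on', 'the', 'mat']],
              [['he', 'left', 'the', 'house', 'early', 'this', 'morning']]]

print('BLEU score:', corpus_bleu(references, hypotheses))
```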

CAUTION: If you get a warning message like the following, the displayed BLEU score is not valid and not comparable to BLEU scores calculated without the warning. You should treat it as a BLEU score of zero.

    C:\Users\achim\Anaconda3\envs\py35\lib\site-packages\nltk\translate\bleu_score.py:490: UserWarning:
    Corpus/Sentence contains 0 counts of 3-gram overlaps.
    BLEU scores might be undesirable; use SmoothingFunction().
      warnings.warn(_msg)
    BLEU score: 0.09686693290317193
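
For context, the warning fires when some n-gram order has zero overlaps between the MT output and the references, which makes the geometric mean underlying BLEU degenerate. A small sketch that reproduces the 3-gram warning with hypothetical sentences:

```python
from nltk.translate.bleu_score import corpus_bleu

# The hypothesis shares unigrams and one bigram with the reference, but
# no 3-gram or 4-gram, so NLTK warns and the unsmoothed score is not
# meaningful.
hypotheses = [['dog', 'the', 'mat', 'on', 'sat']]
references = [[['the', 'dog', 'sat', 'on', 'the', 'mat']]]
print(corpus_bleu(references, hypotheses))
```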

Make sure you copy and paste the output into a text file. In your report, provide an analysis of:

  1. The characteristics of the training data that is used for the baseline
  2. The progression of the training and validation loss during training (a plotting sketch follows this list)
  3. The translations of the test set: Which ones are good? Which ones are bad? What is missing? What is characteristic of the source sentences that have bad translations? (You should not do a sentence-by-sentence analysis, but rather make high-level observations.) What are potential reasons for the bad translations/missing information?
  4. The BLEU score
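
For question 2, plotting the per-epoch loss values can make the progression easier to discuss. A minimal sketch, assuming you have copied hypothetical values from the training log into lists:

```python
import matplotlib.pyplot as plt

# Hypothetical per-epoch values copied from the training log output.
train_loss = [2.31, 1.87, 1.62, 1.45, 1.33]
val_loss = [2.40, 2.05, 1.91, 1.86, 1.84]

plt.plot(train_loss, label='training loss')
plt.plot(val_loss, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```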

Once you have trained the model, you can read the trained model from the file s2sw.h5 and run inference using the script lstm_seq2seq_restore_wordbased.py. Run "python lstm_seq2seq_restore_wordbased.py" at least once without any changes and verify that you get the same translations and same BLEU score as with the full training.
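
Restoring the model relies on Keras model serialization. A minimal sketch of the loading step (the actual restore script additionally rebuilds the separate encoder and decoder models needed for inference):

```python
from keras.models import load_model

# Load the architecture and weights saved by the training script.
model = load_model('s2sw.h5')
model.summary()
```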

Improving the MT quality without changing the code (40 points)

Try to improve the BLEU score on the fr-en.test_small.txt test set by retraining the system with a different configuration. Possibilities are (in no particular order):

  • Training the system for more (or less) epochs (command line parameter --epochs)
  • Training the system with more training data (command line parameter --num-samples). Note that the training data is structured so that the sentences get longer, so training time will increase both because of the longer sentences and because of the increased number of training sentence pairs.
  • Training with a different embedding dimension (no command line parameter, variable latent_dim)
  • Training with a different LSTM layer dimension (no command line parameter, variable latent_dim)
  • Lowercasing the training data
  • Pre-processing the training data with a different tokenizer (a preprocessing sketch follows this list)
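
For the last two options, here is a minimal preprocessing sketch, assuming one sentence pair per line with the two sides separated by a tab; the output file name is illustrative, and you would point the training script at the new file (NLTK's tokenizer may require a one-time nltk.download('punkt')):

```python
from nltk.tokenize import word_tokenize

# Lowercase both sides of each sentence pair and retokenize with NLTK.
# word_tokenize also accepts language='french' for the French side.
with open('fra-eng/fr-en.train.txt', encoding='utf-8') as fin, \
        open('fra-eng/fr-en.train.prep.txt', 'w', encoding='utf-8') as fout:
    for line in fin:
        fields = line.rstrip('\n').lower().split('\t')
        fout.write('\t'.join(' '.join(word_tokenize(f)) for f in fields) + '\n')
```

The configuration-only options can be combined with the command line parameters above, for example "python lstm_seq2seq_wordbased.py --epochs 50 --num-samples 12000" (the values here are illustrative).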

Please analyze the effect of your changes on the translated sentences and on the BLEU score, and analyze why your changes did or did not improve the BLEU score. It is sufficient to make one change that improves the BLEU score. Please provide a log of your output.

Improving the MT quality by improving the algorithm (40 points)

Try to improve the BLEU score on the fr-en.test_small.txt test set by retraining the system with a code change.

Please analyze the effect of your changes on the translated sentences and on the BLEU score, and analyze why your changes did or did not improve the BLEU score. It is sufficient to make one change that improves the BLEU score over the BLEU score from the previous task (Improving the MT quality without changing the code). Please provide a log of your output and your modified code.
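
As one illustration of a possible algorithmic change, the cited Sutskever et al. paper reports that reversing the word order of the source sentences improved their LSTM translation quality. A hedged preprocessing sketch of that idea, assuming tab-separated sentence pairs with the source sentence in the first column (file names are illustrative; the same reversal would have to be applied to the test source before inference):

```python
# Reverse the word order of the source side of the parallel corpus,
# following the trick reported by Sutskever et al. (2014).
# Assumes tab-separated pairs with the source sentence first.
with open('fra-eng/fr-en.train.txt', encoding='utf-8') as fin, \
        open('fra-eng/fr-en.train.rev.txt', 'w', encoding='utf-8') as fout:
    for line in fin:
        src, tgt = line.rstrip('\n').split('\t')[:2]
        fout.write(' '.join(reversed(src.split())) + '\t' + tgt + '\n')
```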

Of course, different code changes require different levels of work, but I will not compare one solution to another in grading. What matters for grading is a code change leading to an output quality improvement and the analysis of why it did.

Acknowledgments