Skip to content

Project on Adaptive Wait-k Policy for Simultaneous Text-to-Text Machine Translation Based on RL

Notifications You must be signed in to change notification settings

Tanmay-IITDSAI/MLProject

Repository files navigation

MLProject

This report presents the initial implementation of a simultaneous translation model using PyTorch, which incorporates a mixture of experts strategy and reinforcement learning principles. The model has been designed to optimize translation quality while minimizing latency.

  • Base Model: The project will utilize the Mixture-of-Experts Wait-k policy as the backbone model. This policy allows each head of the multi-head attention mechanism to perform translation with different levels of latency.

Reinforcement Learning Integration:

  • State: The state in the RL framework consists of the current source tokens processed and the target tokens generated.
  • Action: The action space corresponds to selecting a value of 𝑘. (how many source tokens to wait before generating a translation).
  • Reward Function: The reward will be a combination of BLEU score (translation quality) and Average Lagging (latency). The model will be penalized for high latency and low translation quality. We might also test ROUGE score as it can provide insights into the coverage and informativeness of the generated text.
  • Agent: A policy-based RL agent will be used to predict the optimal k value.

Model Architecture:

  • The MOEWaitKPolicy class implements the Mixture-of-Experts Wait-k mechanism, allowing the model to leverage multiple experts for different heads of attention.
  • The AdaptiveWaitKModel class integrates the encoder and decoder components, incorporating the adaptive wait-k policy into the translation process.
  • The SimultaneousTranslationModel class encapsulates the entire model architecture, facilitating the forward pass through the network. (not working need to tackle this too)
  • The model is trained over multiple epochs using the Adam optimizer and Cross-Entropy loss. The training loop involves passing batches through the model, calculating the loss, and updating the weights.n (not working need to tackle this too)

Pending Tasks & Challenges

  • Fix Tensor Size Mismatches: Address the runtime error caused by mismatched tensor dimensions in the MOEWaitKPolicy forward method.
  • Resolve Syntax Issues: Correct all syntax errors in the code to ensure it runs without interruption.
  • Evaluate Model Performance: Once the errors are fixed, perform comprehensive evaluations using BLEU and ROUGE scores.

Alternative Frameworks: If issues with PyTorch persist, we will consider switching to TensorFlow/Keras for the model implementation. This may involve rewriting parts of the code to adapt to the new framework.

About

Project on Adaptive Wait-k Policy for Simultaneous Text-to-Text Machine Translation Based on RL

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published