The official repository of the ICLR 2023 conference paper, "A Non-monotonic Self-terminating Language Model".
Authors: Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee
[ArXiv] [Openreview]
The repository is organized as follows.
.
├── eos.png
├── LICENSE
├── README.md
├── requirements.txt
└── src
    ├── gpt2
    │   ├── data.py
    │   ├── eval.py
    │   ├── metrics.py
    │   ├── nmst.py
    │   ├── preprocess.py
    │   ├── st.py
    │   ├── train.py
    │   └── utils.py
    └── wiki2
        ├── data.py
        ├── decoding_utils.py
        ├── evaluate.py
        ├── model_utils.py
        ├── train.py
        └── utils.py
The code was written in Python 3.9.12, and all experiments in the paper were conducted on a single NVIDIA Quadro RTX 8000 GPU. Please install the dependencies and set the environment variable using the commands below to ensure the training and evaluation scripts run smoothly:
pip install -r requirements.txt
export NMST_DIR=/path/to/non-monotonic-self-terminating-lm
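If a script cannot locate the repository, a quick sanity check (purely illustrative; NMST_DIR is the variable set above) is to confirm that Python can see the environment variable:

```python
import os

# NMST_DIR should point to the root of this repository (set via the export command above).
nmst_dir = os.environ.get("NMST_DIR")
assert nmst_dir is not None, "NMST_DIR is not set; run the export command above."
print(f"Using repository at: {nmst_dir}")
```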
Make sure that you are in the following directory:
cd ${NMST_DIR}/src/wiki2
Note: The default argparse configuration of train.py in ${NMST_DIR}/src/wiki2 is based on the hyperparameters used in the NMST+LSTM (1e-4) experiment. The epsilon value for NMST+ and ST+ can be set to any value in (0, 1). Please refer to the following example commands, which compare the NMST/ST/VA parameterizations with an epsilon value of 1e-5, as a guide for running the WikiText-2 experiments.
LSTM:
VA+LSTM
python train.py --loss mle --rnn-type nn.LSTM --dropout 0.5 --embedding-dim 512 --num-layers 2 --hidden-size 512 --rnn-dropout 0.0 --batch-size 32 --expr-name lstm_lm
ST+LSTM (1e-5)
python train.py --loss st --epsilon 1e-5 --rnn-type nn.LSTM --dropout 0.5 --embedding-dim 512 --num-layers 2 --hidden-size 512 --rnn-dropout 0.0 --batch-size 32 --expr-name lstm_st-1e-5
NMST+LSTM (1e-5)
python train.py --loss nmst --epsilon 1e-5 --rnn-type nn.LSTM --dropout 0.5 --embedding-dim 512 --num-layers 2 --hidden-size 512 --rnn-dropout 0.0 --batch-size 32 --expr-name lstm_nmst-1e-5
RNN:
VA+RNN
python train.py --loss mle --rnn-type nn.RNN --dropout 0.3 --embedding-dim 256 --num-layers 2 --hidden-size 256 --rnn-dropout 0.0 --batch-size 32 --expr-name rnn_lm
ST+RNN (1e-5)
python train.py --loss st --epsilon 1e-5 --rnn-type nn.RNN --dropout 0.3 --embedding-dim 256 --num-layers 2 --hidden-size 256 --rnn-dropout 0.0 --batch-size 32 --expr-name rnn_st-1e-5
NMST+RNN (1e-5)
python train.py --loss nmst --epsilon 1e-5 --rnn-type nn.RNN --dropout 0.3 --embedding-dim 256 --num-layers 2 --hidden-size 256 --rnn-dropout 0.0 --batch-size 32 --expr-name rnn_nmst-1e-5
python evaluate.py --model-load-dir ${NMST_DIR}/checkpoint/wiki2/MODEL_DIR --DECODING_METHOD 1
The evaluate.py script supports multiple decoding methods, including greedy decoding, ancestral sampling, top-k sampling, nucleus sampling, and beam search, as well as consistent top-k and consistent nucleus sampling as proposed in the paper "Consistency of a Recurrent Language Model with Respect to Incomplete Decoding." Please make sure to choose the appropriate decoding hyperparameters (such as k for top-k sampling or the beam size for beam search) before running inference.
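For readers unfamiliar with these hyperparameters, the snippet below is a minimal, self-contained sketch of how top-k and nucleus (top-p) filtering restrict the next-token distribution before sampling. It is only an illustration of the general technique and does not reproduce the actual implementation in decoding_utils.py; the vocabulary size, k, and p values are arbitrary.

```python
import torch

def filter_logits(logits, top_k=0, top_p=0.0):
    """Illustrative top-k / nucleus (top-p) filtering of a 1-D tensor of next-token logits."""
    logits = logits.clone()
    if top_k > 0:
        # Keep only the k highest-scoring tokens.
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    if top_p > 0.0:
        # Keep the smallest prefix of tokens whose cumulative probability exceeds p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[1:] = remove[:-1].clone()  # shift right so the threshold-crossing token is kept
        remove[0] = False                 # always keep the most probable token
        logits[sorted_idx[remove]] = float("-inf")
    return logits

# Example: sample one next token from a random distribution with k = 40.
logits = torch.randn(50257)
probs = torch.softmax(filter_logits(logits, top_k=40), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```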
For the GPT-2 experiments, first prepare the dataset by running the following commands:
cd ${NMST_DIR}/src/gpt2
python preprocess.py
Once preprocessing is complete, finetune GPT-2 (124M) using the train.py script:
VA+GPT2
python train.py --loss mle --expr-name lm --model-name gpt2 --bucketing 1 --eval 1 --decode 1
ST+GPT2 (1e-5)
python train.py --loss st --epsilon 1e-5 --expr-name st_1e-5 --model-name gpt2 --bucketing 1 --eval 1 --decode 1
NMST+GPT2 (1e-5)
python train.py --loss nmst --epsilon 1e-5 --expr-name nmst_1e-5 --model-name gpt2 --bucketing 1 --eval 1 --decode 1
python eval.py --model-load-dir ${NMST_DIR}/checkpoint/gpt2/MODEL_DIR --DECODING_METHOD 1 --expr-name EVAL_RUN_NAME
The eval.py script supports greedy decoding, ancestral sampling, top-k sampling, nucleus sampling, and beam search. Please make sure to choose the appropriate decoding hyperparameters (such as k for top-k sampling or the beam size for beam search) before running inference.
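As a point of reference for how these hyperparameters behave, the sketch below shows equivalent decoding settings on Hugging Face's pretrained GPT-2 (124M) via its generate API. This is an independent illustration with assumed values (k = 40, p = 0.9, beam size 5, 50 new tokens); it is not the interface of eval.py, and a finetuned checkpoint directory can be substituted for "gpt2".

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pretrained 124M-parameter GPT-2 (or point to a finetuned checkpoint directory).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
inputs = tokenizer("The meaning of life is", return_tensors="pt")

# Greedy decoding.
greedy = model.generate(**inputs, max_new_tokens=50, do_sample=False)

# Top-k sampling (k = 40, an assumed value).
topk = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=40)

# Nucleus (top-p) sampling (p = 0.9, an assumed value).
topp = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9, top_k=0)

# Beam search (beam size 5, an assumed value).
beam = model.generate(**inputs, max_new_tokens=50, num_beams=5, do_sample=False)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```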
The Weights and Biases logs for all experimental results can be accessed through the links provided below:
https://wandb.ai/eugenechoi/nmst-rnn/workspace?workspace=user-eugenechoi (Note: you can filter the logs to view only the RNN or LSTM experiments by selecting rnn_type from the filter dropdown menu.)
https://wandb.ai/eugenechoi/nmst-gpt2/workspace?workspace=user-eugenechoi
- Perplexity & Greedy: https://wandb.ai/eugenechoi/nmst-gpt2-greedy/workspace?workspace=user-eugenechoi
- Nucleus: https://wandb.ai/eugenechoi/nmst-gpt2-topp/workspace?workspace=user-eugenechoi
- Top-k: https://wandb.ai/eugenechoi/nmst-gpt2-topk/workspace?workspace=user-eugenechoi
- Beam Search: https://wandb.ai/eugenechoi/nmst-gpt2-beam/workspace?workspace=user-eugenechoi
Please use the following BibTeX entry to cite our work:
@inproceedings{
choi2023a,
title={A Non-monotonic Self-terminating Language Model},
author={Eugene Choi and Kyunghyun Cho and Cheolhyoung Lee},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=vw-5EgYbJZr}
}