Skip to content

LODA Program Mining using Neural Machine Translation (NMT)

License

Notifications You must be signed in to change notification settings

loda-lang/nmt-mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

NMT Program Mining

LODA Program Mining using Neural Machine Translation (NMT). Contributed by JUrban and barakeel.

Create Training Data

The training data is generated from loda-programs and stored in the TRDIR folder.

export TRDIR=/traindata
cd loda-programs/oeis
for i in `ls -d */`; do cd $i; for j in *; do perl -pe 's/;.*//; s/^ *//; s/mov/M/; s/add/A/; s/sub/S/; s/trn/T/; s/mul/U/; s/div/D/; s/dif/F/; s/mod/O/; s/pow/P/; s/gcd/G/; s/bin/B/; s/cmp/C/; s/min/I/; s/max/X/; s/lpb/L/; s/lpe/E/; s/clr/R/; s/seq/Q/; s/,/# /g; s/[\$]/\$ /g; s/([0-9])/$1 /g;  s/\-/~ /g;' $j | tr "\n" " " | perl -pe 's/^ *//; s/ +/ /g' > $TRDIR/$i/$j; done; cd ..; done

Translate back to LODA:

perl -pe 's/M/; mov/g; s/A/; add/g; s/S/; sub/g; s/T/; trn/g; s/U/; mul/g; s/D/; div/g; s/F/; dif/g; s/O/; mod/g; s/P/; pow/g; s/G/; gcd/g; s/B/; bin/g; s/C/; cmp/g; s/I/; min/g; s/X/; max/g; s/L/; lpb/g; s/E/; lpe/g; s/R/; clr/g; s/Q/; seq/g; s/# */,/g; s/[\$] */\$/g; s/([0-9]) */$1/g; s/\~ */-/g; s/^; *//; s/^ *//; s/ *$//; s/ +/ /g' 

Model Training

The folder TRDIR contains the training data and MODDIR is where the model shall be stored.

export CUDA_VISIBLE_DEVICES=0
time python -m nmt.nmt \
  --encoder_type=bi \
  --learning_rate=0.4 \
  --max_gradient_norm=3.0 \
  --attention=scaled_luong \
  --num_units=512 \
  --num_gpus=1 \
  --batch_size=512 \
  --src=miz --tgt=ferp \
  --vocab_prefix=$TRDIR/vocab \
  --train_prefix=$TRDIR/train1 \
  --dev_prefix=$TRDIR/dev \
  --test_prefix=$TRDIR/test \
  --out_dir=$MODDIR/model \
  --num_train_steps=20000 \
  --steps_per_stats=20000 \
  --steps_per_external_eval=20000 \
  --tgt_max_len=180 \
  --tgt_max_len_infer=280 \
  --src_max_len=100 \
  --num_layers=2 \
  --dropout=0.2 \
  --metrics=bleu \
  --beam_width=240 \
  --num_translations_per_input=240 > $TRDIR/train.log-tr-1 

Generate Programs (Inference)

time python -m nmt.nmt --beam_width=240 --num_translations_per_input=240 --out_dir=$MODDIR/model --inference_input_file=$inp --inference_output_file=$out >> $TRDIR/train.log-inf;

$inp is a file with the OEIS sequences for which we want program predictions encoded like this:

_ # _ # _ # _ # _ # _ # 1 2 6 0 # 5 #
~ 1 2 8 # ~ 1 2 8 # 1 9 2 # ~ 1 2 8 # 3 2 # 3 2 # ~ 4 8 # 3 2 # ~ 8 # ~ 8 # 1 2 # ~ 8 # 2 # 2 # ~ 3 # 2 #
1 # 1 # 1 # 1 # 1 # 1 # 1 # 0 # 1 # 1 # 1 # 0 # 0 # 1 # 0 # 0 #
7 5 # 6 6 # 5 7 # 4 8 # 4 1 # 3 4 # 2 7 # 2 2 # 1 7 # 1 2 # 9 # 6 # 3 # 2 # 1 # 0 #

We only use 16 input values (this is arbitrary) and turn any values with more than 6 digits into "_" . "-" (minus) becomes "~" .

We also reverse the sequence (probably not needed with our current NMT params).

Validating and Submitting Programs

Add a generator with "version": 8 to the LODA miners.json config. Set the batchFile to the file that contains your generated programs. Add a new miner config (nmt below) and reference the new generator by its name.

{
  "miners": [
    {
      "name": "nmt",
      "overwrite": "none",
      "enabled": true,
      "backoff": true,
      "generators": [
        "v8"
      ],
      "matchers": [
        "direct",
        "linear1",
        "linear2",
        "binary",
        "decimal"
      ]
    },
  ],
  "generators": [
    {
      "name": "v8",
      "version": 8,
      "batchFile": "/path/to/oeis-loda-nmt-predictall"
    }
  ]
}

Run the validation (if you run in client mode, found programs will be automatically submitted):

loda mine -i nmt

To parallelize it, you need to edit the miners.json file:

  • create multiple "generators" where each point to a different chunk
  • create multiple "miners" (with different names) each pointing to another generator

To limit the evaluation time per program, use -z option. To allow updating existing programs (slower), set overwrite to auto or all.

About

LODA Program Mining using Neural Machine Translation (NMT)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published