forked from AndreasMadsen/stable-nalu
Commit: make readme about MC-LSTM and refer to other code
Showing 1 changed file with 44 additions and 105 deletions.
@@ -1,107 +1,46 @@
# Neural Arithmetic Units

This code encompasses two publications. The ICLR paper is still under review; please respect the double-blind review process.

![Hidden Size results](readme-image.png)

_Figure: performance of our proposed NMU model._

## Publications

#### SEDL Workshop at NeurIPS 2019

Reproduction study of the Neural Arithmetic Logic Unit (NALU). We propose an improved evaluation criterion for arithmetic tasks, including a "converged at" and a "sparsity error" metric. Results will be presented at [SEDL|NeurIPS 2019](https://sites.google.com/view/sedl-neurips-2019/#h.p_vZ65rPBhIlB4). – [Read paper](http://arxiv.org/abs/1910.01888).

```bib
@inproceedings{maep-madsen-johansen-2019,
  author={Andreas Madsen and Alexander Rosenberg Johansen},
  title={Measuring Arithmetic Extrapolation Performance},
  booktitle={Science meets Engineering of Deep Learning at 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)},
  address={Vancouver, Canada},
  journal={CoRR},
  volume={abs/1910.01888},
  month={October},
  year={2019},
  url={http://arxiv.org/abs/1910.01888},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  arxivId={1910.01888},
  eprint={1910.01888}
}
```

# Neural Arithmetic with MC-LSTM

This repository contains the code for part of the experiments in the MC-LSTM paper,
more specifically the experiments that benchmark MC-LSTM on arithmetic tasks.
The code for the other experiments can be found in the [main repo](https://github.com/ml-jku/mc-lstm).

For these experiments we used the code base from
[Madsen and Johansen (2020)](https://openreview.net/forum?id=H1gNOeHKPS).
The starting point for this code is tagged `madsen`.
To get an overview of the changes we made, you can run `git diff madsen`.

The PyTorch module for MC-LSTM can be found in `stable_nalu/layer/mclstm.py`;
the fully-connected layer with mass conservation is in `stable_nalu/layer/mcfc.py`.
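
As a quick sanity check (a minimal sketch, assuming the package has been installed as described under Requirements below), you can verify that both modules import cleanly:

```bash
# Minimal sanity check: confirms the MC-LSTM modules can be imported
# (assumes the package is importable as `stable_nalu`, e.g. after
# `python3 setup.py develop`); inspect the two files for the actual layer APIs.
python3 -c "import stable_nalu.layer.mclstm, stable_nalu.layer.mcfc; print('MC-LSTM layers importable')"
```
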

### Reproduction

Experiments should be reproducible by running the shell scripts `mclstm_*.sh`.
The bash scripts repeatedly train different networks using PyTorch (Python),
and then call R scripts to generate a table or figure to summarise the experiment.
The relevant information for every single run is also logged to tensorboard.
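
For example, all of them can be run back to back with a simple shell loop (a sketch relying only on the `mclstm_*.sh` naming pattern mentioned above):

```bash
# Run every MC-LSTM experiment script in sequence; each script trains the
# networks and then calls the R scripts that summarise the results.
for script in mclstm_*.sh; do
    bash "$script"
done
```
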

###### Requirements

You should be able to use the `setup.py` file to install the code on your system.
Alternatively, you can install the requirements in `setup.py` manually
and run the code by setting the `PYTHONPATH` to the top-level directory.
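
In shell form, the two options look roughly like this (a sketch; the `--help` call is just a quick way to check that the code is found):

```bash
# Option 1: install the package (and any missing dependencies) into the environment.
python3 setup.py develop

# Option 2: no installation; point PYTHONPATH at the repository root instead.
export PYTHONPATH="$(pwd)"
python3 experiments/simple_function_static.py --help
```
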

To run the scripts as they are, you need a graphics card with at least 11 GB of VRAM.
The experiments were performed on two 18-core CPUs with 384 GB of RAM,
but it should be no problem to run the scripts on any modern high-end PC.

### Paper

To cite this work, you can use the following bibtex entry:

```bib
@report{mclstm,
  author = {Hoedt, Pieter-Jan and Kratzert, Frederik and Klotz, Daniel and Halmich, Christina and Holzleitner, Markus and Nearing, Grey and Hochreiter, Sepp and Klambauer, G{\"u}nter},
  title = {MC-LSTM: Mass-Conserving LSTM},
  institution = {Institute for Machine Learning, Johannes Kepler University, Linz},
  type = {preprint},
  date = {2021},
  url = {http://arxiv.org/abs/2101.05186},
  eprinttype = {arxiv},
  eprint = {2101.05186},
}
```

#### ICLR 2020 (Spotlight)

Our main contribution includes a theoretical analysis of the optimization challenges with the NALU. Based on these difficulties, we propose several improvements. – [Read paper](https://openreview.net/forum?id=H1gNOeHKPS).

```bib
@inproceedings{mnu-madsen-johansen-2020,
  author = {Andreas Madsen and Alexander Rosenberg Johansen},
  title = {{Neural Arithmetic Units}},
  booktitle = {8th International Conference on Learning Representations, ICLR 2020},
  volume = {abs/2001.05016},
  year = {2020},
  url = {http://arxiv.org/abs/2001.05016},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  arxivId = {2001.05016},
  eprint = {2001.05016}
}
```

## Install

```bash
python3 setup.py develop
```

This will install this code under the name `stable-nalu`, and the following dependencies if missing: `numpy, tqdm, torch, scipy, pandas, tensorflow, torchvision, tensorboard, tensorboardX`.

## Experiments used in the paper

All experiment results shown in the paper can be exactly reproduced using fixed seeds. The `lfs_batch_jobs`
directory contains bash scripts for submitting jobs to an LSF queue. The `bsub` command and its arguments can be
replaced with `python3` or an equivalent command for another queue system.
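
As an illustration, a hypothetical job line (not copied verbatim from the scripts) and its local equivalent could look like this:

```bash
# Hypothetical LSF submission line as it might appear in lfs_batch_jobs/*.sh:
bsub -n 4 -W 24:00 python3 -u experiments/simple_function_static.py --operation mul --seed 0

# Local equivalent: drop `bsub` and its arguments and keep the python3 call.
python3 -u experiments/simple_function_static.py --operation mul --seed 0
```
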

The `export` directory contains Python scripts for converting the tensorboard results into CSV files, and
R scripts for presenting those results as they appear in the paper.
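
A sketch of that two-step workflow (the script names below are hypothetical placeholders; use the actual files in `export/`):

```bash
# Step 1 (placeholder name): convert the tensorboard logs of an experiment to CSV.
python3 export/convert_some_experiment.py

# Step 2 (placeholder name): render the paper-style table or figure from the CSV.
Rscript export/plot_some_experiment.r
```
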

## Naming changes

The naming conventions in the code differ from those used in the paper. The following translations
can be used:

* Linear: `--layer-type linear`
* ReLU: `--layer-type ReLU`
* ReLU6: `--layer-type ReLU6`
* NAC-add: `--layer-type NAC`
* NAC-mul: `--layer-type NAC --nac-mul normal`
* NAC-sigma: `--layer-type PosNAC --nac-mul normal`
* NAC-nmu: `--layer-type ReRegualizedLinearPosNAC --nac-mul normal --first-layer ReRegualizedLinearNAC`
* NALU: `--layer-type NALU`
* NAU: `--layer-type ReRegualizedLinearNAC`
* NMU: `--layer-type ReRegualizedLinearNAC --nac-mul mnac`

## Extra experiments

There are 4 experiments in total; they correspond to the experiments in the NALU paper.

```bash
python3 experiments/simple_function_static.py --help # 4.1 (static)
python3 experiments/sequential_mnist.py --help # 4.2
```

Example using the NMU on the multiplication problem:

```bash
python3 experiments/simple_function_static.py \
    --operation mul --layer-type ReRegualizedLinearNAC --nac-mul mnac \
    --seed 0 --max-iterations 5000000 --verbose \
    --name-prefix test --remove-existing-data
```

The `--verbose` flag logs internal network measures to tensorboard. You can access the tensorboard with:

```bash
tensorboard --logdir tensorboard
```