███████╗ ██████╗ ██████╗ █████╗ ██╗
██╔════╝██╔═══██╗██╔══██╗ ██╔══██╗██║
█████╗ ██║ ██║██████╔╝ ███████║██║
██╔══╝ ██║ ██║██╔══██╗ ██╔══██║██║
██║ ╚██████╔╝██║ ██║██╗██║ ██║██║
╚═╝ ╚═════╝ ╚═╝ ╚═╝╚═╝╚═╝ ╚═╝╚═╝
"In reality, the law always contains less than the fact itself, because it does not reproduce the fact as a whole but only in that aspect of it which is important for us, the rest being intentionally or from necessity omitted."
-- Ernst Mach
Increasing depth-wise scaling by training sequential distillation models.
Depth is good, but deeper networks increasingly suffer from 'gradient locking' when the network is required to update synchronously [1] [2] [4]. This issue can be avoided with depth-wise model parallelism, where sections of the network train independently. This, however, creates delayed gradients [5], which hurt convergence once component depth exceeds a certain size (the toy example below illustrates the effect).
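The delayed-gradient effect is easy to see in isolation. The toy script below is an illustration only, not code from this repository: it runs SGD on a one-dimensional quadratic but applies each gradient a fixed number of steps after it was computed, mimicking components that update asynchronously; past a certain delay the iterates stop converging.

# Toy illustration of delayed gradients: plain SGD on f(w) = w^2 / 2, but
# each gradient is applied `delay` steps after it was computed.
from collections import deque

def grad(w):
    return w  # gradient of f(w) = w^2 / 2

def sgd_with_delay(delay, steps=100, lr=0.4, w=5.0):
    pending = deque()
    for _ in range(steps):
        pending.append(grad(w))          # gradient computed at the current weights...
        if len(pending) > delay:
            w -= lr * pending.popleft()  # ...but applied `delay` steps later
    return w

for d in (0, 1, 4, 8):
    # with this learning rate, small delays still converge toward the optimum
    # at w = 0, while larger delays oscillate and diverge
    print(f"delay={d}  final |w| = {abs(sgd_with_delay(d)):.4g}")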
This research investigates a new class of neural network architecture composed of many sequentially connected sub-components, each training asynchronously and distilling knowledge from its child.
The resulting model is, by construction, a distilled version of itself and is thus immediately usable in a production environment at reduced computational cost. Finally, the approach is well suited to an internet-wide setting, making p2p training a potential avenue for future research.
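To make the training scheme concrete, here is a minimal sketch of one link in the chain. PyTorch is used purely for illustration, and the names Component and distillation_step are hypothetical rather than taken from this repository: a parent component queries its child once per step and regresses a small "synthetic" head onto the child's answer, so the parent remains usable on its own once the child is dropped.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Component(nn.Module):
    """One link in the chain: local features plus a synthetic head that learns
    to imitate whatever the (deeper) child component would have returned."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.synthetic = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h = self.features(x)
        return h, self.synthetic(h)

def distillation_step(parent, child, optimizer, x):
    """One asynchronous parent update: query the child once, then regress the
    parent's synthetic head onto the child's output (a simple MSE distillation)."""
    h, synthetic_out = parent(x)
    with torch.no_grad():                 # the child is not updated here
        _, child_out = child(h)
    loss = F.mse_loss(synthetic_out, child_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: a two-component chain; after training, the parent can serve
# predictions on its own, at the cost of only its own forward pass.
parent = Component(in_dim=32, hidden_dim=64, out_dim=10)
child = Component(in_dim=64, hidden_dim=64, out_dim=10)
optimizer = torch.optim.Adam(parent.parameters(), lr=1e-3)
print(distillation_step(parent, child, optimizer, torch.randn(8, 32)))

In the full architecture each component would also optimise its own task loss and would itself act as the child of the component above it; the sketch shows only the distillation term.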
For a deeper description, read the paper or join proj-mach on Slack.
$ virtualenv env && source env/bin/activate && pip install -r requirements.txt
$ python main.py
Paper: https://www.overleaf.com/read/fvyqcmybsgfj
Code: https://www.github.com/unconst/Mach
Use YAPF for code formatting:
$ pip install yapf
$ yapf --style google -r -vv -i .
References
[1] Decoupled Neural Interfaces using Synthetic Gradients. https://arxiv.org/pdf/1608.05343.pdf
[2] Decoupled Parallel Backpropagation with Convergence Guarantee. https://arxiv.org/pdf/1804.10574.pdf
[3] Outrageously Large Neural Networks: Sparsely Gated Mixtures of Experts. https://arxiv.org/abs/1701.06538
[4] AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks. https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/1705.09786.pdf
[5] An analysis of delayed gradients problem in asynchronous SGD. https://pdfs.semanticscholar.org/716b/a3d174006c19220c985acf132ffdfc6fc37b.pdf
[6] Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. https://arxiv.org/abs/1902.03393