Code for the paper 'Masked Mixers for Language Generation and Retrieval'

maskedmixers

Code for the paper 'Masked Mixers for Language Generation and Retrieval', which you can read here. Datasets and trained models will be added soon.

For a less formal version of this work, written as a technical blog post, see this page.

TL;DR:

Motivation: Transformers represent their inputs with poor accuracy, whereas MLP-mixers adapted for causal language modeling (aka masked mixers) represent them far more accurately.

Finding: Masked mixers learn language generation approximately as efficiently as transformers, but are far superior for retrieval.
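To make the "masked mixer" idea concrete: an MLP-mixer mixes information across token positions with a learned weight matrix, and the causal adaptation masks that matrix to be lower-triangular so each position only draws on earlier positions. The sketch below is illustrative only (the dimensions and weights are made up, and the real models in this repo are trained neural networks, not this toy function):

```python
def masked_mix(x, W):
    """Causal token mixing: the output at position i is a weighted sum
    of the embeddings at positions 0..i only, i.e. a lower-triangular
    mask is applied to the token-mixing weights W."""
    seq, d = len(x), len(x[0])
    out = []
    for i in range(seq):
        row = [0.0] * d
        for j in range(i + 1):          # mask: ignore positions j > i
            w = W[i][j]
            for k in range(d):
                row[k] += w * x[j][k]
        out.append(row)
    return out

# Causality check: perturbing a later token leaves earlier outputs unchanged.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[0.5, 0.0, 0.0], [0.1, 0.2, 0.0], [0.3, 0.4, 0.5]]
y1 = masked_mix(x, W)
x[2] = [9.0, 9.0]
y2 = masked_mix(x, W)
assert y1[0] == y2[0] and y1[1] == y2[1]
```

This causal masking is what lets the mixer be trained with the standard next-token objective, analogous to the attention mask in a decoder-only transformer.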

General Use

Unless you want to replicate a specific experiment, use the src directory to train, run, and evaluate mixers and other related models.

Transformer-mixer implementation

The transfixer implementation is tightly coupled to the Hugging Face Llama implementation and may be found here as a branch of version 4.42.2 of the transformers library.

For Experimental Replication

There are two directories for replicating experiments: pc contains the code used on the single Nvidia RTX 3060 node, and server contains the code used on the 4x V100 node (compatible with DDP).
