Code for the paper 'Masked Mixers for Language Generation and Retrieval'

maskedmixers

Code for the paper 'Masked Mixers for Language Generation and Retrieval', which you can read here. Datasets and trained models will be added soon.

For a less formal version of this work, written as a technical blog post, see this page.

TL;DR:

Motivation: Transformers represent their inputs with poor accuracy, whereas MLP-mixers adapted for causal language modeling (aka masked mixers) represent them far more accurately.

Finding: Masked mixers learn language generation approximately as efficiently as transformers, but are far superior for retrieval.
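To make the "masked mixer" idea concrete: an MLP-mixer mixes information across token positions with a learned weight matrix, and the causal adaptation masks that matrix to be lower-triangular so each position only draws on earlier positions. The sketch below is illustrative only (the dimensions and weights are made up, and the real models in this repo are trained neural networks, not this toy function):

```python
def masked_mix(x, W):
    """Causal token mixing: the output at position i is a weighted sum
    of the embeddings at positions 0..i only, i.e. a lower-triangular
    mask is applied to the token-mixing weights W."""
    seq, d = len(x), len(x[0])
    out = []
    for i in range(seq):
        row = [0.0] * d
        for j in range(i + 1):          # mask: ignore positions j > i
            w = W[i][j]
            for k in range(d):
                row[k] += w * x[j][k]
        out.append(row)
    return out

# Causality check: perturbing a later token leaves earlier outputs unchanged.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[0.5, 0.0, 0.0], [0.1, 0.2, 0.0], [0.3, 0.4, 0.5]]
y1 = masked_mix(x, W)
x[2] = [9.0, 9.0]
y2 = masked_mix(x, W)
assert y1[0] == y2[0] and y1[1] == y2[1]
```

This causal masking is what lets the mixer be trained with the standard next-token objective, analogous to the attention mask in a decoder-only transformer.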

General Use

Unless you want to replicate a specific experiment, use the src directory to train, run, and evaluate mixers and other related models.

Transformer-mixer implementation

The transfixer implementation is tightly coupled to the Hugging Face Llama implementation and may be found here as a branch of version 4.42.2 of the transformers library.

For Experimental Replication

There are two directories for replicating experiments: pc contains the code used on the single Nvidia RTX 3060 node, and server contains the code used on the 4x V100 node (compatible with DDP).
