GPT-Echo is an open source research project that uses pretrained GPT models to generate embeddings, which are then fed into an echo state network (ESN) for memory.
The ESN acts as a contextualizer, preserving semantic information from the GPT embeddings to aid downstream tasks. (Thanks, GPT-4.)
The only trainable layer is the readout layer, which makes training costs potentially comparable to a fine-tune.
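To make the architecture concrete, here is a minimal sketch of the idea. The class and variable names are illustrative, not the repo's actual API: a frozen GPT produces token embeddings, a fixed random reservoir mixes them over time, and only a linear readout is trained.

```python
# Minimal sketch of the GPT-Echo idea (illustrative names, not the repo's API).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class EchoReadout(nn.Module):
    def __init__(self, embed_dim, reservoir_size, vocab_size,
                 spectral_radius=0.9, leak_rate=0.5):
        super().__init__()
        # Fixed (untrained) input and recurrent weights.
        w_in = torch.randn(reservoir_size, embed_dim) * 0.1
        w_res = torch.randn(reservoir_size, reservoir_size)
        # Rescale so the spectral radius is < 1 (the echo state property).
        w_res *= spectral_radius / torch.linalg.eigvals(w_res).abs().max()
        self.register_buffer("w_in", w_in)
        self.register_buffer("w_res", w_res)
        self.leak = leak_rate
        # The only trainable parameters: a linear readout over reservoir states.
        self.readout = nn.Linear(reservoir_size, vocab_size)

    def forward(self, embeds):  # embeds: (T, embed_dim)
        state = torch.zeros(self.w_res.shape[0])
        states = []
        for x in embeds:
            pre = self.w_in @ x + self.w_res @ state
            state = (1 - self.leak) * state + self.leak * torch.tanh(pre)
            states.append(state)
        return self.readout(torch.stack(states))  # (T, vocab_size) logits

tok = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-111M")
gpt = AutoModel.from_pretrained("cerebras/Cerebras-GPT-111M").eval()

with torch.no_grad():  # the GPT stays frozen throughout
    ids = tok("hello world", return_tensors="pt")
    embeds = gpt(**ids).last_hidden_state[0]  # (T, embed_dim)

esn = EchoReadout(embeds.shape[-1], reservoir_size=1024, vocab_size=tok.vocab_size)
logits = esn(embeds)  # per-step next-token logits
```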
Currently only text generation is supported.
[ Add screenshot ]
The chatbot interface.
- Download this repo.
- pip3 install -r requirements.txt
- Grab the pretrained model. (Not ready yet)
- Run:
python3 app.py -m '[foundation model]' -n '[pretrained_model]' -r [reservoir_size] -z [context_length]
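For example, with the small Cerebras foundation model used in the training example below (a hypothetical invocation, assuming you trained a network named test with a matching reservoir size and context length):

python3 app.py -m 'cerebras/Cerebras-GPT-111M' -n test -r 1024 -z 128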
python3 train-echo.py -m 'cerebras/Cerebras-GPT-111M' -n test -e 10 -r 1024 -z 128 -t emoji_train.txt -v emoji_test.txt -lr 0.006
This trains the echo network for 10 epochs with cross-entropy loss, a reservoir size of 1024, and a context length of 128, then saves emoji.pth.
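Conceptually, the training loop amounts to something like the following hedged sketch, reusing the tok, gpt, and esn objects from the sketch above. It assumes one training example per line of the dataset file, which may not match the repo's actual data loading.

```python
# Sketch of readout training with cross-entropy; only esn.readout gets gradients.
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(esn.readout.parameters(), lr=0.006)  # -lr 0.006

for epoch in range(10):                                  # -e 10
    for text in open("emoji_train.txt"):                 # -t emoji_train.txt
        ids = tok(text, return_tensors="pt",
                  truncation=True, max_length=128)       # -z 128
        if ids["input_ids"].shape[1] < 2:
            continue                                     # need at least one target
        with torch.no_grad():
            embeds = gpt(**ids).last_hidden_state[0]     # frozen GPT embeddings
        logits = esn(embeds[:-1])                        # predict each next token
        targets = ids["input_ids"][0][1:]
        loss = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(esn.state_dict(), "emoji.pth")
```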
This approach has not been scaled; on toy tasks, larger foundation models do better.
If you do scale it, make sure to do a grid search: echo state networks are sensitive to their hyperparameters, and a lot of options have to be just right.
Edit search.py to set your options, then run it with the same arguments you'd use to train.
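A grid search over the usual ESN-sensitive options might look like the following sketch (the option names and the train_and_evaluate helper are hypothetical; search.py's actual options may differ):

```python
# Hypothetical grid search sketch; wraps a full train/eval run per configuration.
import itertools

grid = {
    "reservoir_size": [512, 1024, 2048],
    "spectral_radius": [0.8, 0.9, 0.99],
    "leak_rate": [0.1, 0.3, 0.5],
    "lr": [0.001, 0.006, 0.01],
}

best_loss, best_config = float("inf"), None
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    val_loss = train_and_evaluate(config)  # hypothetical helper around train-echo.py
    if val_loss < best_loss:
        best_loss, best_config = val_loss, config

print("best config:", best_config, "validation loss:", best_loss)
```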
These flags are experimental. They work, but they do not guarantee better results and can slow down training.
- --usecot - trains two separate ESN networks, a mediator and a generator. The mediator can then potentially be used to direct generator sampling (needs more research).
- --forwardforward - uses Hinton's forward-forward algorithm to train the readout layer (see the sketch after this list). For negative samples it supports random uniform noise, a custom negative dataset, or sampling from the base model.
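The following hedged sketch shows the forward-forward flavor of readout training with random uniform negatives, one of the three negative-sample options above. The loss shape follows Hinton's paper; the repo's actual implementation may differ.

```python
# Forward-forward sketch: train the readout locally so its "goodness"
# (mean squared activation) is high on real reservoir states, low on negatives.
import torch
import torch.nn.functional as F

readout = torch.nn.Linear(1024, 50257)      # reservoir_size -> vocab size (GPT-2 BPE)
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)
theta = 2.0                                 # goodness threshold

def goodness(states):
    return readout(states).pow(2).mean(dim=-1)  # per-step goodness

def ff_step(pos_states):                    # pos_states: (T, reservoir_size)
    neg_states = torch.rand_like(pos_states)    # random uniform negatives
    g_pos, g_neg = goodness(pos_states), goodness(neg_states)
    # Push positive goodness above theta and negative goodness below it.
    loss = F.softplus(theta - g_pos).mean() + F.softplus(g_neg - theta).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Real positive states would come from the frozen GPT + reservoir over text;
# dummy states are used here for illustration.
print(ff_step(torch.randn(128, 1024)))
```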
| Dataset | Foundation Model | Download | Reservoir Size | Context Length | Epochs | Accuracy |
|---|---|---|---|---|---|---|
| https://huggingface.co/datasets/OpenAssistant/oasst1 | cerebras/Cerebras-GPT-111M | ... | 768 | 128 | 5 | ? |
| https://huggingface.co/datasets/OpenAssistant/oasst1 | OpenAssistant/stablelm-7b-sft-v7-epoch-3 | ... | 1024 | 128 | 0.1 | ? |
This is an experimental and novel technique. There may be bugs. It may not perform as well as other methods. Further evaluation is required.
MIT