RWKV4neo #20737
Comments
This is super cool!! 🔥
Ok, I am just going to set this up on my Linux machine since my M1 setup isn't ready. I spent 2 hours on this; going to try again tomorrow, sorry D:
Scratch that, it also didn't work, mainly because of my limited hard disk space. Going to retry on the Mac...
It fails at building wheels for transformers and onnx. I am on Python 3.9.11 when running this.
Hmmm, I accepted the Xcode license agreement.
Alright, dev environment installed. The key here was not to use conda but miniforge.
Ok, draft created.
Do you know how to convert a .pth model to config.json/pytorch_model.bin for RWKV4neo?
I have a conversion script + draft that produces logit ordering consistent with the official implementation. Conversion script: https://github.com/tensorpro/transformers/blob/rwkv_draft/src/transformers/models/rwkv4_neo/convert_rwkv_original_pytorch_checkpoint_to_pytorch.py. I can clean it up and turn it into a PR if that would help?
Sure, there is also #22797, which should be in a pretty good state! I'm about to review it, but feel free to add your touch to it if you feel like it!
Oh cool, that one looks awesome!
@tensorpro I could really use your scripts, but I get a 404 when I try to access those links. :/
Ah, sorry, the links broke when I changed the branch I was working in. I edited the comment to point to the right branch. That said, you may want to use the code in #22797, since it will be closer to the official HF version and already supports CUDA-accelerated WKV.
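For anyone landing here later, a minimal sketch of what such a conversion does, assuming the BlinkDL RWKV-LM checkpoint layout (`emb.weight`, `blocks.N. ...`); the filename and config fields below are illustrative placeholders, not the final HF schema:

```python
# Hypothetical sketch: turn a raw RWKV .pth state dict into the
# config.json / pytorch_model.bin pair that from_pretrained() expects.
import json
import torch

state_dict = torch.load("RWKV-4-Pile-169M.pth", map_location="cpu")  # illustrative filename

# Infer the model geometry from tensor shapes and block indices.
vocab_size, n_embd = state_dict["emb.weight"].shape
n_layer = 1 + max(int(key.split(".")[1]) for key in state_dict if key.startswith("blocks."))

torch.save(state_dict, "pytorch_model.bin")
with open("config.json", "w") as f:
    # Field names here are placeholders; a real script also remaps
    # parameter names to whatever the HF module expects.
    json.dump({"model_type": "rwkv", "vocab_size": vocab_size,
               "hidden_size": n_embd, "num_hidden_layers": n_layer}, f, indent=2)
```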
Model description
RWKV - Receptance Weighted Key Value
RWKV is a sequence-to-sequence model that takes the best features of Generative Pre-Training (GPT) and Recurrent Neural Networks (RNNs) and performs language modelling (LM). It is used to generate text in an autoregressive (AR) manner.
This is a hybrid model.
It has Transformer-level performance without the quadratic attention mechanism. It borrows ideas from Attention Free Transformers, meaning the attention is linear in complexity, allowing for effectively infinite context through the hidden state in RWKV_RNN.
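As a concrete illustration of the linear-complexity claim, here is a minimal, numerically naive sketch of the RWKV-v4 WKV recurrence (the operator that replaces attention). The function name is ours; `w`, `u`, `k`, `v` follow the paper's notation, and the running-maximum trick that real implementations add for numerical stability is omitted for clarity:

```python
import torch

def wkv_recurrent(w, u, k, v):
    """Per-channel WKV recurrence: O(T) time, O(1) state per channel.

    w, u: (C,) learned per-channel decay and current-token bonus
    k, v: (T, C) key and value sequences
    """
    T, C = k.shape
    a = torch.zeros(C)   # decayed running sum of exp(k_i) * v_i
    b = torch.zeros(C)   # decayed running sum of exp(k_i)
    out = torch.empty(T, C)
    for t in range(T):
        e_now = torch.exp(u + k[t])              # the current token gets the bonus u
        out[t] = (a + e_now * v[t]) / (b + e_now)
        a = torch.exp(-w) * a + torch.exp(k[t]) * v[t]
        b = torch.exp(-w) * b + torch.exp(k[t])
    return out
```

The pair `(a, b)` is the hidden state: it summarizes the entire past in constant memory, which is what allows unbounded context.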
There are two formulations of RWKV, referred to as modes:
RWKV_RNN: This mode is designed for running inference quickly.
RWKV_GPT: This mode is for training or fine-tuning your model quickly.
In the first pass we will be implementing RWKV_RNN, although we can share weights so that RWKV_GPT generates the initial context for RWKV_RNN (see the sketch below).
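To make the two-mode claim concrete: the same operator can also be evaluated in a parallel, quadratic "GPT-style" form, and it should agree token-for-token with `wkv_recurrent` from the sketch above. `wkv_parallel` and the sanity check below are illustrative, not the official implementation:

```python
def wkv_parallel(w, u, k, v):
    # Weight on past token i at step t is exp(-(t-1-i)*w + k_i);
    # the current token t gets exp(u + k_t) instead.
    T, C = k.shape
    out = torch.empty(T, C)
    for t in range(T):
        steps = torch.arange(t - 1, -1, -1).unsqueeze(1)  # (t, 1): the lags t-1-i
        wts = torch.cat([torch.exp(-steps * w + k[:t]), torch.exp(u + k[t]).unsqueeze(0)])
        vals = torch.cat([v[:t], v[t].unsqueeze(0)])
        out[t] = (wts * vals).sum(dim=0) / wts.sum(dim=0)
    return out

torch.manual_seed(0)
T, C = 8, 4
w, u = torch.rand(C), torch.rand(C)
k, v = torch.randn(T, C), torch.randn(T, C)
assert torch.allclose(wkv_recurrent(w, u, k, v), wkv_parallel(w, u, k, v), atol=1e-4)
```

This is the sense in which the two modes share weights: RWKV_GPT processes the whole sequence in parallel (good for training and for building the initial state from a prompt), while RWKV_RNN steps through it one token at a time.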
Open source status
Provide useful links for the implementation
More from the Research and Development Repository: https://github.com/BlinkDL/RWKV-LM