mamba2 training speed is very very very slow #389
Comments
For my task (image classification), Mamba-1 takes 40 minutes to run one epoch on an RTX 6000 Ada, while Mamba-2 takes only 20. Mamba-2 also uses less RAM!
See #355. Although I've also encountered issues similar to those described in that issue later on (graph compilation errors). I'd also suggest using torch==2.2.0 with triton 2.2.0 (no idea why, but it ran faster than 2.3.0 in my case).
I also encountered this problem. Running the demo takes around 30 seconds.
Environment information: GPU: NVIDIA A6000.
I tried adding a decorator in ssd_combined.py as suggested by @Kiet0712 in this comment, but it resulted in a bug similar to what @arelkeselbri described in this comment. Is this inference speed for the demo normal, or is there something wrong with my code? I would appreciate any help or suggestions!
Try warming up by running it once first. The first call will invoke the Triton compiler and autotuner, so it'll be slow.
Thank you so much! The second inference takes only 0.005 sec.
I used the same code for testing; I tried running it multiple times and saw no improvement in speed.
The compilation happens every time you launch `python demo.py`. Try running the forward pass twice in the same script.
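The warm-up pattern described above can be sketched as follows. This is a minimal illustration, not the original poster's code: `TritonLikeKernel` is a hypothetical stand-in that simulates a Triton kernel's one-time compile/autotune cost, since `mamba_ssm` itself is not imported here.

```python
import time


class TritonLikeKernel:
    """Hypothetical stand-in for a Triton-compiled forward pass:
    the first call pays a one-time compile/autotune cost."""

    def __init__(self):
        self.compiled = False

    def forward(self, x):
        if not self.compiled:
            time.sleep(0.2)  # simulate JIT compilation + autotuning
            self.compiled = True
        return x * 2


kernel = TritonLikeKernel()

t0 = time.perf_counter()
kernel.forward(1)  # warm-up call: triggers compilation
warmup = time.perf_counter() - t0

t1 = time.perf_counter()
kernel.forward(1)  # steady-state call: reuses the compiled kernel
steady = time.perf_counter() - t1

print(f"warm-up: {warmup:.3f}s, steady-state: {steady:.6f}s")
```

The point is to benchmark only the steady-state call; with real CUDA kernels you would also synchronize the device before reading the timer.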
@Gaodzlearn The problem is solved, thank you very much for your reply.
Why is it that Mamba runs with no problem, but when I run Mamba-2 I get "'NoneType' object has no attribute 'causal_conv1d_fwd'"? I also installed causal-conv1d==1.2.1.
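One plausible cause of that error, sketched below as an assumption rather than a confirmed diagnosis: if the optional causal-conv1d extension fails to import, the fallback reference can be left as `None`, which later surfaces as the `'NoneType' object has no attribute 'causal_conv1d_fwd'` message. A quick importability check:

```python
# Hedged diagnostic sketch: verify that the causal-conv1d package
# actually imports in the environment used for training. A failed
# import (e.g. a torch/CUDA version mismatch at build time) would
# leave the fast path unset.
try:
    import causal_conv1d  # noqa: F401
    have_causal_conv1d = True
except ImportError:
    have_causal_conv1d = False

print("causal-conv1d importable:", have_causal_conv1d)
```

If this prints `False` in the same environment where training runs, reinstalling causal-conv1d against the matching torch/CUDA version is the first thing to try.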
I changed Mamba to Mamba-2, and training speed became very, very slow. Why?