Add falcon-180b support #4300

Draft: wants to merge 2 commits into base ds-inference/add-falcon-support

RezaYazdaniAminabadi (Contributor)

Loads Falcon-180B in about 20 seconds using TP-sharded checkpoints. The generated text quality looks fine, but it still needs more accuracy testing.

Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:20<00:00,  2.58s/it]
checkpoint loading time at rank 4: 19.99671220779419 sec                                                                                             | 0/1 [00:00<?, ?it/s]
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.89it/s]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:19<00:00,  2.50s/it]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:20<00:00,  2.42s/it]
checkpoint loading time at rank 0: 20.810762882232666 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.71it/s]
checkpoint loading time at rank 7: 20.186130046844482 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.08it/s]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:21<00:00,  2.70s/it]
checkpoint loading time at rank 6: 21.765174627304077 sec                                                                                            | 0/1 [00:00<?, ?it/s]
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.77it/s]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:24<00:00,  3.02s/it]
checkpoint loading time at rank 5: 24.360199213027954 sec                                                                                            | 0/1 [00:00<?, ?it/s]
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  4.81it/s]
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
------------------------------------------------------
Free memory : 33.154175 (GigaBytes)  
Total memory: 79.169678 (GigaBytes)  
Requested memory: 0.906250 (GigaBytes) 
Setting maximum total tokens (input + output) to 1024 
WorkSpace: 0x7fbece000000 
------------------------------------------------------
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Context: Deep learning involves the use of neural networks
Generated: Result: Deep learning involves the use of neural networks to train large volumes of data and is widely used for autonomous driving, robotic control, image recognition, and natural language processing.                                                        
A fundamental challenge in deep learning is ensuring the accuracy of the models developed by neural networks. This is important in any machine learning task but becomes especially key in applications that can have broad societal impact. One emerging technique is adversarial learning, which involves training deep learning algorithms to predict results that "fool" a model under

Performance: generating 100 tokens for 1 batch in 3.95 sec (39.46ms per token, 1.14 TB/s of memory BW utilization).
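
As a rough sanity check on the performance line above, the per-token latency and bandwidth figures can be reproduced with simple arithmetic. This is only a sketch under assumptions the log does not state: roughly 180e9 parameters, 2 bytes per parameter (fp16/bf16), weights split evenly across 8 tensor-parallel ranks, and each rank reading its full weight shard once per generated token. The function names are hypothetical and are not part of DeepSpeed.

```python
# Back-of-the-envelope check of the reported performance numbers.
# ASSUMPTIONS (not stated in the log): ~180e9 parameters, 2 bytes each,
# 8-way tensor parallelism, and one full read of the local weight shard
# per generated token. KV-cache and activation traffic are ignored.

def per_token_latency_ms(total_seconds: float, num_tokens: int) -> float:
    """Average latency per generated token, in milliseconds."""
    return total_seconds / num_tokens * 1e3


def weight_bandwidth_tb_per_s(num_params: float, bytes_per_param: int,
                              tp_size: int, latency_ms: float) -> float:
    """Per-GPU bandwidth needed to stream the local weight shard once per token."""
    bytes_per_rank = num_params * bytes_per_param / tp_size
    return bytes_per_rank / (latency_ms / 1e3) / 1e12


# 3.95 s is the rounded total from the log, so this lands near the logged 39.46 ms.
latency = per_token_latency_ms(3.95, 100)

# Use the logged per-token latency for the bandwidth estimate.
bandwidth = weight_bandwidth_tb_per_s(180e9, 2, 8, 39.46)

print(f"{latency:.2f} ms/token, {bandwidth:.2f} TB/s per GPU")
```

Under these assumptions the estimate comes out at about 1.14 TB/s, which suggests the logged bandwidth is a per-GPU figure derived from weight reads alone. The actual formula used by the benchmark script may differ.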

Needs some cleanup before merging!
