Add falcon-180b support #4300

RezaYazdaniAminabadi · 2023-09-11T08:34:23Z

Loads Falcon-180B in about 20 seconds using TP-sharded checkpoints (the ranks logged below report roughly 20 to 24 seconds each). The generated text looks reasonable, but it still needs more thorough accuracy testing.

Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:20<00:00,  2.58s/it]
checkpoint loading time at rank 4: 19.99671220779419 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.89it/s]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:19<00:00,  2.50s/it]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:20<00:00,  2.42s/it]
checkpoint loading time at rank 0: 20.810762882232666 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.71it/s]
checkpoint loading time at rank 7: 20.186130046844482 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.08it/s]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:21<00:00,  2.70s/it]
checkpoint loading time at rank 6: 21.765174627304077 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.77it/s]
Loading 8 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:24<00:00,  3.02s/it]
checkpoint loading time at rank 5: 24.360199213027954 sec
Loading 1 checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  4.81it/s]
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
------------------------------------------------------
Free memory : 33.154175 (GigaBytes)  
Total memory: 79.169678 (GigaBytes)  
Requested memory: 0.906250 (GigaBytes) 
Setting maximum total tokens (input + output) to 1024 
WorkSpace: 0x7fbece000000 
------------------------------------------------------
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Context: Deep learning involves the use of neural networks
Generated: Result: Deep learning involves the use of neural networks to train large volumes of data and is widely used for autonomous driving, robotic control, image recognition, and natural language processing.                                                        
A fundamental challenge in deep learning is ensuring the accuracy of the models developed by neural networks. This is important in any machine learning task but becomes especially key in applications that can have broad societal impact. One emerging technique is adversarial learning, which involves training deep learning algorithms to predict results that "fool" a model under

Performance: generating 100 tokens for 1 batch in 3.95 sec (39.46ms per token, 1.14 TB/s of memory BW utilization).
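
As a sanity check on the reported bandwidth, here is a minimal sketch of how a figure like 1.14 TB/s can be reproduced from the per-token latency. It assumes, and these are assumptions rather than anything stated in the log, that the model has 180e9 parameters stored in fp16 (2 bytes each), that it is sharded evenly across 8 tensor-parallel ranks, and that each generated token reads every weight on a rank exactly once. The function name is hypothetical and is not part of DeepSpeed.

```python
def per_gpu_bandwidth_tbps(num_params, bytes_per_param, tp_degree, sec_per_token):
    """Estimate per-GPU memory bandwidth (TB/s) for weight-bound decoding.

    Assumes every weight shard is read once per generated token and that
    the weights are split evenly across tp_degree GPUs.
    """
    bytes_per_gpu = num_params * bytes_per_param / tp_degree
    return bytes_per_gpu / sec_per_token / 1e12


# Assumed values: 180e9 parameters, fp16 weights, 8-way tensor parallelism.
# The 39.46 ms per-token latency is taken from the Performance line above.
bw = per_gpu_bandwidth_tbps(180e9, 2, 8, 39.46e-3)
print(f"{bw:.2f} TB/s per GPU")  # prints "1.14 TB/s per GPU"
```

Under these assumptions the estimate agrees with the logged value, which suggests the reported number is a per-GPU figure derived from weight traffic only; it does not account for KV-cache or activation reads.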

Needs some cleanup before merging!

Reza Yazdani added 2 commits September 11, 2023 01:13

add falcon-180b support

2df486c

clean-up

8205b55
