Adding the new feature of FPDT #6462
base: master
Conversation
deepspeed/sequence/fpdt_layer.py
Outdated
    return out, lse


def single_all_to_all(input_, scatter_idx, gather_idx, group):
Hi @YJHMITWEB, is this single_all_to_all the same as the one in DeepSpeed/deepspeed/sequence/layer.py (Line 41 in 89c4d9f)?

def single_all_to_all(input, scatter_idx, gather_idx, batch_dim_idx, group, async_op=False, handle=None, type=None):
We have not tested the non-blocking all-to-all with our FPDT design, so we use the original blocking version. If the non-blocking one is preferred, we can test it.
This is fixed. We will now use the single_all_to_all from layer.py.
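For reference, a minimal sketch of what reusing the shared helper could look like. The import and the signature come from deepspeed/sequence/layer.py as quoted above; the wrapper name and the index values are illustrative placeholders, not the actual FPDT call site.

# Illustrative sketch: reuse the shared helper from layer.py instead of a local copy.
# The scatter/gather/batch indices below are placeholders, not the exact values
# used in fpdt_layer.py.
from deepspeed.sequence.layer import single_all_to_all

def redistribute_chunk(chunk, sp_group):
    # Scatter over the head dimension and gather over the sequence dimension
    # within the sequence-parallel process group.
    return single_all_to_all(chunk,
                             scatter_idx=2,
                             gather_idx=1,
                             batch_dim_idx=0,
                             group=sp_group,
                             async_op=False)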
deepspeed/sequence/fpdt_layer.py
Outdated
compute_stream.wait_stream(offload_stream)
compute_stream.synchronize()
with torch.cuda.stream(offload_stream):
torch.cuda.stream(stream) should be replaced by get_accelerator().stream(stream); the same applies to the other occurrences in this file.
Got it!
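As a minimal sketch of the suggested replacement (the stream names follow the hunk above, and the work issued under the stream context is elided):

# Accelerator-agnostic version of the hunk above; get_accelerator() dispatches
# to the active backend instead of hard-coding torch.cuda.
from deepspeed.accelerator import get_accelerator

compute_stream = get_accelerator().Stream()
offload_stream = get_accelerator().Stream()

compute_stream.wait_stream(offload_stream)
compute_stream.synchronize()
with get_accelerator().stream(offload_stream):
    # offload/prefetch copies previously issued under torch.cuda.stream(...)
    pass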
deepspeed/sequence/fpdt_layer.py
Outdated
grad_qkv_linear_bias = torch.zeros(qkv_linear_bias.shape, device=qkv_linear_weight.device, dtype=torch.float)

grad_global_attn_output_chunk = single_all_to_all(grad_output[:, :chunk_size].contiguous(), scatter_idx, gather_idx, spg)
torch.cuda.synchronize()
Here it should be get_accelerator().synchronize().
Got it, thanks.
This one is solved.
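A minimal sketch of that replacement, with the surrounding backward-pass code elided:

# Device-agnostic barrier: waits for all queued work on the current accelerator,
# replacing the CUDA-specific torch.cuda.synchronize() in the hunk above.
from deepspeed.accelerator import get_accelerator

# ... all-to-all of the gradient chunk would be issued here ...
get_accelerator().synchronize()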
deepspeed/sequence/layer.py
Outdated
@@ -168,7 +196,8 @@ def __init__(
        self.gather_idx = gather_idx
        self.sp_overlap_comm = False
        self.overlap_handles = None
-       self.sp_stream = sp_stream
+       self.sp_stream = None
+       # self.sp_stream = sp_stream
Why comment out this line?
This one is solved.
fix format and add unit test for fpdt
    - '.github/workflows/nv-flash-attn.yml'
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
Should we add a cron trigger to this as well? And perhaps the Nightly CI failure issue creation from here, so that we still know if it fails even when no PR triggers it?
Thank you @loadams for the suggestion! I added both. Can you check them?
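For context, a schedule trigger of the kind suggested might look like the sketch below; the cron expression and path filters are illustrative placeholders, not necessarily what the PR adds to nv-flash-attn.yml.

# Illustrative trigger block for a workflow such as nv-flash-attn.yml; the paths
# and schedule shown are placeholders, not the actual values in the PR.
on:
  pull_request:
    paths:
      - 'deepspeed/sequence/**'
      - '.github/workflows/nv-flash-attn.yml'
  schedule:
    - cron: "0 0 * * *"   # nightly run, so failures surface without a PR
  workflow_dispatch:

Pairing the nightly schedule with an issue-creation step on failure means a break is reported even when no PR touches the guarded paths.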
FPDT can only be used with this version of Megatron-DeepSpeed.