Adding the new feature of FPDT #6462

Status: Open. Wants to merge 70 commits into base: master.

Commits (70 in total; the file changes shown below are from 53 of them):
c076827  fix the bug of deepspeed sequence parallel working with batch size la… (Aug 3, 2024)
1b8a8c1  Merge branch 'master' into master (samadejacobs, Aug 6, 2024)
ed34e89  apply yapf formatting (Aug 7, 2024)
89b119e  Formatting fixes (loadams, Aug 7, 2024)
7db5798  Merge branch 'microsoft:master' into master (YJHMITWEB, Aug 28, 2024)
0beff24  add FPDT (Aug 29, 2024)
4522ed7  Merge branch 'master' into master (YJHMITWEB, Aug 29, 2024)
c15d1d8  Merge branch 'master' into master (tjruwase, Sep 6, 2024)
69f3892  modify streams (Sep 24, 2024)
8ef9f5a  modify streams (Sep 24, 2024)
b43c5ec  Merge branch 'master' into master (loadams, Sep 27, 2024)
a55d1f5  remove duplication of alltoall (Oct 7, 2024)
1cbd59d  Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed (Oct 7, 2024)
6bfd76f  remove duplication of pos (Oct 7, 2024)
4eeadca  fix format (Oct 7, 2024)
8994991  Merge branch 'master' into master (tohtana, Oct 10, 2024)
128286c  fix format and add unit test for fpdt (Oct 10, 2024)
386f606  Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed (Oct 10, 2024)
ebea5b0  add einops (Oct 10, 2024)
5c8eec8  add flashattn (Oct 11, 2024)
a7e175a  Merge branch 'master' into master (tohtana, Oct 11, 2024)
764a572  add requirements for flash-attn in FPDT (Oct 14, 2024)
14e91b8  Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed (Oct 14, 2024)
77dcd38  Merge branch 'master' into master (tohtana, Oct 17, 2024)
7a5c29c  Merge branch 'master' into master (tohtana, Oct 22, 2024)
534cb93  skip test when fa is unavailable (tohtana, Nov 5, 2024)
972ddda  formatting (tohtana, Nov 5, 2024)
37bc694  add workflow to run a6000 tests (tohtana, Nov 5, 2024)
8f8aaa0  Merge branch 'master' into FPDT (tohtana, Nov 5, 2024)
ac7baf6  revert world sizes for tests (tohtana, Nov 5, 2024)
0d2b624  Merge pull request #1 from YJHMITWEB/tohtana/merge_FPDT (tohtana, Nov 5, 2024)
8935529  update workflow (tohtana, Nov 5, 2024)
edd2e05  update image version (tohtana, Nov 5, 2024)
464d117  remove --no-build-isolation (tohtana, Nov 5, 2024)
7389f66  remove requirements file for flash-attn (tohtana, Nov 5, 2024)
5f859be  remove flash-attn requirements from setup.py (tohtana, Nov 5, 2024)
56cb647  fix pip command (tohtana, Nov 5, 2024)
164f459  modify unit test for fpdt (Nov 5, 2024)
3eb816d  modify unit test for fpdt (Nov 5, 2024)
2ae68dc  modify unit test for fpdt (Nov 5, 2024)
b1b2688  modify unit test for fpdt (Nov 5, 2024)
67aa3df  modify unit test for fpdt (Nov 5, 2024)
42461d2  modify unit test for fpdt (Nov 5, 2024)
d637d60  modify unit test for fpdt (Nov 5, 2024)
907c79d  modify unit test for fpdt (Nov 5, 2024)
8f5d039  modify unit test for fpdt (Nov 5, 2024)
02c2fbf  modify unit test for fpdt (Nov 5, 2024)
f570213  modify unit test for fpdt (Nov 5, 2024)
5b8c419  add condition for using fpdt offloading (Nov 5, 2024)
bd090c8  add condition for using fpdt offloading (Nov 5, 2024)
e48e85b  add flash-attn version check (Nov 5, 2024)
af24777  Merge branch 'master' into master (tohtana, Nov 6, 2024)
ebaf56c  add unit test directory as test trigger (tohtana, Nov 6, 2024)
9e811b8  add cron for test and reporting for nightly CI failures (tohtana, Nov 6, 2024)
a7522da  add multiGPU fpdt unit test (Nov 7, 2024)
209adab  add multiGPU fpdt unit test (Nov 7, 2024)
dbeea8a  add multiGPU fpdt unit test (Nov 7, 2024)
845e42d  add multiGPU fpdt unit test (Nov 7, 2024)
8b2549c  add multiGPU fpdt unit test (Nov 7, 2024)
058c973  add multiGPU fpdt unit test (Nov 7, 2024)
0dcc234  add multiGPU fpdt unit test (Nov 7, 2024)
d1be5d3  add multiGPU fpdt unit test (Nov 7, 2024)
3a0feba  add multiGPU fpdt unit test (Nov 7, 2024)
8c57812  add multiGPU fpdt unit test (Nov 7, 2024)
43decf6  add multiGPU fpdt unit test (Nov 8, 2024)
d39585c  add multiGPU fpdt unit test (Nov 8, 2024)
389b1a3  add multiGPU fpdt unit test (Nov 8, 2024)
958f3bf  add multiGPU fpdt unit test (Nov 8, 2024)
af025c5  add multiGPU fpdt unit test (Nov 8, 2024)
2230377  Merge branch 'master' into master (tohtana, Nov 8, 2024)
File changed: .github/workflows/nv-flash-attn.yml (new file, 53 additions, 0 deletions)

@@ -0,0 +1,53 @@
name: nv-flash-attn

on:
  workflow_dispatch:
  pull_request:
    paths:
      - 'deepspeed/sequence/**'
      - 'tests/unit/sequence_parallelism/**'
      - '.github/workflows/nv-flash-attn.yml'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
[Inline review conversation on the trigger/concurrency section]

loadams (Contributor): Should we add a cron trigger to this as well? And perhaps the nightly CI failure issue creation from here, so that if a PR doesn't trigger this workflow we will still know if it fails?

Reply (Contributor): Thank you @loadams for the suggestion! I added both. Can you check them?

(A hedged sketch of what such an addition could look like follows the workflow file below.)

jobs:
  unit-tests:
    runs-on: [self-hosted, nvidia, a6000]
    container:
      image: nvcr.io/nvidia/pytorch:24.03-py3
      ports:
        - 80
      options: --gpus all --shm-size "8G"

    steps:
      - uses: actions/checkout@v4

      - name: Check container state
        run: |
          ldd --version
          nvcc --version
          nvidia-smi
          python -c "import torch; print('torch:', torch.__version__, torch)"
          python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
      - name: Install transformers
        run: |
          git clone --depth=1 https://github.com/huggingface/transformers
          cd transformers
          git rev-parse --short HEAD
          python -m pip install .
      - name: Install deepspeed
        run: |
          python -m pip install .[dev]
          ds_report
      - name: Install FlashAttention
        run: |
          python -m pip install flash-attn
      - name: Python environment
        run: |
          python -m pip list
      - name: Unit tests
        run: |
          unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
          cd tests
          python -m pytest --color=yes --durations=0 --verbose -rF unit/sequence_parallelism/test_ulysses.py --torch_ver="2.3" --cuda_ver="12"
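
To make the review thread above concrete: the cron trigger and nightly-failure reporting that were discussed are not part of the 53-commit diff shown here, so the fragment below is only a rough sketch of how this workflow could gain a nightly schedule and open an issue when a scheduled run fails. The cron time, the job name, the permissions block, and the use of the gh CLI are illustrative assumptions, not the PR's actual implementation.

    # Hypothetical sketch only; not code from this PR.
    on:
      workflow_dispatch:
      pull_request:
        # (the existing paths filter above would stay unchanged)
        paths:
          - 'deepspeed/sequence/**'
          - 'tests/unit/sequence_parallelism/**'
          - '.github/workflows/nv-flash-attn.yml'
      schedule:
        - cron: "0 0 * * *"    # assumed schedule: nightly at 00:00 UTC

    jobs:
      # ... the unit-tests job stays as defined in the file above ...
      report-nightly-failure:
        # Open an issue only when a scheduled (nightly) run of the tests fails,
        # so PR runs do not generate issues.
        if: ${{ failure() && github.event_name == 'schedule' }}
        needs: unit-tests
        runs-on: ubuntu-latest
        permissions:
          issues: write
        steps:
          - name: Create issue for nightly CI failure
            env:
              GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
            run: |
              gh issue create \
                --repo "${{ github.repository }}" \
                --title "Nightly CI failure: ${{ github.workflow }}" \
                --body "Failed run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

Gating the issue-creation job on the event name keeps ordinary pull-request failures out of the issue tracker while still surfacing regressions that no PR happened to trigger.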
Second file changed:

@@ -369,7 +369,9 @@ def is_activation_to_checkpoint(item):
     Is an activation to be checkpointed
     """
     global mp_size
-    return torch.is_tensor(item) and item.is_floating_point() and item.numel() >= mp_size
+    extra_flag = (not hasattr(item, 'no_checkpointing')) or (hasattr(item, 'no_checkpointing')
+                                                             and item.no_checkpointing == False)
+    return torch.is_tensor(item) and item.is_floating_point() and item.numel() >= mp_size and extra_flag


 def partition_activations(args, cpu_checkpoint, contiguous_checkpoint):
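
The new extra_flag check above lets callers exclude specific tensors from activation checkpointing by tagging them with a no_checkpointing attribute. The snippet below is an illustrative sketch of that contract using a simplified stand-in for the helper; only the attribute name comes from the diff, while the tensor names, shapes, and the local mp_size default are assumptions.

    # Illustrative sketch (assumptions noted above): tagging a tensor with
    # `no_checkpointing = True` makes the patched check skip it.
    import torch


    def is_activation_to_checkpoint(item, mp_size=1):
        # Simplified stand-in for the patched helper in the diff above:
        # missing attribute or False means the tensor is still checkpointed.
        extra_flag = not getattr(item, 'no_checkpointing', False)
        return torch.is_tensor(item) and item.is_floating_point() and item.numel() >= mp_size and extra_flag


    hidden_states = torch.randn(2, 1024, 512)   # a regular activation: still checkpointed
    aux_tensor = torch.randn(2, 1024)           # a tensor we want to keep out of checkpointing

    aux_tensor.no_checkpointing = True          # opt this tensor out

    print(is_activation_to_checkpoint(hidden_states))  # True
    print(is_activation_to_checkpoint(aux_tensor))     # False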