Hi @Monekyzoon, this is because after we split the sequence into multiple chunks, we cannot tell the flash-attn kernel the sequence offset of each chunk. The kernel therefore cannot know the correct relative positions of the tokens, so the ALiBi bias it generates internally is wrong for each sequence chunk.
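To illustrate the problem, here is a minimal sketch (not TransformerEngine's actual code; the `alibi_bias` helper and the 8-token example are made up for illustration). ALiBi bias depends on the global positions of queries and keys, so a kernel that restarts query positions at 0 inside each chunk produces the wrong bias:

```python
import torch

def alibi_bias(q_len, k_len, slope, q_offset=0):
    # ALiBi adds -slope * (query_pos - key_pos) to the attention scores,
    # where the positions are *global* positions in the full sequence.
    q_pos = torch.arange(q_len) + q_offset
    k_pos = torch.arange(k_len)
    return -slope * (q_pos[:, None] - k_pos[None, :]).clamp(min=0)

slope = 0.25
# Bias the model should see for a full 8-token sequence.
full = alibi_bias(8, 8, slope)

# Context parallelism gives the second rank only tokens 4..7 as queries.
# The flash-attn kernel has no argument for the chunk's global offset, so it
# recomputes query positions starting from 0:
wrong = alibi_bias(4, 8, slope)              # what the kernel produces
right = alibi_bias(4, 8, slope, q_offset=4)  # what the chunk actually needs

print(torch.allclose(full[4:], right))  # True
print(torch.allclose(full[4:], wrong))  # False
```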
https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/attention.py#L3237
`assert alibi_slopes is None, "Alibi slope bias addition is not supported with context parallelism."`
Why is ALiBi not supported with context parallelism?