Encoder and Decoder have different TP_SIZE #1121

Open
heavyrain-lzy opened this issue Sep 5, 2024 · 0 comments


Your question

1. I have a question about how the pipeline-parallel (pp) groups are created when context_parallel_size > 1 and encoder_tensor_parallel_size != tensor_parallel_size.

When context parallelism is enabled, the input is split symmetrically across the CP ranks to balance the computation, so pairing encoder and decoder ranks with zip(cycle(e_ranks), d_ranks) is wrong in this case. The code comment in question:

    # Map 1 encoder tp rank to several decoder tp ranks, because
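For reference, a minimal sketch of what the zip(cycle(e_ranks), d_ranks) pairing does; the rank lists below are hypothetical and chosen only to illustrate the construct, not the actual Megatron-Core group layout:

```python
from itertools import cycle

# Hypothetical rank lists: one encoder TP rank and four decoder TP ranks,
# e.g. encoder_tensor_parallel_size=1 and tensor_parallel_size=4.
e_ranks = [0]
d_ranks = [1, 2, 3, 4]

# cycle() repeats the (shorter) encoder rank list, so every decoder TP rank
# gets paired with some encoder TP rank -- one encoder rank is mapped to
# several decoder ranks, as the quoted comment describes.
pairs = list(zip(cycle(e_ranks), d_ranks))
print(pairs)  # [(0, 1), (0, 2), (0, 3), (0, 4)]
```

The concern above is whether this one-to-many pairing still yields balanced groups once each rank's input is additionally split along the context-parallel dimension.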

2. Why do we use the stack operator to compute the sum of the received tensors?

    return torch.stack(x, dim=0).sum(dim=0, dtype=torch.float32).to(x[0].dtype)
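For context, a small self-contained sketch of that reduction; the tensor list x is synthetic, and only the return line is taken from the code in question. One plausible reason for stacking first is that the sum then accumulates in float32 in a single reduction before casting back to the input dtype:

```python
import torch

def stack_sum(x):
    # Stack the received tensors along a new leading dim and reduce them in
    # float32, then cast the result back to the original dtype. Accumulating
    # in float32 avoids precision loss when the inputs are fp16/bf16.
    return torch.stack(x, dim=0).sum(dim=0, dtype=torch.float32).to(x[0].dtype)

# Synthetic "received" tensors, e.g. one per sending rank.
x = [torch.randn(2, 3, dtype=torch.bfloat16) for _ in range(4)]
out = stack_sum(x)

# A naive alternative: a running in-place sum, which accumulates in bf16
# and can lose precision as the number of summands grows.
naive = x[0].clone()
for t in x[1:]:
    naive += t

print(out.dtype, naive.dtype)  # torch.bfloat16 torch.bfloat16
```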