
Accelerate sim_matrix computation in multi-GPU training #113

Open
wants to merge 2 commits into
base: master
Conversation

@zsnoob commented Nov 24, 2023

I changed two main things:

  1. Removed the `loss.mean()` call, which does nothing here: DDP already synchronizes gradients across processes automatically.

  2. Following the batch-sharding approach described in openai/CLIP#132 (comment) ("Batch Sharding Details"), every similarity calculation is now done locally. This uses all negative samples from the global batch but only the positive samples from the local batch, so the local sim_matrix has shape (batch_size / n_gpu, batch_size).

  • This raises one subtlety: the loss function assumes the positive (diagonal) elements lie in the first local-batch-size columns, but when the local batch is not the first shard of the global batch, the correct positive samples are actually located in the column range local_rank * local_batch_size to (local_rank + 1) * local_batch_size. So I pass the second argument to torch.diag(), which specifies the column offset of the first positive sample (see the sketch below).

In my experiments, the model converges as usual and training is more efficient.
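
For concreteness, here is a minimal sketch of the scheme described above. It is not the PR's actual diff; `gather_features`, the feature tensor names, and `logit_scale` are assumptions made for illustration.

```python
# Minimal sketch of batch-sharded similarity computation, assuming a
# standard torch.distributed (DDP) setup. Names below are illustrative.
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_features(local_feat):
    """All-gather features from every rank, keeping the local shard differentiable."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(local_feat) for _ in range(world_size)]
    dist.all_gather(gathered, local_feat)
    # all_gather returns non-differentiable copies, so splice the local
    # tensor back in to preserve its gradient path.
    gathered[dist.get_rank()] = local_feat
    return torch.cat(gathered, dim=0)

def local_contrastive_loss(text_feat, video_feat, logit_scale):
    """sim_matrix is (local_batch, global_batch); positives sit on a shifted diagonal."""
    rank = dist.get_rank()
    local_bs = text_feat.shape[0]
    all_video = gather_features(video_feat)               # (global_batch, dim)
    sim_matrix = logit_scale * text_feat @ all_video.t()  # (local_batch, global_batch)
    logpt = F.log_softmax(sim_matrix, dim=-1)
    # This rank's positives start at column rank * local_bs, so select that
    # diagonal via torch.diag's offset argument instead of the main one.
    loss = -torch.diag(logpt, rank * local_bs).mean()
    # No extra loss.mean() across ranks is needed: DDP already averages
    # gradients over all processes during backward.
    return loss
```

With n_gpu ranks, each local sim_matrix is 1/n_gpu the size of the full (batch_size, batch_size) matrix, which is where the memory and compute savings come from.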

@zsnoob closed this Nov 24, 2023
@zsnoob reopened this Nov 24, 2023
@zsnoob (Author) commented Nov 24, 2023

This is also mentioned in #101 (comment).
