For sample indexing we fix the uneven sampling #226

Merged 6 commits on Sep 24, 2024


hariharan-devarajan
Collaborator

1. Fix the uneven sampling in both index-based and iterative sampling.
2. Add a validation step that checks that global indices are correctly shuffled and that no indices are lost.
3. Do both file and sample shuffling in the reconfigure step.
4. Remove sample shuffling from the data loader Sampler code.
5. Add a test case covering uneven file distributions (#225).
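
For readers following along, items 1 and 2 can be illustrated with a minimal sketch. This is not the code from this PR: `partition_indices`, `validate_partition`, `total_samples`, `comm_size`, and `seed` are hypothetical names, and the "first ranks get one extra sample" policy is an assumption chosen only to show how an uneven split can still cover every global index exactly once.

```python
import random


def partition_indices(total_samples, comm_size, seed=0):
    """Shuffle global sample indices and split them across ranks.

    Assumption for this sketch: when total_samples is not divisible by
    comm_size, the first (total_samples % comm_size) ranks each take
    one extra sample.
    """
    indices = list(range(total_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(total_samples, comm_size)
    parts, start = [], 0
    for rank in range(comm_size):
        count = base + (1 if rank < extra else 0)
        parts.append(indices[start:start + count])
        start += count
    return parts


def validate_partition(parts, total_samples):
    """Check that every global index appears exactly once across ranks."""
    seen = sorted(i for part in parts for i in part)
    if seen != list(range(total_samples)):
        raise ValueError("global indices were lost or duplicated")
    return True
```

With 10 samples and 3 ranks, this split gives the ranks 4, 3, and 3 samples, and the validation passes because the union of the three lists is exactly 0 through 9.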
@hariharan-devarajan
Collaborator Author

@zhenghh04 I investigated #221. It was not redundant shuffling. Previously, the shuffle on reconfigure did only file shuffling, while the sampler in the data loaders did sample shuffling. I streamlined this by doing both during reconfigure. I also added a validation step on reconfigure to make sure we read everything in every epoch. This PR fixes #221.
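
As a rough illustration of doing both shuffles in one place, here is a minimal sketch. It is an assumption-laden example, not the implementation in this PR: the function name `reconfigure` is borrowed from the discussion, while its signature, the `file_shuffle` and `sample_shuffle` flags, and the per-epoch seeding are hypothetical.

```python
import random


def reconfigure(file_names, samples_per_file, epoch, seed=0,
                file_shuffle=True, sample_shuffle=True):
    """Build the global (file, sample) index for one epoch.

    Hypothetical sketch: file order is shuffled first, then the
    flattened sample list, and the result is validated before use.
    """
    rng = random.Random(seed + epoch)  # assumed: a different order per epoch
    files = list(file_names)
    if file_shuffle:
        rng.shuffle(files)
    global_index = [(f, s) for f in files for s in range(samples_per_file)]
    if sample_shuffle:
        rng.shuffle(global_index)
    # Validation: every (file, sample) pair must be present exactly once.
    expected = len(file_names) * samples_per_file
    if len(global_index) != expected or len(set(global_index)) != expected:
        raise ValueError("global index lost or duplicated samples")
    return global_index
```

In this sketch the data loader's sampler would only consume the returned list in order, so no further shuffling is needed downstream.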

@hariharan-devarajan
Collaborator Author

@zhenghh04 Can we merge this?

@zhenghh04 merged commit 3732663 into main on Sep 24, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

IndexError: list index out of range when running custom.yaml file with custom num_files_train parameter