Search local shards directly to find shard to evict #791

Closed
wants to merge 1 commit into from

coryMosaicML
Contributor

This PR changes the search for the coldest shard so that only local shards are considered as eviction candidates, which avoids looping over remote shards.
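To illustrate the idea in the description, here is a minimal sketch of picking the coldest shard from local shards only. It is not the code from this PR: the function name `find_coldest_local_shard` and the `local_last_access` mapping (shard id to last access time) are hypothetical names chosen for the example, and it assumes "coldest" means least recently accessed.

```python
from typing import Dict, Optional


def find_coldest_local_shard(local_last_access: Dict[int, float]) -> Optional[int]:
    """Return the id of the least recently accessed local shard.

    `local_last_access` is a hypothetical mapping that contains ONLY shards
    present on local disk, so remote shards are never visited.
    Returns None when there is no local shard to evict.
    """
    if not local_last_access:
        return None
    # The smallest timestamp belongs to the shard that was used longest ago.
    return min(local_last_access, key=local_last_access.__getitem__)


# Usage: shard 7 has the oldest access time, so it is the eviction candidate.
candidate = find_coldest_local_shard({3: 10.0, 7: 2.5, 9: 8.0})
```

Under these assumptions the search cost grows with the number of local shards rather than with the total number of shards in the dataset.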

j316chuck commented Sep 27, 2024

You can invoke like this yourself to test against cache_limit yaml!

Screenshot 2024-09-27 at 3 41 37 PM

Results will show up in #genai-regression-testing-dev in a couple hours

j316chuck commented Sep 30, 2024

Screenshot 2024-09-30 at 2 45 49 PM

Throughput looks normal. Can we review/merge, @snarayan21 @XiaohanZhangCMU?

Member

@XiaohanZhangCMU XiaohanZhangCMU left a comment

What if the evicted shard is currently used by other processes? Is there a mechanism to prevent that?

streaming/base/dataset.py (review thread resolved)
Collaborator

@snarayan21 snarayan21 left a comment

Looks nice! Agreed with @XiaohanZhangCMU's comments. I don't think we actually need a mechanism to make sure the local shard isn't in use; according to the regression tests, this seems fine to me. Also, we're choosing the coldest shard anyway.

@XiaohanZhangCMU
Member

@snarayan21 yeah, I'm OK with it. My question wasn't valid: that case should already be handled by the get_item retry logic.
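As a rough illustration of why a retry on read can cover an eviction race, here is a generic sketch. It is not the library's actual `get_item` implementation, which this thread does not show; `read_with_retry`, `read_sample`, and `ensure_shard_present` are hypothetical names, and the sketch assumes a missing shard surfaces as `FileNotFoundError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retry(
    read_sample: Callable[[], T],
    ensure_shard_present: Callable[[], None],
    max_attempts: int = 3,
    delay: float = 0.0,
) -> T:
    """Read a sample, re-fetching its shard if it disappeared underneath us.

    Both callables are hypothetical stand-ins: `read_sample` reads from the
    local shard file, and `ensure_shard_present` downloads the shard again.
    """
    for attempt in range(max_attempts):
        try:
            return read_sample()
        except FileNotFoundError:
            # Assumption: an evicted shard shows up as a missing local file.
            if attempt == max_attempts - 1:
                raise
            ensure_shard_present()
            time.sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

With a pattern like this, a reader whose shard was evicted by another process pays for one extra download instead of failing.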

@snarayan21 snarayan21 mentioned this pull request Oct 1, 2024
8 tasks
@coryMosaicML
Contributor Author

Closing because this is superseded by #795.

5 participants