Allow writers to overwrite existing data #594
Thank you for creating the PR. The changes look good to me. I have posted some minor comments.
@XiaohanZhangCMU Can you please also take a look?
Looking at the tests that are failing: I believe there's now a conflict when |
@JAEarly Currently exist_ok=True in clouduploader.py is only used when merging the index (streaming/base/utils/merge_index). Your change will make that fail. I suggest you add the "file-removing behavior" to mdswriter instead of upload.py.
Lgtm. Wait for ci checks.
Should be good to merge. I've rebased, so it just needs the CI checks to be run again. Thanks!
Sure. Just kicked off the CI tests.
LGTM! Thanks for the improvement!
Description of changes:
CloudUploader has an exist_ok kwarg that is set to False by default. Writers (e.g. MDSWriter) cannot override this.
This PR makes two changes:
- exist_ok can now be set by a writer.
- If exist_ok is True, existing files are now deleted. This allows easy debugging when developing streaming datasets, as the files do not have to be manually deleted each time the code is run.
Tests and docs updated accordingly.
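The behavior described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: prepare_output_dir is a hypothetical helper name, and the exact checks the library performs (for example, how it treats an empty directory) are assumptions.

```python
import os
import shutil


def prepare_output_dir(out: str, exist_ok: bool = False) -> None:
    """Hypothetical helper illustrating the exist_ok semantics described above.

    Not the library's implementation; names and checks are assumptions.
    """
    if os.path.isdir(out) and os.listdir(out):
        if not exist_ok:
            # Default behavior: refuse to write into a non-empty directory.
            raise FileExistsError(f'Directory is not empty: {out}')
        # exist_ok=True: delete the old files so each run starts clean.
        shutil.rmtree(out)
    os.makedirs(out, exist_ok=True)
```

With a helper like this, a writer that forwards exist_ok=True could be re-run against the same output directory during development without manually clearing it first.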
Merge Checklist:
Put an x without space in the boxes that apply. If you are unsure about any checklist item, please don't hesitate to ask. We are here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
pre-commit on my change. (check out the pre-commit section of prerequisites)