-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spot/Serve] Optimize the translation of filemounts #4016
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this feature @warrickhe ! It would be really helpful for us to reduce the file upload latency. Left some discussion ; )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this PR @warrickhe! This is awesome.
Haven't looked into the code, but just a reminder in case we have missed it in our implementation: our original behavior for rsync will exclude files specified with .gitignore
. Just curious, if we have this behavior in our compression logic as well. : )
Good point! I will modify the code along with Tian's suggestion to maintain the original behaviour . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@warrickhe Thanks for the prompt fix! I tried this with sky serve (examples/serve/http_server/task.yaml
, I forcefully enabled the compress feature.) but seems like i got a lot of zipped files in the cloud bucket, while expected to only get one. Not sure if it has anything to do with the other file mounts. Could you please take a look on this?
I 10-08 11:51:15 backend_utils.py:1343] Syncing (to 1 node): /tmp/service-task-new-http-2ok7ehj6 -> ~/.sky/serve/new_http/task.yaml.tmp
I 10-08 11:51:16 backend_utils.py:1343] Syncing (to 1 node): /tmp/tmp3oxekj8x -> ~/.sky/serve/new_http/config.yaml
I 10-08 11:51:16 backend_utils.py:1343] Syncing (to 1 node): /home/txia/.sky/catalogs/v5/aws/images.csv -> ~/.sky/catalogs/v5/aws/images.csv
I 10-08 11:51:16 backend_utils.py:1343] Syncing (to 1 node): /home/txia/.sky/catalogs/v5/aws/az_mappings-6d23a727.csv -> ~/.sky/catalogs/v5/aws/az_mappings-6d23a727.csv
I 10-08 11:51:17 backend_utils.py:1343] Syncing (to 1 node): /home/txia/.sky/catalogs/v5/azure/images.csv -> ~/.sky/catalogs/v5/azure/images.csv
I 10-08 11:51:17 backend_utils.py:1343] Syncing (to 1 node): /home/txia/.sky/catalogs/v5/azure/vms.csv -> ~/.sky/catalogs/v5/azure/vms.csv
I 10-08 11:51:17 backend_utils.py:1343] Syncing (to 1 node): /home/txia/.sky/catalogs/v5/azure/az_ml_vms.csv -> ~/.sky/catalogs/v5/azure/az_ml_vms.csv
I 10-08 11:51:18 backend_utils.py:1343] Syncing (to 1 node): /home/txia/.sky/catalogs/v5/gcp/vms.csv -> ~/.sky/catalogs/v5/gcp/vms.csv
Could you show how you forcefully enabled compression in so I may try to replicate this issue? It's seems like that my code is trying to compress every file individually, which leads me to believe that the way you forcefully enable compression may have been incorrect. My code could also be going a bit haywire, but it's hard to say without access to the bucket and workspace. What is confusing is how there are 20 files when the workdir should only have 2, I will try to dig into this |
sky/data/storage.py
Outdated
if file.startswith('skypilot-filemounts-' | ||
) and file.endswith('.tar.gz'): | ||
logger.warning(f'Consider renaming the file: {file}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should directly raise an error here since IIUC this will essentially remove the user file by override them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it be better to leave it as a warning and call _sync_store from here, so the run will complete?
(Work in Progress) PR to fix #3188. Added compression to local file -> intermediate bucket translation. Still working on the decompression side.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh