[Serve] Make controller regions/ choose from replica resources #4053
Thanks for adding this feature @euclidgame! This is awesome. The PR looks mostly good to me. Left some comments for discussion ;)
sky/utils/controller_utils.py
Outdated
for cloud_name, regions in requested_clouds_with_region_zone.items()
for region, zones in regions.items() for zone in zones
for cloud in [clouds.CLOUD_REGISTRY.from_str(cloud_name)]
This is a little bit confusing. Let's use an explicit for loop instead?
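For illustration, here is a minimal sketch of how the nested comprehension could be unrolled into explicit loops. Only the `for` clauses are visible in the snippet above, so the output shape (a list of `(cloud, region, zone)` tuples) is an assumption, and `lookup_cloud` is a hypothetical stand-in for `clouds.CLOUD_REGISTRY.from_str` so the example runs on its own.

```python
from typing import Dict, List, Set, Tuple


def lookup_cloud(cloud_name: str) -> str:
    """Hypothetical stand-in for clouds.CLOUD_REGISTRY.from_str."""
    return cloud_name.upper()


def expand_combinations(
    requested_clouds_with_region_zone: Dict[str, Dict[str, Set[str]]],
) -> List[Tuple[str, str, str]]:
    """Expand the cloud -> region -> zones mapping with explicit loops."""
    combinations = []
    for cloud_name, regions in requested_clouds_with_region_zone.items():
        # Resolve the cloud once per cloud name instead of using the
        # `for cloud in [...]` single-element-list trick.
        cloud = lookup_cloud(cloud_name)
        for region, zones in regions.items():
            # Sort the zones so the output order is deterministic.
            for zone in sorted(zones):
                combinations.append((cloud, region, zone))
    return combinations
```

Resolving the cloud in the outer loop also avoids repeating the lookup for every zone.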
sky/utils/controller_utils.py
Outdated
requested_clouds_with_region_zone[cloud_name] = {
    '_allow_any_region': {'_allow_any_zone'}
}
Is it possible to leave it blank when any region is allowed? The same applies to zones.
Yes, it is possible, but it requires special handling to determine whether an empty set indicates that any region is allowed, which means we should only add new regions if it’s the first time. Using a placeholder makes it easier. Maybe I can use None instead of the specific strings?
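A rough sketch of the None idea, under assumptions: the mapping keeps the cloud -> region -> zones shape from the snippet above, a `None` region key means "any region", and a `None` entry in a zone set means "any zone". The helper names `record_request` and `allows_any_region` are hypothetical and only illustrate the idea; they are not taken from the PR.

```python
from typing import Dict, Optional, Set

# region (or None for "any region") -> zones (None means "any zone")
RegionZones = Dict[Optional[str], Set[Optional[str]]]


def record_request(requested: Dict[str, RegionZones],
                   cloud_name: str,
                   region: Optional[str] = None,
                   zone: Optional[str] = None) -> None:
    """Record one requested (cloud, region, zone); None acts as a wildcard."""
    requested.setdefault(cloud_name, {}).setdefault(region, set()).add(zone)


def allows_any_region(requested: Dict[str, RegionZones],
                      cloud_name: str) -> bool:
    """True if some resource asked for this cloud without pinning a region."""
    return None in requested.get(cloud_name, {})
```

With a sentinel, an empty mapping still unambiguously means "nothing requested yet", so no first-time special case is needed.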
Thanks for adding this @euclidgame ! Looks good to me. Left some discussions ;)
Thanks for fixing this @euclidgame ! Left some discussions ;)
# 2. All resources has cloud specified. Some of them
# could NOT host controllers. Return a set, only
# containing those could host controllers.
# 2. Some resources cannot host controllers.
Could we revert this?
# 3. Some resources does not have cloud specified.
# Return the default resources.
# 3. Some resources do not have cloud specified.
Same here, could we revert this?
Thanks for the suggestions @cblmemo! All fixed, please review :-)
Thanks @euclidgame for the prompt fix! Left some nits and after that it should be ready to go!
@cblmemo Thanks for the suggestions! I have fixed them, please review again.
Thanks @euclidgame! Mostly looks good to me. I ran the unit test but unfortunately got an error. Could you help fix it?
pytest tests/unit_tests/test_controller_utils.py
D 10-24 16:40:14 skypilot_config.py:228] Using config path: /home/txia/.sky/config.yaml
D 10-24 16:40:14 skypilot_config.py:233] Config loaded:
D 10-24 16:40:14 skypilot_config.py:233] {'serve': {'controller': {'resources': {'cloud': 'aws', 'cpus': 4}}}}
D 10-24 16:40:14 skypilot_config.py:245] Config syntax check passed.
bringing up nodes...
....F.
==================================================================================================== FAILURES =====================================================================================================
_____________________________________________________________ test_get_controller_resources_with_task_resources[serve-default_controller_resources1] ______________________________________________________________
[gw4] linux -- Python 3.9.17 /home/txia/miniconda3/envs/sky-serve/bin/python
tests/unit_tests/test_controller_utils.py:104: in test_get_controller_resources_with_task_resources
_check_controller_resources(controller_resources, expected_combinations,
tests/unit_tests/test_controller_utils.py:82: in _check_controller_resources
assert config == default_controller_resources, config
E AssertionError: {'cpus': '4', 'disk_size': 200}
E assert {'cpus': '4',...sk_size': 200} == {'cpus': '4+'...sk_size': 200}
E Omitting 1 identical items, use -vv to show
E Differing items:
E {'cpus': '4'} != {'cpus': '4+'}
E Use -v to get more diff
============================================================================================= short test summary info =============================================================================================
FAILED tests/unit_tests/test_controller_utils.py::test_get_controller_resources_with_task_resources[serve-default_controller_resources1] - AssertionError: {'cpus': '4', 'disk_size': 200}
1 failed, 5 passed, 2 warnings in 5.47s
nvm. It is an issue on the master branch as well. Filed #4172 to keep track of it. Thanks for contributing @euclidgame, merging now!
Fixes #3364.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh