Ensure proxy and server images are only provided with --use-pathways. #100
base: main
9a719a3 to 24f8d5b
570f4ae to 86bdd91
--proxy-server-image=<Pathways proxy server image> \
--docker-name='user-workload' \
--docker-image=<maxtext docker image> \
--command='bash /usr/pathways/ifrt/maxtext_entrypoint.sh base_output_directory=<output directory> dataset_path=<dataset path> per_device_batch_size=1 enable_checkpointing=false enable_profiler=false remat_policy=full global_parameter_scale=4 steps=300 max_target_length=2048 use_iota_embed=true reuse_example_batch=1 dataset_type=synthetic attention=flash gcs_metrics=True run_name=$(USER)-pw-xpk-test-1'
Curious whether you have a simpler example. This README example seems complicated to maintain.
In the non-Pathways workload command example, we just print hello world. Is that possible here too?
Thanks for the question, Victor! Let me check whether the other dependencies are up to date. If so, I can simplify it to something more like "python3 Maxtext...". However, we really want the workload to actually train!
however, we really want the workload to actually train!
Agreed, this is the ultimate goal, but we should go one step at a time. How about we have both a simple workload and a MaxText example?
I want users to be able to verify they have xpk + system working before trying to train.
--docker-name='user-workload' \
--docker-image=<maxtext docker image> \
The --base-docker-image flow is the recommended one for xpk users, because it lets them iterate on local changes. Is that possible for Pathways?
Currently, the image users pass with --docker-image is a fixed Docker image, but once the other dependencies (mentioned above) are merged, I can switch from --docker-image to --base-docker-image. I would also prefer that!
@@ -4857,17 +4896,15 @@ def directory_path_type(value):
  workload_pathways_workload_arguments.add_argument(
      '--proxy-server-image',
      type=str,
      default='gcr.io/cloud-tpu-v2-images/pathways/pathways-demo:proxy_server',
Let me know if you do want to keep the ability to have a default, and we can think through how that would look!
I don't think it is super important, as folks would need to replace it during iteration!
during iteration
Can you expand on this? Will customers need to replace it often? If so, I'm curious whether we should make it simpler.
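On keeping a default: one option is to leave `default=None` on the flag, so that "not provided" stays distinguishable from any real value, and to apply a fallback image only after parsing. This is a minimal sketch under that assumption, not xpk's code; `build_parser`, `resolve_proxy_server_image`, and the `xpk-sketch` program name are hypothetical, and the fallback string is just the former default from the diff.

```python
import argparse
from typing import Optional

# Hypothetical fallback: the value that used to be the argparse default.
DEFAULT_PROXY_SERVER_IMAGE = (
    'gcr.io/cloud-tpu-v2-images/pathways/pathways-demo:proxy_server'
)


def build_parser() -> argparse.ArgumentParser:
  """Hypothetical minimal parser with only the flags discussed here."""
  parser = argparse.ArgumentParser(prog='xpk-sketch')
  parser.add_argument('--use-pathways', action='store_true')
  # default=None keeps "flag not provided" distinguishable from a real value.
  parser.add_argument('--proxy-server-image', type=str, default=None)
  return parser


def resolve_proxy_server_image(args: argparse.Namespace) -> Optional[str]:
  """Applies the fallback image only for Pathways workloads."""
  if args.proxy_server_image is not None:
    return args.proxy_server_image  # An explicit user choice always wins.
  if args.use_pathways:
    return DEFAULT_PROXY_SERVER_IMAGE
  return None  # Non-Pathways workloads get no proxy image at all.
```

With this split, a check such as "proxy image given without --use-pathways" can run on the raw `None`/non-`None` value before the fallback is filled in.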
Removed a Pathways arg.
86bdd91 to 24b9a11
I will be punting on this PR for now. With the way the parsers are configured, it is not straightforward to ensure that --use-pathways is provided whenever any of the other Pathways args are provided. I will resume when the requirements are clearer and this edge case takes priority.
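For reference, since argparse has no built-in "requires" relationship between optional flags, one way to express the constraint is a validation step after `parse_args` instead of inside the parser definition. This is a minimal sketch, not xpk's actual parser layout: it assumes every Pathways-only flag defaults to `None`, `--server-image` is an assumed name for the server image mentioned in the PR title, and `parse_and_validate` and `PATHWAYS_ONLY_DESTS` are hypothetical.

```python
import argparse
from typing import Optional, Sequence

# Destinations of Pathways-only flags. '--proxy-server-image' appears in the PR;
# '--server-image' is an assumed flag name used here for illustration.
PATHWAYS_ONLY_DESTS = ('proxy_server_image', 'server_image')


def build_parser() -> argparse.ArgumentParser:
  """Hypothetical stand-in for the workload-create parser."""
  parser = argparse.ArgumentParser(prog='xpk-sketch')
  parser.add_argument('--use-pathways', action='store_true')
  # No real defaults: None means the user did not pass the flag.
  parser.add_argument('--proxy-server-image', type=str, default=None)
  parser.add_argument('--server-image', type=str, default=None)
  return parser


def parse_and_validate(
    argv: Optional[Sequence[str]] = None,
) -> argparse.Namespace:
  """Parses argv, then rejects Pathways-only flags without --use-pathways."""
  parser = build_parser()
  args = parser.parse_args(argv)
  given = [
      '--' + dest.replace('_', '-')
      for dest in PATHWAYS_ONLY_DESTS
      if getattr(args, dest, None) is not None
  ]
  if given and not args.use_pathways:
    # parser.error prints usage and the message to stderr, then exits with 2.
    parser.error(f"{', '.join(given)} can only be used with --use-pathways")
  return args
```

In a CLI with subcommands, the same check could run at the start of the subcommand's handler; whether that fits the current parser configuration is the open question above.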
Fixes / Features
Testing / Documentation
Testing details.
Additionally, I did some manual tests to see that the user gets the right error message for -
Also manually tested that invalid docker images are detected and user is guided with an error message.
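The manual error-message checks could also be captured as automated tests. This sketch is hypothetical: it uses a stand-in parser with a single Pathways-only flag and an assumed error wording, and relies only on the standard-library behavior that `ArgumentParser.error` writes to stderr and raises `SystemExit` with status 2.

```python
import argparse
import contextlib
import io
from typing import Sequence, Tuple


def make_parser() -> argparse.ArgumentParser:
  """Hypothetical stand-in parser with one Pathways-only flag."""
  parser = argparse.ArgumentParser(prog='xpk-sketch')
  parser.add_argument('--use-pathways', action='store_true')
  parser.add_argument('--proxy-server-image', type=str, default=None)
  return parser


def run_cli(argv: Sequence[str]) -> Tuple[int, str]:
  """Parses argv, applies the check, and returns (exit_code, stderr_text)."""
  parser = make_parser()
  stderr = io.StringIO()
  with contextlib.redirect_stderr(stderr):
    try:
      args = parser.parse_args(argv)
      if args.proxy_server_image is not None and not args.use_pathways:
        # Assumed wording; xpk's real message may differ.
        parser.error('--proxy-server-image requires --use-pathways')
    except SystemExit as exc:
      return exc.code, stderr.getvalue()
  return 0, stderr.getvalue()


def test_proxy_image_without_use_pathways_is_rejected() -> None:
  code, err = run_cli(['--proxy-server-image=img'])
  assert code == 2
  assert '--proxy-server-image requires --use-pathways' in err


def test_proxy_image_with_use_pathways_is_accepted() -> None:
  assert run_cli(['--use-pathways', '--proxy-server-image=img']) == (0, '')
```

Under pytest, the two `test_` functions are collected automatically; they can also be called directly.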