Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add num_workers and save_main_session flag to auto_model_refresh notebook #28777

Merged
merged 9 commits into from
Oct 4, 2023

Conversation

AnandInguva
Copy link
Contributor

@AnandInguva AnandInguva commented Oct 2, 2023

Fixes: #28773


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@AnandInguva AnandInguva marked this pull request as ready for review October 2, 2023 21:29
@AnandInguva
Copy link
Contributor Author

R: @damccorm

I used colab to push it to the branch. I am not sure why the spaces and everything got formatted. When I open the notebook locally, others cells with metadata are getting edited when I push it to the branch(I guess due to the IDE I use)

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up on this quickly

"cell_type": "code",
"source": [
"# Authenticate to your Google Cloud account.\n",
"from google.colab import auth\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that this broke the save_main_session=True case because it got saved as part of the main session and then worker startup failed trying to do this import.

The fix was to make this cell:

# Authenticate to your Google Cloud account.
def auth_to_colab():
  from google.colab import auth
  auth.authenticate_user()

auth_to_colab()

"## Use the TensorFlow model handler\n",
" This example uses `TFModelHandlerTensor` as the model handler and the `resnet_101` model trained on [ImageNet](https://www.image-net.org/).\n",
"\n",
" Download the model from [Google Cloud Storage](https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels.h5) (link downloads the model), and place it in the directory that you want to use to update your model.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that just downloading the model naively didn't work and I got an error about it being saved in an improper format. This can be demonstrated by using the same command our model handler uses for loading: tf.keras.models.load_model('path/to/resnet101_weights_tf_dim_ordering_tf_kernels.h5')

The fix I found was to load the model and then save it with keras. So:

model = tf.keras.applications.resnet.ResNet101()
model.save('path/to/resnet101_weights_tf_dim_ordering_tf_kernels.keras')
model = tf.keras.applications.resnet.ResNet152()
model.save('path/to/resnet152_weights_tf_dim_ordering_tf_kernels.keras')

(and then using the .keras extension elsewhere). I think the .h5 files just contain the model weights

examples/notebooks/beam-ml/automatic_model_refresh.ipynb Outdated Show resolved Hide resolved
examples/notebooks/beam-ml/automatic_model_refresh.ipynb Outdated Show resolved Hide resolved
@AnandInguva
Copy link
Contributor Author

Addressed comments.

.h5 used to store entire model as well. I was loading model with the same link but may be in the updated version, something has changed? In the new TF version, they are referring .h5 to be legacy and .keras was introduced. So its better to keep .keras

"source": [
"!pip install apache_beam[gcp]>=2.46.0 --quiet\n",
"!pip install tensorflow\n",
"!pip install tensorflow_hub"
"# !pip install tensorflow_hub"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to change these locally?

@@ -132,8 +119,6 @@
"from typing import Tuple\n",
"\n",
"import apache_beam as beam\n",
"from apache_beam.examples.inference.tensorflow_imagenet_segmentation import PostProcessor\n",
"from apache_beam.examples.inference.tensorflow_imagenet_segmentation import read_image\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to inline this function somewhere (probably alongside the mapping function where we call it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain it? I didn't understand inline this function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, read_image is still called, but its not defined anywhere. Originally we imported it, but that's fragile since the example could change/doesn't have rigorous API guarantees. We should define the function next to its usage instead

Copy link
Contributor Author

@AnandInguva AnandInguva Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we use preprocess_image instead of read_image(same logic though, just defined in the notebook) and this change was also lost in the commit. I am running the notebook with updated changes.

"# To expedite the model update process, it's recommended to set num_workers>1.\n",
"# https://github.com/apache/beam/issues/28776\n",
"options.view_as(WorkerOptions).num_workers = 5"
"\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to remove save_main_session/num_workers?

@AnandInguva
Copy link
Contributor Author

I opened the notebook with the option open the notebook using colab and it removed the code suggestions and previous commits. It gave the master ref. That was why the changes were not in the recent commit. I made changes now. PTAL

"source": [
"3. Pass the images to the RunInference `PTransform`. RunInference takes `model_handler` and `model_metadata_pcoll` as input parameters.\n",
" * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input is used to update the `model_uri` in the `model_handler` without needing to stop the Apache Beam pipeline\n",
" * Use `WatchFilePattern` as side input to watch a `file_pattern` matching `.h5` files. In this case, the `file_pattern` is `'gs://BUCKET_NAME/*.h5'`.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be *.keras here and below

@damccorm
Copy link
Contributor

damccorm commented Oct 4, 2023

I opened the notebook with the option open the notebook using colab and it removed the code suggestions and previous commits. It gave the master ref. That was why the changes were not in the recent commit. I made changes now. PTAL

You can reference it with your repo/branch name FWIW - https://colab.sandbox.google.com/github/AnandInguva/beam/blob/auto_model_refresh/examples/notebooks/beam-ml/automatic_model_refresh.ipynb - I'd recommend running through it that way after pushing the branch but before opening PRs in general to check your logic

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just a couple more comments

examples/notebooks/beam-ml/automatic_model_refresh.ipynb Outdated Show resolved Hide resolved
examples/notebooks/beam-ml/automatic_model_refresh.ipynb Outdated Show resolved Hide resolved
@damccorm
Copy link
Contributor

damccorm commented Oct 4, 2023

FYI, I'd recommend doing changes in a text editor or on GitHub to avoid a big diff from colab

Co-authored-by: Danny McCormick <dannymccormick@google.com>
Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@damccorm damccorm merged commit c2666e1 into apache:master Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Automatic model refresh notebook broken
2 participants