Add num_workers and save_main_session flag to auto_model_refresh notebook #28777

AnandInguva · 2023-10-02T21:29:51Z


AnandInguva · 2023-10-02T21:31:45Z

R: @damccorm

I used Colab to push it to the branch. I am not sure why the whitespace and formatting changed. When I open the notebook locally, other cells with metadata get edited when I push to the branch (I guess due to the IDE I use).

github-actions · 2023-10-02T21:33:46Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

damccorm

Thanks for following up on this quickly

damccorm · 2023-10-03T12:22:13Z

examples/notebooks/beam-ml/automatic_model_refresh.ipynb

+      "cell_type": "code",
+      "source": [
+        "# Authenticate to your Google Cloud account.\n",
+        "from google.colab import auth\n",


I found that this broke the save_main_session=True case because it got saved as part of the main session and then worker startup failed trying to do this import.

The fix was to make this cell:

# Authenticate to your Google Cloud account.
def auth_to_colab():
  from google.colab import auth
  auth.authenticate_user()

auth_to_colab()
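A minimal standard-library sketch of why scoping the import inside a function helps. It assumes the failure comes from the top-level import leaving a module bound in the notebook's main namespace, which save_main_session=True then tries to restore on workers where google.colab is not installed. Here `os.path` stands in for `google.colab.auth`; the `exec` namespaces and the `module_names` helper are illustrative only and are not Beam's pickling code.

```python
import types


def module_names(namespace):
    """Names in a namespace that are bound to module objects."""
    return sorted(
        name
        for name, value in namespace.items()
        if isinstance(value, types.ModuleType) and not name.startswith("__")
    )


# Top-level import: the module stays bound in the session's globals,
# so anything that saves the main session has to restore it later.
top_level_ns = {}
exec("from os import path", top_level_ns)

# Function-scoped import: the module is only a local inside the function,
# so it never appears in the saved globals.
scoped_ns = {}
exec(
    "def join_parts():\n"
    "    from os import path\n"
    "    return path.join('a', 'b')\n"
    "join_parts()\n",
    scoped_ns,
)

print(module_names(top_level_ns))  # ['path']
print(module_names(scoped_ns))     # []
```

The second namespace holds only the function itself, which is the property the auth_to_colab() wrapper relies on.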

damccorm · 2023-10-03T12:26:19Z

examples/notebooks/beam-ml/automatic_model_refresh.ipynb

+        "## Use the TensorFlow model handler\n",
+        " This example uses `TFModelHandlerTensor` as the model handler and the `resnet_101` model trained on [ImageNet](https://www.image-net.org/).\n",
+        "\n",
+        " Download the model from [Google Cloud Storage](https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels.h5) (link downloads the model), and place it in the directory that you want to use to update your model.\n",


I found that just downloading the model naively didn't work and I got an error about it being saved in an improper format. This can be demonstrated by using the same command our model handler uses for loading: tf.keras.models.load_model('path/to/resnet101_weights_tf_dim_ordering_tf_kernels.h5')

The fix I found was to load the model and then save it with keras. So:

model = tf.keras.applications.resnet.ResNet101()
model.save('path/to/resnet101_weights_tf_dim_ordering_tf_kernels.keras')
model = tf.keras.applications.resnet.ResNet152()
model.save('path/to/resnet152_weights_tf_dim_ordering_tf_kernels.keras')

(and then using the .keras extension elsewhere). I think the .h5 files contain only the model weights.
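The same load-then-save step can be wrapped in a small helper so both ResNet variants go through one code path. This is only a sketch: `keras_path` and `resave_as_keras` are hypothetical names, and the model builder is passed in as a callable so the helper itself does not import TensorFlow.

```python
def keras_path(h5_path):
    """Swap a trailing .h5 extension for .keras; leave other paths unchanged."""
    if h5_path.endswith(".h5"):
        return h5_path[: -len(".h5")] + ".keras"
    return h5_path


def resave_as_keras(build_model, h5_path):
    """Build a full model and save it under the .keras name.

    build_model is any zero-argument callable that returns an object with a
    save(path) method, for example tf.keras.applications.resnet.ResNet101.
    """
    target = keras_path(h5_path)
    build_model().save(target)
    return target
```

Calling `resave_as_keras(tf.keras.applications.resnet.ResNet101, 'path/to/resnet101_weights_tf_dim_ordering_tf_kernels.h5')` would require TensorFlow to be installed and, with the default arguments, downloads the ImageNet weights.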


AnandInguva · 2023-10-03T18:39:36Z

Addressed comments.

.h5 used to store the entire model as well. I was loading the model from the same link, but maybe something has changed in the updated version? The new TF version refers to .h5 as legacy and introduces .keras, so it's better to keep .keras.

damccorm · 2023-10-04T12:35:53Z

examples/notebooks/beam-ml/automatic_model_refresh.ipynb

      "source": [
        "!pip install apache_beam[gcp]>=2.46.0 --quiet\n",
-        "!pip install tensorflow\n",
-        "!pip install tensorflow_hub"
+        "# !pip install tensorflow_hub"


Did you mean to change these locally?

damccorm · 2023-10-04T12:37:48Z

examples/notebooks/beam-ml/automatic_model_refresh.ipynb

@@ -132,8 +119,6 @@
        "from typing import Tuple\n",
        "\n",
        "import apache_beam as beam\n",
-        "from apache_beam.examples.inference.tensorflow_imagenet_segmentation import PostProcessor\n",
-        "from apache_beam.examples.inference.tensorflow_imagenet_segmentation import read_image\n",


We still need to inline this function somewhere (probably alongside the mapping function where we call it)

Can you explain? I didn't understand what you mean by inlining this function.

Right now, read_image is still called, but it's not defined anywhere. Originally we imported it, but that's fragile since the example could change and doesn't have rigorous API guarantees. We should define the function next to its usage instead.

Yeah, we use preprocess_image instead of read_image (same logic, just defined in the notebook), and this change was also lost in the commit. I am running the notebook with the updated changes.

damccorm · 2023-10-04T12:38:32Z

examples/notebooks/beam-ml/automatic_model_refresh.ipynb

-        "# To expedite the model update process, it's recommended to set num_workers>1.\n",
-        "# https://github.com/apache/beam/issues/28776\n",
-        "options.view_as(WorkerOptions).num_workers = 5"
+        "\n"


Did you mean to remove save_main_session/num_workers?
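For reference, a sketch of how the two settings under discussion could be restored together. It assumes Apache Beam's `PipelineOptions`, `SetupOptions`, and `WorkerOptions` classes from `apache_beam.options.pipeline_options`; the `PIPELINE_SETTINGS` dict and the `build_options` helper are hypothetical names, and the value 5 is taken from the removed lines in the diff above.

```python
# Settings discussed in this review: pickle the main session for workers and
# start with more than one worker to speed up model updates.
PIPELINE_SETTINGS = {"save_main_session": True, "num_workers": 5}


def build_options(settings=PIPELINE_SETTINGS):
    """Return PipelineOptions with save_main_session and num_workers applied."""
    # Imported inside the function so this cell can be defined even where
    # Apache Beam is not installed; calling it still requires Beam.
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        SetupOptions,
        WorkerOptions,
    )

    options = PipelineOptions()
    options.view_as(SetupOptions).save_main_session = settings["save_main_session"]
    options.view_as(WorkerOptions).num_workers = settings["num_workers"]
    return options
```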


AnandInguva · 2023-10-04T17:59:01Z

I opened the notebook using the "open in Colab" option, and it dropped the code suggestions and previous commits because it loaded the master ref. That is why the changes were missing from the recent commit. I have made the changes now. PTAL


damccorm · 2023-10-04T18:48:45Z

examples/notebooks/beam-ml/automatic_model_refresh.ipynb

+      "source": [
+        "3. Pass the images to the RunInference `PTransform`. RunInference takes `model_handler` and `model_metadata_pcoll` as input parameters.\n",
+        "  * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input is used to update the `model_uri` in the `model_handler` without needing to stop the Apache Beam pipeline\n",
+        "  * Use `WatchFilePattern` as side input to watch a `file_pattern` matching `.h5` files. In this case, the `file_pattern` is `'gs://BUCKET_NAME/*.h5'`.\n",


Should be *.keras here and below
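A quick way to sanity-check the pattern change without running a pipeline. This sketch uses the standard library's `fnmatchcase` as a stand-in for the glob matching done on the bucket, which is an assumption that holds only for a simple trailing-extension pattern like this one; the file names in `bucket_listing` are illustrative.

```python
from fnmatch import fnmatchcase

# Pattern after the fix: watch .keras files instead of .h5 files.
FILE_PATTERN = "gs://BUCKET_NAME/*.keras"


def matching(paths, pattern=FILE_PATTERN):
    """Return the paths that the file pattern would pick up."""
    return [p for p in paths if fnmatchcase(p, pattern)]


bucket_listing = [
    "gs://BUCKET_NAME/resnet101_weights_tf_dim_ordering_tf_kernels.keras",
    "gs://BUCKET_NAME/resnet152_weights_tf_dim_ordering_tf_kernels.keras",
    "gs://BUCKET_NAME/resnet101_weights_tf_dim_ordering_tf_kernels.h5",
]

for path in matching(bucket_listing):
    print(path)
```

Only the two .keras files are printed; a model uploaded with the old .h5 extension would not trigger an update under the new pattern.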

damccorm · 2023-10-04T18:50:19Z

I opened the notebook using the "open in Colab" option, and it dropped the code suggestions and previous commits because it loaded the master ref. That is why the changes were missing from the recent commit. I have made the changes now. PTAL

FWIW, you can reference it with your repo/branch name (https://colab.sandbox.google.com/github/AnandInguva/beam/blob/auto_model_refresh/examples/notebooks/beam-ml/automatic_model_refresh.ipynb). In general, I'd recommend running through the notebook that way after pushing the branch but before opening a PR, to check your logic.
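The link above follows a regular shape, so it can be rebuilt for any fork and branch. The sketch below infers that shape from the single URL in this comment; `colab_url` and `COLAB_GITHUB_PREFIX` are hypothetical names.

```python
COLAB_GITHUB_PREFIX = "https://colab.sandbox.google.com/github"


def colab_url(owner, repo, branch, notebook_path):
    """Build a Colab link that opens a notebook from a specific fork and branch."""
    return f"{COLAB_GITHUB_PREFIX}/{owner}/{repo}/blob/{branch}/{notebook_path}"


print(
    colab_url(
        "AnandInguva",
        "beam",
        "auto_model_refresh",
        "examples/notebooks/beam-ml/automatic_model_refresh.ipynb",
    )
)
```

Opening the branch-specific link, rather than the default one, avoids loading the master copy of the notebook.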

damccorm

Mostly LGTM, just a couple more comments


damccorm · 2023-10-04T20:45:13Z

FYI, I'd recommend making changes in a text editor or on GitHub to avoid a big diff from Colab.
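When a Colab save does produce a large diff, one way to see which cells really changed is to compare cell sources and ignore indentation and metadata. The sketch below assumes the usual notebook JSON layout (a top-level "cells" list whose entries carry a "source" field); `cell_sources` and `changed_cells` are hypothetical helpers and the two sample notebooks are made up for illustration.

```python
import json


def cell_sources(notebook_json):
    """Return each cell's source as one string, ignoring all other fields."""
    notebook = json.loads(notebook_json)
    return ["".join(cell.get("source", [])) for cell in notebook.get("cells", [])]


def changed_cells(old_json, new_json):
    """Indexes of cells whose source text differs between two notebook versions."""
    old, new = cell_sources(old_json), cell_sources(new_json)
    length = max(len(old), len(new))
    old += [None] * (length - len(old))
    new += [None] * (length - len(new))
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Same first cell with different indentation and extra metadata;
# second cell lost its content.
old_nb = json.dumps(
    {"cells": [
        {"cell_type": "code", "source": ["import apache_beam as beam\n"]},
        {"cell_type": "code", "source": ["options.view_as(WorkerOptions).num_workers = 5"]},
    ]},
    indent=2,
)
new_nb = json.dumps(
    {"cells": [
        {"cell_type": "code", "metadata": {"id": "abc"}, "source": ["import apache_beam as beam\n"]},
        {"cell_type": "code", "source": ["\n"]},
    ]},
    indent=6,
)

print(changed_cells(old_nb, new_nb))  # [1]
```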


damccorm

Thanks!

AnandInguva marked this pull request as ready for review October 2, 2023 21:29

github-actions bot added the examples label Oct 2, 2023

damccorm reviewed Oct 3, 2023


damccorm reviewed Oct 4, 2023


AnandInguva and others added 4 commits October 4, 2023 13:49

Add num_workers and save_main_session flag

267492e

Add WorkerOptions

6c612de

Apply suggestions from code review

a763ff1

Co-authored-by: Danny McCormick <dannymccormick@google.com>

Add back removed contents from a past commit

92b2cbd

AnandInguva force-pushed the auto_model_refresh branch from 5f9a282 to 92b2cbd Compare October 4, 2023 17:51

Add workerOptions to the import

9f39a85

damccorm reviewed Oct 4, 2023


damccorm reviewed Oct 4, 2023


AnandInguva added 3 commits October 4, 2023 16:35

Created using Colaboratory

056b84c

Created using Colaboratory

076847e

Update auto model refresh notebook

0749d17

damccorm reviewed Oct 4, 2023


Apply suggestions from code review

1354cf6

Co-authored-by: Danny McCormick <dannymccormick@google.com>

damccorm approved these changes Oct 4, 2023


damccorm merged commit c2666e1 into apache:master Oct 4, 2023
