Update Protobuf and GCP dependencies in Beam Python SDK #24599

Merged
merged 79 commits into from
Mar 15, 2023


AnandInguva
Contributor

@AnandInguva AnandInguva commented Dec 8, 2022

Fixes: #23355
Fixes: #24569
Fixes: #25581
Fixes: #25328
Fixes: #20991
Fixes: #21019
Fixes: #24432
Fixes: #23585
Fixes: #22742

Also see #22319 for how proto2_coder_test_messages_pb2.py is generated.
TensorFlow Protobuf update GitHub issue: tensorflow/tensorflow#59221


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the python label Dec 8, 2022
@AnandInguva
Contributor Author

Error when testing locally in a Python 3.8 macOS environment:

Traceback (most recent call last):
  File "/Users/anandinguva/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/anandinguva/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/__init__.py", line 92, in <module>
    from apache_beam import coders
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/coders/__init__.py", line 17, in <module>
    from apache_beam.coders.coders import *
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/coders/coders.py", line 59, in <module>
    from apache_beam.coders import coder_impl
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/coders/coder_impl.py", line 63, in <module>
    from apache_beam.typehints.schemas import named_tuple_from_schema
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/typehints/__init__.py", line 38, in <module>
    from apache_beam.typehints.arrow_type_compatibility import *
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/typehints/arrow_type_compatibility.py", line 32, in <module>
    from apache_beam.portability.api import schema_pb2
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/portability/api/__init__.py", line 21, in <module>
    from .org.apache.beam.model.pipeline import v1
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/portability/api/org/apache/beam/model/pipeline/v1/__init__.py", line 17, in <module>
    from . import external_transforms_pb2
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/portability/api/org/apache/beam/model/pipeline/v1/external_transforms_pb2.py", line 14, in <module>
    from . import schema_pb2 as org_dot_apache_dot_beam_dot_model_dot_pipeline_dot_v1_dot_schema__pb2
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/portability/api/org/apache/beam/model/pipeline/v1/schema_pb2.py", line 15, in <module>
    from . import beam_runner_api_pb2 as org_dot_apache_dot_beam_dot_model_dot_pipeline_dot_v1_dot_beam__runner__api__pb2
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/portability/api/org/apache/beam/model/pipeline/v1/beam_runner_api_pb2.py", line 14, in <module>
    from . import endpoints_pb2 as org_dot_apache_dot_beam_dot_model_dot_pipeline_dot_v1_dot_endpoints__pb2
  File "/Users/anandinguva/projects/beam/sdks/python/apache_beam/portability/api/org/apache/beam/model/pipeline/v1/endpoints_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/Users/anandinguva/.pyenv/versions/deps_3.8.12/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 560, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
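Workaround 2 from the message can be applied from inside a launcher script, with the caveat the message itself gives: pure-Python parsing is much slower. A minimal sketch, assuming the variable is set before anything (including `import apache_beam`) imports `google.protobuf`, because the backend is chosen at import time. `protobuf_runtime_info` is a hypothetical helper; it reads `api_implementation.Type()` from protobuf's internal package, which is not a public, stable API. Regenerating the `_pb2.py` files with protoc >= 3.19.0 remains the real fix.

```python
import os

# Workaround 2 from the error message: request the pure-Python protobuf backend.
# setdefault keeps any value the user already exported. This only takes effect
# if it runs before google.protobuf is first imported in this process.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")


def protobuf_runtime_info():
    """Return (installed protobuf version, active backend), or None if absent.

    Hypothetical helper; api_implementation is an internal protobuf module.
    """
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return google.protobuf.__version__, api_implementation.Type()


print(protobuf_runtime_info())
```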

@AnandInguva
Contributor Author

AnandInguva commented Feb 7, 2023

The URNs are not getting built when protobuf is updated to 4.21.11.

Can be reproduced with

  1. python setup.py sdist
  2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
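After step 2, one way to confirm the symptom is to look for the generated URN modules inside the installed package. A minimal sketch: `find_generated_urn_files` is a hypothetical helper, and the default `_urns.py` suffix is an assumption about the generator's file naming, so adjust it to what the build actually emits.

```python
from pathlib import Path


def find_generated_urn_files(api_dir, suffix="_urns.py"):
    """List generated URN sidecar modules under api_dir, recursively.

    Hypothetical helper; the "_urns.py" suffix is an assumed naming convention.
    Returns POSIX-style paths relative to api_dir, sorted.
    """
    root = Path(api_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(f"*{suffix}"))


# Usage after `pip install dist/apache-beam-...tar.gz` (run outside the source tree):
#   import apache_beam
#   api = Path(apache_beam.__file__).parent / "portability" / "api"
#   print(find_generated_urn_files(api))  # an empty list reproduces the bug
```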

Update:

I looked into it further. We generate the sidecar URN files to support the mypy type checker. The types of some Protobuf classes have changed and no longer match what the current code expects, so the URNs are not generated.
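When a generator's type checks stop matching, it can emit nothing without raising an error. The sketch below is a generic illustration of that failure mode using hypothetical stand-in classes; it is not Beam's generator code and the class names are not real protobuf classes. A filter keyed on one exact class drops equivalent objects from a newer runtime, while a check on the attributes the generator actually reads keeps working.

```python
class PurePythonEnumDescriptor:
    """Hypothetical stand-in for the class a generator was written against."""

    def __init__(self, name, values_by_name):
        self.name = name
        self.values_by_name = values_by_name


class NewRuntimeEnumDescriptor:
    """Hypothetical stand-in for an equivalent class from a newer runtime."""

    def __init__(self, name, values_by_name):
        self.name = name
        self.values_by_name = values_by_name


def enum_names_exact(objs):
    # Brittle: silently skips anything that is not exactly the old class.
    return [o.name for o in objs if type(o) is PurePythonEnumDescriptor]


def enum_names_duck_typed(objs):
    # Tolerant: accepts any object exposing the attributes the generator uses.
    return [o.name for o in objs
            if hasattr(o, "name") and hasattr(o, "values_by_name")]


found = [NewRuntimeEnumDescriptor("ExampleUrns", {"FOO": 0})]
print(enum_names_exact(found))       # []
print(enum_names_duck_typed(found))  # ['ExampleUrns']
```

The attribute-based check trades strictness for robustness: it can also match unrelated objects that happen to share those attribute names, so a real generator would still need to scope what it scans.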

@AnandInguva AnandInguva changed the title from "Update build dependencies and mypy-protobuf" to "Update Python SDK build dependencies" Feb 7, 2023
@AnandInguva
Contributor Author

Run Python 3.7 PostCommit

@github-actions github-actions bot added model and removed model labels Mar 10, 2023
@AnandInguva
Contributor Author

Run Python 3.9 PostCommit

@AnandInguva
Contributor Author

Run Python 3.7 PostCommit

@AnandInguva
Contributor Author

Blocker: tensorflow_hub 0.12 (the latest release) is not compatible with TensorFlow 2.12, which makes our TF integration tests fail. We need to wait until tensorflow_hub releases a newer version.

  • We can skip the TF tests for now, file a tracking issue for the tensorflow_hub release, and unskip them before the next release.
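The proposed skip could be expressed with the standard library's `unittest.skipIf`, which pytest also honors. A minimal sketch: the version bounds come from the comment above, while `TFHubIntegrationTest` and the helper names are hypothetical, not Beam's actual test code. The tracking issue is not named in the thread, so the skip reason stays generic.

```python
import re
import unittest
from importlib import metadata


def major_minor(version_string):
    """Parse e.g. "2.12.0rc1" into (2, 12), keeping the first two numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", version_string)[:2])


def installed_major_minor(dist_name):
    """Return (major, minor) of an installed distribution, or None if absent."""
    try:
        return major_minor(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


TF = installed_major_minor("tensorflow")
HUB = installed_major_minor("tensorflow-hub")

# Per the comment above: tensorflow_hub 0.12 does not work with TensorFlow 2.12.
HUB_INCOMPATIBLE = (TF is not None and HUB is not None
                    and TF >= (2, 12) and HUB <= (0, 12))


@unittest.skipIf(HUB_INCOMPATIBLE,
                 "tensorflow_hub<=0.12 is incompatible with tensorflow>=2.12; "
                 "unskip once a newer tensorflow_hub is released")
class TFHubIntegrationTest(unittest.TestCase):  # hypothetical test class
    def test_model_handler(self):
        pass  # the real integration test body would go here
```

Gating on installed versions instead of skipping unconditionally means the tests start running again on their own once a compatible tensorflow_hub is installed.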

@AnandInguva
Contributor Author

Run Python 3.9 PostCommit

@AnandInguva
Contributor Author

Run Python 3.7 PostCommit

@AnandInguva
Contributor Author

Run Python_PVR_Flink PreCommit

@tvalentyn
Contributor

merging

@tvalentyn tvalentyn merged commit 5cb1711 into apache:master Mar 15, 2023
@tvalentyn
Contributor

Thanks, @AnandInguva !

@@ -32,14 +32,11 @@ cython<1
# some versions of libraries that launch Beam pipelines, like tensorflow-transform.
# Leaving 'future' in our containers for now prevent breaking tft users.
future
# TODO: Remove the upper bound once Tensorflow 2.11 is released.
# https://github.com/apache/beam/issues/23355
google-cloud-profiler<4.0.0
Contributor

@Abacn Abacn May 15, 2023


Is there a reason for removing the google-cloud-profiler dependency from the base image? This has broken google-cloud-profiler support in Beam v2.47.0.

Update: confirmed that support in v2.47.0 is not affected. It only affects the dev version for now.

Contributor


This was not intentional. Looks like Dataflow containers still include it.

Contributor

@tvalentyn tvalentyn May 15, 2023


We intentionally removed google-cloud-debugger from Dataflow containers; perhaps there was confusion between these two dependencies.

Contributor


The Dataflow container no longer includes it, as can be seen in https://github.com/apache/beam/blob/master/sdks/python/container/py38/base_image_requirements.txt, where google-cloud-profiler is no longer listed. I now get an error when using the profiler: #26698
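Checking whether a package is still pinned in a container's requirements file, as done above with base_image_requirements.txt, is easy to script. A minimal sketch, assuming a plain pip requirements file; `listed_packages` and `is_listed` are hypothetical helpers, names are compared using PEP 503 normalization, comment handling is simplified, and the sample contents are made up.

```python
import re


def _normalize(name):
    # PEP 503: names are case-insensitive and runs of "-", "_", "." are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def listed_packages(requirements_text):
    """Return the normalized package names pinned in a requirements file."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplified comment stripping
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(_normalize(match.group(0)))
    return names


def is_listed(requirements_text, package):
    return _normalize(package) in listed_packages(requirements_text)


sample = """\
# made-up sample, not the real base image file
google-cloud-pubsub==1.0
protobuf==1.0
"""
print(is_listed(sample, "google-cloud-profiler"))  # False
```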

Contributor


Thanks. This is less severe, since it only affects the dev version.
