Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrote protobuf generation scripts in Python #12527

Merged
merged 14 commits into from
Sep 19, 2024
Merged

Conversation

Avasam
Copy link
Collaborator

@Avasam Avasam commented Aug 15, 2024

Closes #12511

This is so much faster on Windows, takes a few seconds to run (including downloads and pre-commit). Compared to the over 1m it used to take me on WSL.

I have two design questions:

  1. Which usage do we prefer:
# A)
python scripts/sync_proto/tensorflow.py

# B)
python scripts/sync_proto.py tensorflow
# Which I could make work with the following for shell autocomplete
python scripts/sync_proto.py ./stubs/tensorflow
  1. Should I add a parametrized main function (something like sync_stubs_with_proto) that takes all of a scripts' special needs as parameters (including a "post-run" Callable). Or keep the 3 scripts separate with shared helper functions.

I'm also open to name changes suggestions.

@Avasam Avasam marked this pull request as ready for review August 16, 2024 19:17
@@ -13,6 +13,8 @@ ruff==0.5.4 # must match .pre-commit-config.yaml

# Libraries used by our various scripts.
aiohttp==3.10.2
grpcio-tools
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's a minimal version of protoc that is shipped with grpcio-tools that we should be using, but I can't recall off the top of my head and would have to search through past PRs to find what it was.

from http.client import HTTPResponse
from pathlib import Path
from typing import TYPE_CHECKING, Iterable
from urllib.request import urlopen
Copy link
Collaborator Author

@Avasam Avasam Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm purposefully avoiding requests here, as to not add requests and types-requests in requirements-tests.txt

@Avasam Avasam changed the title [WIP] Rewrote protobuf generation scripts in Python Rewrote protobuf generation scripts in Python Aug 16, 2024
Comment on lines +16 to +18
# grpc install only fails on Windows, but let's avoid building sdist on other platforms
# https://github.com/grpc/grpc/issues/36201
grpcio-tools; python_version < "3.13"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Avasam Avasam added the project: infrastructure typeshed build, test, documentation, or distribution related label Aug 18, 2024
Copy link
Collaborator

@srittau srittau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm not really familiar with protobuf or protobuf stubs generation so my comments are limited to general issues.

In general, there are a few places that use / as path separator, so I'd expect problems on Windows, but I'm fine with that for now.

scripts/sync_protobuf/_utils.py Outdated Show resolved Hide resolved
Comment on lines 34 to 36
data: dict[str, dict[str, dict[str, str]]] = json.load(file)
# The root key will be the protobuf source code version
return next(iter(data.values()))["languages"]["python"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see some validation of the version, considering its coming from an outside source. Something like:

Suggested change
data: dict[str, dict[str, dict[str, str]]] = json.load(file)
# The root key will be the protobuf source code version
return next(iter(data.values()))["languages"]["python"]
data = json.load(file)
# The root key will be the protobuf source code version
version = next(iter(data.values()))["languages"]["python"]
assert isinstance(version, str)
assert re.fullmatch(r"...", version) # proper re here
return version

This way we're also sure (at runtime) that version has the correct type and format.

Copy link
Collaborator Author

@Avasam Avasam Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like validating the version string is unnecessary extra work. If they somehow write an invalid Python version, our script doesn't need to fail. We're not doing anything with it other than displaying it. Proper validation should probably use a Python packaging library (I don't remember which).
The str assertion I still find valuable in case protobuff changes the structure of that file and the value becomes an object (dict)

scripts/sync_protobuf/google_protobuf.py Outdated Show resolved Hide resolved
Comment on lines +55 to +75
temp_dir = Path(tempfile.mkdtemp())
# Fetch s2clientprotocol (which contains all the .proto files)
archive_path = temp_dir / ARCHIVE_FILENAME
download_file(ARCHIVE_URL, archive_path)
extract_archive(archive_path, temp_dir)

# Remove existing pyi
for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
old_stub.unlink()

PROTOC_VERSION = run_protoc(
proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
mypy_out=STUBS_FOLDER,
proto_globs=extract_proto_file_paths(temp_dir),
cwd=temp_dir,
)

PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")

# Cleanup after ourselves, this is a temp dir, but it can still grow fast if run multiple times
shutil.rmtree(temp_dir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure the temp directory is always cleaned up:

Suggested change
temp_dir = Path(tempfile.mkdtemp())
# Fetch s2clientprotocol (which contains all the .proto files)
archive_path = temp_dir / ARCHIVE_FILENAME
download_file(ARCHIVE_URL, archive_path)
extract_archive(archive_path, temp_dir)
# Remove existing pyi
for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
old_stub.unlink()
PROTOC_VERSION = run_protoc(
proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
mypy_out=STUBS_FOLDER,
proto_globs=extract_proto_file_paths(temp_dir),
cwd=temp_dir,
)
PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")
# Cleanup after ourselves, this is a temp dir, but it can still grow fast if run multiple times
shutil.rmtree(temp_dir)
with tempfile.TemporaryDirectory() as td:
temp_dir = Path(td)
# Fetch s2clientprotocol (which contains all the .proto files)
archive_path = temp_dir / ARCHIVE_FILENAME
download_file(ARCHIVE_URL, archive_path)
extract_archive(archive_path, temp_dir)
# Remove existing pyi
for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
old_stub.unlink()
PROTOC_VERSION = run_protoc(
proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
mypy_out=STUBS_FOLDER,
proto_globs=extract_proto_file_paths(temp_dir),
cwd=temp_dir,
)
PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that originally, but it was more annoying to comment out for debugging purposes. Maybe I could do like #12151



def main() -> None:
temp_dir = Path(tempfile.mkdtemp())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.



def main() -> None:
temp_dir = Path(tempfile.mkdtemp())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@Avasam Avasam added the status: needs decision Needs a final decision by the typeshed maintainers label Sep 12, 2024
@srittau srittau merged commit c025e37 into python:main Sep 19, 2024
81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project: infrastructure typeshed build, test, documentation, or distribution related status: needs decision Needs a final decision by the typeshed maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewriting proto scripts in Python
2 participants