Rewrote protobuf generation scripts in Python #12527
Conversation
requirements-tests.txt (outdated)

```
@@ -13,6 +13,8 @@ ruff==0.5.4 # must match .pre-commit-config.yaml

# Libraries used by our various scripts.
aiohttp==3.10.2
grpcio-tools
```
I'm sure there's a minimal version of protoc that is shipped with grpcio-tools that we should be using, but I can't recall off the top of my head and would have to search through past PRs to find what it was.
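If it helps pin that down, one way to see which protoc a given grpcio-tools install ships is to ask it directly. This is only a sketch: it assumes grpcio-tools is installed and that `python -m grpc_tools.protoc --version` prints a `libprotoc X.Y`-style banner, which may differ across releases.

```python
import subprocess
import sys


def parse_protoc_version(banner: str) -> str:
    """Pull the version out of a protoc banner, e.g. "libprotoc 27.2"."""
    return banner.strip().split()[-1]


def bundled_protoc_version() -> str:
    """Ask the protoc bundled with grpcio-tools for its version (assumes it is installed)."""
    result = subprocess.run(
        [sys.executable, "-m", "grpc_tools.protoc", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_protoc_version(result.stdout)
```

The parsing helper is split out so the version-comparison logic can be tested without grpcio-tools present.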
scripts/sync_protobuf/_helpers.py (outdated)

```python
from http.client import HTTPResponse
from pathlib import Path
from typing import TYPE_CHECKING, Iterable
from urllib.request import urlopen
```
I'm purposefully avoiding requests here, so as not to add requests and types-requests to requirements-tests.txt.
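For reference, a stdlib-only download helper along these lines needs nothing beyond urllib and shutil. The function name mirrors the `download_file` seen in the quoted script below, but this body is a sketch, not the PR's actual implementation:

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_file(url: str, destination: Path) -> None:
    """Stream a remote file to disk using only the standard library."""
    with urlopen(url) as response, destination.open("wb") as file:
        # copyfileobj streams in chunks, so large archives don't load into memory
        shutil.copyfileobj(response, file)
```

`urlopen` also accepts `file://` URLs, which makes the helper easy to test without network access.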
```
# grpc install only fails on Windows, but let's avoid building sdist on other platforms
# https://github.com/grpc/grpc/issues/36201
grpcio-tools; python_version < "3.13"
```
xref grpc/grpc#36201 & grpc/grpc#34922
Thanks! I'm not really familiar with protobuf or protobuf stubs generation so my comments are limited to general issues.
In general, there are a few places that use / as the path separator, so I'd expect problems on Windows, but I'm fine with that for now.
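One low-effort way to sidestep most of those cases is to build paths with pathlib and only flatten to forward slashes where a tool demands them. A minimal illustration (the directory name here is made up):

```python
from pathlib import Path

# Hypothetical extracted-archive directory name, for illustration only.
EXTRACTED_PACKAGE_DIR = "protobuf-27.1"

# pathlib inserts the correct separator for the host OS...
proto_src = Path(EXTRACTED_PACKAGE_DIR) / "src"

# ...and .as_posix() recovers forward slashes where a tool expects them,
# regardless of platform.
posix_style = proto_src.as_posix()
```

So an f-string like `f"{EXTRACTED_PACKAGE_DIR}/src"` could become `(Path(EXTRACTED_PACKAGE_DIR) / "src").as_posix()` without changing behavior on POSIX systems.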
```python
data: dict[str, dict[str, dict[str, str]]] = json.load(file)
# The root key will be the protobuf source code version
return next(iter(data.values()))["languages"]["python"]
```
I'd like to see some validation of the version, considering it's coming from an outside source. Something like:

```diff
-data: dict[str, dict[str, dict[str, str]]] = json.load(file)
-# The root key will be the protobuf source code version
-return next(iter(data.values()))["languages"]["python"]
+data = json.load(file)
+# The root key will be the protobuf source code version
+version = next(iter(data.values()))["languages"]["python"]
+assert isinstance(version, str)
+assert re.fullmatch(r"...", version)  # proper re here
+return version
```
This way we're also sure (at runtime) that version has the correct type and format.
I feel like validating the version string is unnecessary extra work. If they somehow write an invalid Python version, our script doesn't need to fail. We're not doing anything with it other than displaying it. Proper validation should probably use a Python packaging library (I don't remember which).
I still find the str assertion valuable in case protobuf changes the structure of that file and the value becomes an object (dict).
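Keeping just the type check, the helper could look like this. This is a sketch of the compromise discussed above, not the PR's final code; the function name and JSON shape follow the quoted snippet:

```python
import json
from pathlib import Path


def extract_python_version(file_path: Path) -> str:
    """Read the Python protobuf version from upstream's version.json."""
    with file_path.open() as file:
        data = json.load(file)
    # The root key is the protobuf source code version
    version = next(iter(data.values()))["languages"]["python"]
    # Guard against upstream changing the file's structure
    # (e.g. the value becoming a dict instead of a plain string).
    assert isinstance(version, str), f"unexpected version type: {type(version)}"
    return version
```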
```python
temp_dir = Path(tempfile.mkdtemp())
# Fetch s2clientprotocol (which contains all the .proto files)
archive_path = temp_dir / ARCHIVE_FILENAME
download_file(ARCHIVE_URL, archive_path)
extract_archive(archive_path, temp_dir)

# Remove existing pyi
for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
    old_stub.unlink()

PROTOC_VERSION = run_protoc(
    proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
    mypy_out=STUBS_FOLDER,
    proto_globs=extract_proto_file_paths(temp_dir),
    cwd=temp_dir,
)

PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")

# Cleanup after ourselves, this is a temp dir, but it can still grow fast if run multiple times
shutil.rmtree(temp_dir)
```
To make sure the temp directory is always cleaned up:
```diff
-temp_dir = Path(tempfile.mkdtemp())
-# Fetch s2clientprotocol (which contains all the .proto files)
-archive_path = temp_dir / ARCHIVE_FILENAME
-download_file(ARCHIVE_URL, archive_path)
-extract_archive(archive_path, temp_dir)
-# Remove existing pyi
-for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
-    old_stub.unlink()
-PROTOC_VERSION = run_protoc(
-    proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
-    mypy_out=STUBS_FOLDER,
-    proto_globs=extract_proto_file_paths(temp_dir),
-    cwd=temp_dir,
-)
-PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")
-# Cleanup after ourselves, this is a temp dir, but it can still grow fast if run multiple times
-shutil.rmtree(temp_dir)
+with tempfile.TemporaryDirectory() as td:
+    temp_dir = Path(td)
+    # Fetch s2clientprotocol (which contains all the .proto files)
+    archive_path = temp_dir / ARCHIVE_FILENAME
+    download_file(ARCHIVE_URL, archive_path)
+    extract_archive(archive_path, temp_dir)
+    # Remove existing pyi
+    for old_stub in STUBS_FOLDER.rglob("*_pb2.pyi"):
+        old_stub.unlink()
+    PROTOC_VERSION = run_protoc(
+        proto_paths=(f"{EXTRACTED_PACKAGE_DIR}/src",),
+        mypy_out=STUBS_FOLDER,
+        proto_globs=extract_proto_file_paths(temp_dir),
+        cwd=temp_dir,
+    )
+    PYTHON_PROTOBUF_VERSION = extract_python_version(temp_dir / EXTRACTED_PACKAGE_DIR / "version.json")
```
I did that originally, but it was more annoying to comment out for debugging purposes. Maybe I could do something like #12151.
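A small context manager can reconcile both concerns: guaranteed cleanup by default, with a single switch to keep the directory around while debugging. This is a generic sketch (the `KEEP_TEMP_DIR` flag name is made up; a real script might read it from an env var or CLI flag), not what #12151 actually does:

```python
import shutil
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path

# Hypothetical debug switch: flip to True to inspect the files afterwards.
KEEP_TEMP_DIR = False


@contextmanager
def temp_working_dir() -> Iterator[Path]:
    """Like TemporaryDirectory, but cleanup can be disabled while debugging."""
    temp_dir = Path(tempfile.mkdtemp())
    try:
        yield temp_dir
    finally:
        # try/finally ensures cleanup even if the sync step raises.
        if not KEEP_TEMP_DIR:
            shutil.rmtree(temp_dir)
```

Used as `with temp_working_dir() as temp_dir: ...`, the body reads the same as the `TemporaryDirectory` suggestion but needs no code commented out to keep the artifacts.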
```python
def main() -> None:
    temp_dir = Path(tempfile.mkdtemp())
```
See above.
```python
def main() -> None:
    temp_dir = Path(tempfile.mkdtemp())
```
See above.
Co-authored-by: Sebastian Rittau <srittau@rittau.biz>
Closes #12511
This is so much faster on Windows: it takes a few seconds to run (including downloads and pre-commit), compared to the over one minute it used to take me on WSL.
I have two design questions:

- Whether to merge the scripts into a single entry point (maybe sync_stubs_with_proto) that takes all of a script's special needs as parameters (including a "post-run" Callable), or keep the 3 scripts separate with shared helper functions.

I'm also open to name change suggestions.
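To make the first option concrete, the single entry point might have a shape like the following. Every name and parameter here is hypothetical, sketching the idea rather than the PR's actual API; the download/extract/protoc steps are elided:

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path


def sync_stubs_with_proto(
    archive_url: str,
    stubs_folder: Path,
    proto_paths: Iterable[str],
    post_run: Callable[[Path], None] | None = None,
) -> None:
    """One shared driver parameterized by each script's special needs."""
    # Download the archive, extract it, remove stale .pyi files,
    # and regenerate stubs with protoc... (elided in this sketch)
    if post_run is not None:
        # Hook for per-script post-processing (e.g. patching headers).
        post_run(stubs_folder)
```

Each of the 3 scripts would then shrink to a call with its own URL, paths, and optional `post_run` hook; the alternative (separate scripts plus shared helpers) keeps each script's control flow explicit at the cost of some repetition.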