fix: using thread pool on macOS (#1861)
The Python parser uses ProcessPoolExecutor, which is problematic on
macOS when the parser is distributed as a zip file, leading to errors like:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/var/folders/fw/vythc6112ygfsvky8mdb5p580000gn/T/Bazel.runfiles_esxfeg_v/runfiles/python3_aarch64-apple-darwin/lib/python3.9/multiprocessing/resource_tracker.py", line 24, in <module>
    from . import spawn
  File "/var/folders/fw/vythc6112ygfsvky8mdb5p580000gn/T/Bazel.runfiles_esxfeg_v/runfiles/python3_aarch64-apple-darwin/lib/python3.9/multiprocessing/spawn.py", line 13, in <module>
    import runpy
  File "/var/folders/fw/vythc6112ygfsvky8mdb5p580000gn/T/Bazel.runfiles_esxfeg_v/runfiles/python3_aarch64-apple-darwin/lib/python3.9/runpy.py", line 19, in <module>
    from pkgutil import read_code, get_importer
ModuleNotFoundError: No module named 'pkgutil'
```

According to ["Contexts and start methods"
section](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)
of the documentation:

> On macOS, the spawn start method is now the default. The fork start
> method should be considered unsafe as it can lead to crashes of the
> subprocess as macOS system libraries may start threads.

Meanwhile:

> The 'spawn' and 'forkserver' start methods generally cannot be used
> with “frozen” executables (i.e., binaries produced by packages like
> PyInstaller and cx_Freeze) on POSIX systems.

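This means there is no safe way to start a ProcessPoolExecutor when the
parser runs from a Python zip file on macOS: fork is unsafe there, and
spawn and forkserver fail from the zip. This PR switches the parser to
ThreadPoolExecutor on macOS instead; other platforms keep
ProcessPoolExecutor.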
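
To see which start method a given interpreter would pick, a small diagnostic along these lines may help. This is only a sketch: `describe_start_methods` is a hypothetical helper name, not part of this change, and the values it reports depend on the platform and Python version.

```python
import multiprocessing
import platform


def describe_start_methods():
    """Report the start methods this interpreter supports and its default."""
    return {
        "platform": platform.system(),
        # Note: get_start_method() also fixes the default context as a side effect.
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

Per the documentation quoted above, the reported default on macOS should be `spawn` for Python 3.8 and later.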
linzhp authored Apr 17, 2024
1 parent 4be00a6 commit afae3f0
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion gazelle/python/parse.py
@@ -20,6 +20,7 @@
 import concurrent.futures
 import json
 import os
+import platform
 import sys
 from io import BytesIO
 from tokenize import COMMENT, NAME, OP, STRING, tokenize
@@ -108,8 +109,19 @@ def parse(repo_root, rel_package_path, filename):
     return output


+def create_main_executor():
+    # We cannot use ProcessPoolExecutor on macOS, because the fork start method should be considered unsafe as it can
+    # lead to crashes of the subprocess as macOS system libraries may start threads. Meanwhile, the 'spawn' and
+    # 'forkserver' start methods generally cannot be used with “frozen” executables (i.e., Python zip file) on POSIX
+    # systems. Therefore, there is no good way to use ProcessPoolExecutor on macOS when we distribute this program with
+    # a zip file.
+    # Ref: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
+    if platform.system() == "Darwin":
+        return concurrent.futures.ThreadPoolExecutor()
+    return concurrent.futures.ProcessPoolExecutor()
+
 def main(stdin, stdout):
-    with concurrent.futures.ProcessPoolExecutor() as executor:
+    with create_main_executor() as executor:
         for parse_request in stdin:
             parse_request = json.loads(parse_request)
             repo_root = parse_request["repo_root"]
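
The rest of the loop is not shown in this view. As a rough illustration of why the swap can be a one-line change, the sketch below relies only on the interface both executors share (`submit` and `Future.result`). `count_words` and `run_all` are hypothetical stand-ins, not the parser's real functions.

```python
import concurrent.futures
import platform


def create_main_executor():
    # Mirrors the selection logic added in this commit.
    if platform.system() == "Darwin":
        return concurrent.futures.ThreadPoolExecutor()
    return concurrent.futures.ProcessPoolExecutor()


def count_words(line):
    # Hypothetical stand-in for the real per-file parse work.
    return len(line.split())


def run_all(executor, lines):
    # Uses only the shared Executor interface, so either pool type works.
    with executor:
        futures = [executor.submit(count_words, line) for line in lines]
        # Collect results in submission order.
        return [future.result() for future in futures]
```

A caller would write `run_all(create_main_executor(), lines)`, ideally under an `if __name__ == "__main__":` guard so that a process pool can import the worker function safely. One trade-off to keep in mind: on CPython with the GIL, a thread pool does not run CPU-bound Python code in parallel, so parsing on macOS may be slower than with a process pool.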