
Python dependency inference: add/emit debugging information #17039

Closed
lilatomic opened this issue Sep 28, 2022 · 4 comments · Fixed by #17057
Labels
backend: Python, enhancement

Comments

@lilatomic
Contributor

Is your feature request related to a problem? Please describe.
There currently isn't any information emitted about what dependencies were inferred. If dependency inference fails, developers don't have much to pinpoint the error.

Two concrete use cases:

  • I ran into an issue where dependent files weren't pulled into a test PEX. This was due to some shenanigans I had with Pants source roots, where the imports were resolved from the repository root, not from the closest parent source root.
  • I ran into an issue where guarded imports weren't pulled into a test PEX. I had the incorrect module path, but the PEX built fine because they were weak imports.
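The second case involved a guarded import: one wrapped in try/except so the program tolerates its absence. A minimal sketch (module name hypothetical):

```python
# A guarded ("weak") import: if the module is missing, the program still
# runs, so a misspelled module path produces no error at build or run time.
try:
    import definitely_not_a_real_module  # hypothetical, misspelled import
except ImportError:
    definitely_not_a_real_module = None  # fall back gracefully

print(definitely_not_a_real_module)  # → None when the module is absent
```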

Describe the solution you'd like
A good first pass would be having the ability to output,

  • for every file, the imports found and whether they were resolved.
  • the map of 3rd party dependencies to modules
  • the map of 1st party targets to modules

This would help developers confirm that the imports were found, and understand whether the targets of those imports were found or ignored. For the second use case, I would expect output like the following, telling me that it was deliberate and acceptable that the unresolved import did not raise an error:

```json
[
  {
    "path": "//folder/file.py",
    "imports": [
      {
        "import": "a.b.c",
        "weak": true,
        "resolved": false
      }
    ]
  }
]
```

Describe alternatives you've considered
The current output only marks checkpoints. With `-ltrace`, this is what is printed (many messages elided):

```
11:31:51.66 [DEBUG] Completed: Find all targets in the project
11:31:51.66 [DEBUG] Completed: Find all Python targets in project
11:31:51.66 [DEBUG] Completed: Creating map of third party targets to Python modules
11:31:51.67 [DEBUG] Completed: Creating map of first party Python targets to Python modules
11:31:51.67 [TRACE] Completed: Inferring Python dependencies by analyzing source
11:31:51.67 [TRACE] Completed: Inferring Python dependencies by analyzing source
```

Additional context
Related: #13283. This issue requests information about dependency inference itself, while that one requests graphing the normal dependencies. The discussion there mentions graphing at different scopes; this issue concerns a scope below what could ordinarily be graphed by that.

@lilatomic
Contributor Author

I think there's a lot of information that will be output, so something like peek's `--peek-output-file` option might be appropriate.

@stuhood
Member

stuhood commented Sep 28, 2022

There is some (informal) prior art here in the JVM backend (and the Go backend as well, actually): those goals dump the sources of third-party dependencies as JSON, and the exact extracted symbols per file, respectively.

If symbol extraction were standardized across languages behind a @union, a goal like this could be done in a language-agnostic manner. But failing that, adding a debug goal like this would also be an option. cc @tdyas

@Eric-Arellano
Contributor

I think that a debug_goals backend is a good idea for Python. This ticket would be awesome! Thanks for the suggestion.

@stuhood
Member

stuhood commented Sep 28, 2022

> I think there's a lot of information that will be output, so something like peek's `--peek-output-file` might be appropriate

Oh, hm! It just occurred to me that another connection to your peek idea is #16967: essentially, you could think of the extracted imports / consumed-symbols of a file as effectively computed "file metadata" about that file. If we had additional generic computed per-file metadata like this, then peek might be a natural place to (optionally) render it...

@tdyas added the backend: Python label on Oct 2, 2022
stuhood pushed a commit that referenced this issue Nov 17, 2022
See #17039.

Given a testbed of 
<details>
  <summary>input</summary>

```python
# Copyright 2022 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

import json  # unownable, root level
import os.path  # unownable, not root level

import watchdog  # dependency not included
import yaml  # dependency included
import yamlpath  # owned by other resolve

try:
    import weakimport  # weakimport missing
except ImportError:
    ...

open("src/python/configs/prod.json")  # asset
open("testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py")
```
</details>

we get 

<details>
  <summary>output</summary>

```
{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {
        "name": "weakimport",
        "reference": {
          "lineno": 12,
          "weak": true
        },
        "resolved": {
          "status": "ImportOwnerStatus.weak_ignore",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "json",
        "reference": {
          "lineno": 4,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "os.path",
        "reference": {
          "lineno": 5,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "watchdog",
        "reference": {
          "lineno": 7,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "yaml",
        "reference": {
          "lineno": 8,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "3rdparty/python#PyYAML",
            "3rdparty/python#types-PyYAML"
          ]
        },
        "possible_resolve": null
      },
      {
        "name": "yamlpath",
        "reference": {
          "lineno": 9,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": [
          [
            "src/python/pants/backend/helm/subsystems:yamlpath",
            "helm-post-renderer"
          ]
        ]
      }
    ],
    "assets": [
      {
        "name": "src/python/configs/prod.json",
        "reference": "src/python/configs/prod.json",
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "reference": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py:../../../../pants_plugins_directory"
          ]
        },
        "possible_resolve": null
      }
    ]
  }
}
```
</details>

This tells you, for each file and for each import, what dependencies Pants thought it could have and what it decided to do with them.
This uses almost all the same code as the main dependency inference code, except for the top-level orchestration; there are about 100 lines of semi-duplicate code.
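Assuming output in the JSON shape shown above, a small script could filter the dump down to just the problematic imports. A sketch (the sample data is abridged from the dump above):

```python
import json

# Abridged sample in the shape of the dump above.
dump = json.loads("""
{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {"name": "watchdog",
       "resolved": {"status": "ImportOwnerStatus.unowned", "address": []},
       "possible_resolve": null},
      {"name": "yaml",
       "resolved": {"status": "ImportOwnerStatus.unambiguous",
                    "address": ["3rdparty/python#PyYAML"]},
       "possible_resolve": null}
    ]
  }
}
""")

# Collect (file, import) pairs whose owner could not be determined.
unowned = [
    (path, imp["name"])
    for path, analysis in dump.items()
    for imp in analysis["imports"]
    if imp["resolved"]["status"] == "ImportOwnerStatus.unowned"
]
print(unowned)
```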

There's also a more advanced mode that dumps information about each stage of the process. I think this might be useful for people digging through the dependency inference process, but not really for end users. We get it for free, though.

Fixes #17039.

---

this is fairly critical for performance, so here are benchmarks (with comparison-of-means t-test)

|   | main | this PR | difference | p-value |
| --- | --- | --- | --- | --- |
| `hyperfine --runs=10 './pants --no-pantsd dependencies --transitive ::'` | 21.839 s ±  0.326 s | 22.142 s ±  0.283 s | 1.38% | 0.0395 |
| `hyperfine --warmup=1 --runs=10 './pants dependencies --transitive ::'` | 1.798 s ±  0.074 s | 1.811 s ±  0.076 s | 0.72% | 0.7029 |
| `hyperfine --runs=10 './pants --no-pantsd dependencies ::'` | 21.547 s ±  0.640 s  | 21.863 s ±  1.072 s | 1.47% | 0.4339 |
| `hyperfine --warmup=1 --runs=10 './pants dependencies ::'` | 1.828 s ±  0.091 s | 1.844 s ±  0.105 s | 0.88% | 0.7200 |
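The p-values above can be sanity-checked from hyperfine's reported means and standard deviations. A pure-Python sketch of the Welch t statistic for the first row (assuming n = 10 runs per side):

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for a comparison of means with unequal variances."""
    return (mean2 - mean1) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# First row of the table: main vs. this PR, --no-pantsd, transitive.
t = welch_t(21.839, 0.326, 10, 22.142, 0.283, 10)
print(round(t, 2))  # ≈ 2.22, consistent with a two-sided p near 0.04 at ~18 dof
```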

So it looks like this PR might impact performance by about 1%, although those p-values are mighty unconvincing. Let me know if we want to increase the number of runs and gather more statistics; I've run the stats a few times throughout and this looks about right, so I think we can proceed with the review under the assumption that there is currently a ~1% performance overhead. I'm open to suggestions on improving performance.