Add debug goals to python #17057

lilatomic · 2022-09-29T00:06:50Z

rel #17039
We're ready for review! Given a testbed of

input

# Copyright 2022 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

import json  # unownable, root level
import os.path  # unownable, not root level

import watchdog  # dependency not included
import yaml  # dependency included
import yamlpath  # owned by other resolve

try:
    import weakimport  # weakimport missing
except ImportError:
    ...

open("src/python/configs/prod.json")  # asset
open("testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py")

we get

output

{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {
        "name": "weakimport",
        "reference": {
          "lineno": 12,
          "weak": true
        },
        "resolved": {
          "status": "ImportOwnerStatus.weak_ignore",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "json",
        "reference": {
          "lineno": 4,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "os.path",
        "reference": {
          "lineno": 5,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "watchdog",
        "reference": {
          "lineno": 7,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "yaml",
        "reference": {
          "lineno": 8,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "3rdparty/python#PyYAML",
            "3rdparty/python#types-PyYAML"
          ]
        },
        "possible_resolve": null
      },
      {
        "name": "yamlpath",
        "reference": {
          "lineno": 9,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": [
          [
            "src/python/pants/backend/helm/subsystems:yamlpath",
            "helm-post-renderer"
          ]
        ]
      }
    ],
    "assets": [
      {
        "name": "src/python/configs/prod.json",
        "reference": "src/python/configs/prod.json",
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "reference": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py:../../../../pants_plugins_directory"
          ]
        },
        "possible_resolve": null
      }
    ]
  }
}

This tells you, for each file and each import, which dependencies Pants thought it could have and what it decided to do with them.
This uses almost all the same code as the main dependency inference code, except for the top-level orchestration. I think that's pretty close; there are about 100 lines of semi-duplicate code.

There's also a more advanced mode that dumps information about each stage of the process. I think this might be useful for people digging through the dependency inference process, but not really for end users. We get it for free, though.
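
To show how the per-file dump above might be consumed, here is a minimal sketch that groups imports by their resolved status. It assumes only the JSON shape shown in the output above; the function name `imports_by_status` and the trimmed `SAMPLE` are made up for illustration and are not part of this PR.

```python
from __future__ import annotations

import json
from collections import defaultdict

# Trimmed copy of the dump shown above (illustrative; normally read from the goal's output).
SAMPLE = """
{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {"name": "watchdog", "reference": {"lineno": 7, "weak": false},
       "resolved": {"status": "ImportOwnerStatus.unowned", "address": []},
       "possible_resolve": null},
      {"name": "yaml", "reference": {"lineno": 8, "weak": false},
       "resolved": {"status": "ImportOwnerStatus.unambiguous",
                    "address": ["3rdparty/python#PyYAML", "3rdparty/python#types-PyYAML"]},
       "possible_resolve": null},
      {"name": "yamlpath", "reference": {"lineno": 9, "weak": false},
       "resolved": {"status": "ImportOwnerStatus.unowned", "address": []},
       "possible_resolve": [["src/python/pants/backend/helm/subsystems:yamlpath",
                             "helm-post-renderer"]]}
    ],
    "assets": []
  }
}
"""


def imports_by_status(dump: dict) -> dict[str, list[tuple[str, str, int]]]:
    """Group (file, import name, line number) triples by resolved status."""
    grouped: dict[str, list[tuple[str, str, int]]] = defaultdict(list)
    for path, analysis in dump.items():
        for imp in analysis.get("imports", []):
            # Statuses are serialised as "ImportOwnerStatus.<name>"; keep only <name>.
            status = imp["resolved"]["status"].split(".", 1)[-1]
            grouped[status].append((path, imp["name"], imp["reference"]["lineno"]))
    return dict(grouped)


grouped = imports_by_status(json.loads(SAMPLE))
for path, name, lineno in grouped.get("unowned", []):
    print(f"{path}:{lineno}: no owner found for {name!r}")
```

The same loop could be extended to print `possible_resolve` for unowned imports, which is where the "owned by other resolve" hint shows up.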

Fixes #17039.

This is fairly critical for performance, so here are benchmarks (with a comparison-of-means t-test):

| Command | main | this | difference | p-value |
|---|---|---|---|---|
| `hyperfine --runs=10 './pants --no-pantsd dependencies --transitive ::'` | 21.839 s ± 0.326 s | 22.142 s ± 0.283 s | 1.38% | 0.0395 |
| `hyperfine --warmup=1 --runs=10 './pants dependencies --transitive ::'` | 1.798 s ± 0.074 s | 1.811 s ± 0.076 s | 0.72% | 0.7029 |
| `hyperfine --runs=10 './pants --no-pantsd dependencies ::'` | 21.547 s ± 0.640 s | 21.863 s ± 1.072 s | 1.47% | 0.4339 |
| `hyperfine --warmup=1 --runs=10 './pants dependencies ::'` | 1.828 s ± 0.091 s | 1.844 s ± 0.105 s | 0.88% | 0.7200 |

So it looks like this MR might impact performance by about 1%, although those p-values are mighty unconvincing. LMK if we want to increase the number of runs and get more statistics. I've run the stats a few times throughout and this looks about right, so I think we can proceed with the review under the assumption that there is currently a 1% performance overhead. I'm open to suggestions on improving performance.
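
For anyone wanting to re-derive the comparison-of-means numbers from the hyperfine summaries, a rough sketch follows. It assumes the `±` figures are sample standard deviations over 10 runs and that a pooled (Student's) two-sample t-test was used; the PR does not say which variant was run, so treat both as assumptions. The helper names are made up, and the p-value is obtained by numerically integrating the t density rather than by calling a statistics library.

```python
import math


def pooled_t(mean_a: float, sd_a: float, mean_b: float, sd_b: float, n: int):
    """Student's t statistic and degrees of freedom for two samples of equal size n."""
    pooled_var = (sd_a**2 + sd_b**2) / 2      # equal n, so a plain average of variances
    std_err = math.sqrt(pooled_var * 2 / n)   # standard error of the difference of means
    return (mean_b - mean_a) / std_err, 2 * n - 2


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on the density over [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


# First row of the table: --no-pantsd, --transitive.
t_stat, dof = pooled_t(21.839, 0.326, 22.142, 0.283, n=10)
p_value = two_sided_p(t_stat, dof)
slowdown_pct = (22.142 - 21.839) / 21.839 * 100
print(f"t={t_stat:.3f} df={dof} p={p_value:.4f} slowdown={slowdown_pct:.2f}%")
```

For the first row this gives t ≈ 2.22 with 18 degrees of freedom and a p-value a little under 0.04, in line with the reported figure. With only 10 runs per side, a difference this small sits close to the noise floor, which is consistent with the large p-values on the other rows.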

lilatomic · 2022-09-30T05:24:41Z

ok, thanks to folks in the Pants Slack, this is looking much better. Code duplication is minimal. I've also hooked a few levels deeper into inference; it now looks like:

{
  "address": "build-support/bin/_release_helper.py:py_scripts",
  "identified": {
    "imports": {
      "__future__.annotations": {
        "lineno": 4,
        "weak": false
      },
      "argparse": {
        "lineno": 6,
        "weak": false
      },
...
      "pants.util.contextutil.temporary_dir": {
        "lineno": 34,
        "weak": false
      },
      "pants.util.memo.memoized_property": {
        "lineno": 35,
        "weak": false
      },
      "pants.util.strutil.softwrap": {
        "lineno": 36,
        "weak": false
      },
      "pants.util.strutil.strip_prefix": {
        "lineno": 36,
        "weak": false
      }
    },
    "assets": [
      "3rdparty/python/requirements.txt",
      "build-support/bin/get_os.sh"
    ]
  },
  "resolved": {
    "imports": [
      "3rdparty/python#types-requests",
      "src/python/pants/util/strutil.py",
      "build-support/bin/common.py:py_scripts",
      "3rdparty/python#requests",
      "src/python/pants/util/contextutil.py",
      "src/python/pants/util/memo.py",
      "build-support/bin/reversion.py:py_scripts",
      "3rdparty/python#packaging"
    ],
    "unowned": [],
    "assets": [],
    "explicit": {
      "address": "build-support/bin/_release_helper.py:py_scripts",
      "includes": [],
      "ignores": []
    }
  }
}

This is pretty good! But with the hinting in _handle_unowned_imports, I think I can make it more useful by putting that into the export as well. There's also the fact that this isn't super usable in itself. It would be nice to see the resolved imports placed alongside the actual import statements.

lilatomic · 2022-10-01T05:22:44Z

hey now that's looking good, and now with assets wired in too:

{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {
        "name": "weakimport",
        "reference": {
          "lineno": 12,
          "weak": true
        },
        "resolved": {
          "status": "ImportOwnerStatus.weak_ignore",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "json",
        "reference": {
          "lineno": 4,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "watchdog",
        "reference": {
          "lineno": 6,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": [
          [
            "src/python/pants/backend/helm/subsystems:yamlpath",
            "helm-post-renderer"
          ]
        ]
      },
      {
        "name": "yaml",
        "reference": {
          "lineno": 7,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "3rdparty/python#PyYAML",
            "3rdparty/python#types-PyYAML"
          ]
        },
        "possible_resolve": null
      }
    ],
    "assets": [
      {
        "name": "src/python/configs/prod.json",
        "reference": "src/python/configs/prod.json",
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "reference": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py:../../../../pants_plugins_directory"
          ]
        },
        "possible_resolve": null
      }
    ]
  }
}

(don't ask about any of the data, it's 100% shenanigans.) Code needs a cleanup and tests, and then it'll be ready to go

stuhood

Neat!

I didn't look a whole lot at the debug goal itself: mostly at the impact on the runtime, because a general principle that I think will be important in landing this is that we don't want the debug goals to cost us any runtime performance in the hot path. So it would be good to ensure that any indirection that is introduced to allow for consuming the output in the debug goals does not impact runtime.

stuhood · 2022-10-18T23:15:04Z

src/python/pants/backend/python/dependency_inference/rules.py

+    unownable = "unownable"
+
+
+@dataclass()


Suggested change

@dataclass()

@dataclass(frozen=True)

stuhood · 2022-10-18T23:18:00Z

src/python/pants/backend/python/dependency_inference/rules.py

+                resolve_results[filepath] = ImportResolveResult(
+                    ImportOwnerStatus.unambiguous, possible_addresses
+                )
+                continue


I feel like this cluster of if-elses might be clearer if every branch sets a local variable ImportOwnerStatus, and then has a single line to update resolve_results with that value. Otherwise the type checker isn't actually helping to guarantee that all branches mutate the dict.

agree. I've refactored to use a function and comprehension.
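
A minimal sketch of the shape being suggested here: one function that always returns a result, plus a comprehension that builds the dict. The branch conditions and the parameters `owners`, `weak`, and `stdlib` are simplified stand-ins invented for illustration; only the status names and the `ImportResolveResult(status, addresses)` shape come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ImportOwnerStatus(Enum):
    unambiguous = "unambiguous"
    unowned = "unowned"
    unownable = "unownable"
    weak_ignore = "weak_ignore"


@dataclass(frozen=True)
class ImportResolveResult:
    status: ImportOwnerStatus
    address: tuple[str, ...] = ()


def _resolve_single(
    name: str,
    owners: dict[str, tuple[str, ...]],
    weak: set[str],
    stdlib: set[str],
) -> ImportResolveResult:
    # Every branch returns a value, so the type checker can verify that no
    # case silently falls through without producing a result.
    if name in stdlib:
        return ImportResolveResult(ImportOwnerStatus.unownable)
    found = owners.get(name, ())
    if found:
        return ImportResolveResult(ImportOwnerStatus.unambiguous, found)
    if name in weak:
        return ImportResolveResult(ImportOwnerStatus.weak_ignore)
    return ImportResolveResult(ImportOwnerStatus.unowned)


def resolve_all(names, owners, weak, stdlib) -> dict[str, ImportResolveResult]:
    # The "vectorisation" is a single comprehension; the dict is built in one place.
    return {name: _resolve_single(name, owners, weak, stdlib) for name in names}


results = resolve_all(
    ["json", "yaml", "weakimport", "watchdog"],
    owners={"yaml": ("3rdparty/python#PyYAML", "3rdparty/python#types-PyYAML")},
    weak={"weakimport"},
    stdlib={"json"},
)
```

Splitting the per-item logic from the loop also makes the logic testable on its own, without constructing the full set of inputs.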

stuhood · 2022-10-18T23:21:33Z

src/python/pants/backend/python/dependency_inference/rules.py

+@rule
+async def find_other_owners_for_unowned_imports(
+    req: UnownedImportsPossibleOwnersRequest,
+    python_setup: PythonSetup,
+) -> UnownedImportsPossibleOwners:


Should this be a @rule, rather than a @_rule_helper? Is it likely to be called multiple times with the same inputs (such that it gains benefit from memoization)? If not, maybe @_rule_helper.

hmm, I think that memoising individual possible owners would be useful (since it might be imported a few times), but not as much the whole bundle. I'll try rebuilding it.
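
To illustrate the reasoning about granularity, here is an analogy using `functools.lru_cache`. It does not use or model the Pants engine's `@rule` machinery; it only shows why caching per import name can pay off when the same import appears in several files, whereas a cache keyed on a whole file's bundle of imports would rarely hit. `FAKE_INDEX`, `possible_owners`, and `owners_for_file` are hypothetical names.

```python
from functools import lru_cache

# Hypothetical lookup table standing in for "owners in other resolves".
FAKE_INDEX = {
    "yamlpath": ("src/python/pants/backend/helm/subsystems:yamlpath",),
}

LOOKUPS = []  # records every uncached lookup, for demonstration only


@lru_cache(maxsize=None)
def possible_owners(import_name: str) -> tuple:
    """Per-import lookup: cached, so repeated imports are computed once."""
    LOOKUPS.append(import_name)
    return FAKE_INDEX.get(import_name, ())


def owners_for_file(imports: tuple) -> dict:
    """Per-file bundle: cheap to rebuild because each item is already cached."""
    return {name: possible_owners(name) for name in imports}


file_a = owners_for_file(("yamlpath", "watchdog"))
file_b = owners_for_file(("yamlpath", "requests"))

# "yamlpath" was imported by both files but looked up only once.
print(LOOKUPS)
```

The two bundles differ, so a cache keyed on the bundle would have missed both times, while the per-name cache saves one of the four lookups.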

stuhood · 2022-10-18T23:25:12Z

src/python/pants/backend/python/dependency_inference/rules.py

+@rule
+async def _exec_parse_deps(


If this is purely an internal API called to produce ResolvedParsedPythonDependencies, then ditto potentially @_rule_helper?

stuhood · 2022-10-18T23:28:20Z

It looks like you might need to rebase this to make CI happy... not sure why though.

lilatomic · 2022-10-19T21:41:12Z

Yes, impacting performance has always been looming over this MR. I thought of having separate "optimal" and "debug" pathways, but divergence between the two is a source of bugs. I don't have a good intuition for what would impact performance: I don't know the relative overhead of a rule invocation vs a function invocation, or of passing larger data structures across calls. Do we have a way to performance-test Pants?

lilatomic · 2022-10-22T19:04:02Z

performance seems fine? I'm not sure what a good benchmark to test is

| Command | main | this |
|---|---|---|
| `hyperfine --runs=10 './pants --no-pantsd dependencies --transitive ::'` | 19.271 s ± 1.163 s | 19.263 s ± 1.181 s |
| `hyperfine --warmup=1 --runs=10 './pants dependencies --transitive ::'` | 1.687 s ± 0.048 s | 1.694 s ± 0.072 s |

lilatomic · 2022-10-26T21:06:34Z

src/python/pants/backend/python/dependency_inference/rules.py

-) -> Iterator[Address]:
-    for filepath in assets:
+) -> dict[str, ImportResolveResult]:
+    def _resolve_single_asset(filepath) -> ImportResolveResult:


I'm not sure how I feel about this being an internal method. Extracting it would make it a bit easier to test, but would also mean that we have to pass all the other parameters through. I'd normally make a class (AssetDependencyResolver or something) to hold on to those, but that doesn't seem to be the style with Pants. Thoughts?

Yea, we definitely bias more toward a functional style, although there are helper classes here and there. No preference in this case, and probably fine to only test the outer method.

lilatomic · 2022-10-26T22:20:04Z

src/python/pants/backend/python/dependency_inference/rules.py

+
+
+@dataclass(frozen=True)
+class ImportResolveResult:


Thoughts on whether I should add smart constructors? It would help ensure that we always have the expected field combinations (e.g. every unambiguous result has an owner), but it's also more boilerplate.
e.g.:

@staticmethod
def disambiguated(maybe_disambiguated):
    return ImportResolveResult(ImportOwnerStatus.disambiguated, (maybe_disambiguated,))

Fine either way.

I think I'm going to skip it; it saves a layer of indirection.
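
For reference, a sketch of what the smart-constructor variant discussed above could have looked like. This is not what was merged (the idea was skipped); the field names and the validation are assumptions made for illustration, and addresses are plain strings here rather than Pants `Address` objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ImportOwnerStatus(Enum):
    unambiguous = "unambiguous"
    disambiguated = "disambiguated"
    unowned = "unowned"


@dataclass(frozen=True)
class ImportResolveResult:
    status: ImportOwnerStatus
    address: tuple[str, ...] = ()

    @staticmethod
    def unambiguous(*owners: str) -> ImportResolveResult:
        # Enforce the invariant mentioned above: an unambiguous result has an owner.
        if not owners:
            raise ValueError("an unambiguous result needs at least one owner")
        return ImportResolveResult(ImportOwnerStatus.unambiguous, tuple(owners))

    @staticmethod
    def disambiguated(owner: str) -> ImportResolveResult:
        return ImportResolveResult(ImportOwnerStatus.disambiguated, (owner,))

    @staticmethod
    def unowned() -> ImportResolveResult:
        return ImportResolveResult(ImportOwnerStatus.unowned)


result = ImportResolveResult.unambiguous(
    "3rdparty/python#PyYAML", "3rdparty/python#types-PyYAML"
)
```

The trade-off is the one raised in the thread: callers can no longer build an inconsistent status/address pair by accident, at the cost of one extra method per status.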

lilatomic · 2022-10-26T22:21:26Z

src/python/pants/backend/python/dependency_inference/rules.py

+
+@dataclass(frozen=True)
+class UnownedImportsPossibleOwners:
+    value: Dict[str, list[tuple[Address, ResolveName]]]


Should I add a dataclass for tuple[Address, ResolveName]?

lilatomic · 2022-10-26T22:56:10Z

src/python/pants/backend/python/goals/debug_goals.py

+from pants.option.option_types import EnumOption
+
+
+class AnalysisFlavor(Enum):


I'm open to better names

I'm not too concerned with the name, but a bit longer help string on the option would be good.

stuhood

Looks great! Only blocking comment is probably moving this into a dedicated experimental backend.

stuhood · 2022-10-28T03:10:30Z

src/python/pants/backend/python/dependency_inference/rules.py

-) -> Iterator[Address]:
-    for filepath in assets:
+) -> dict[str, ImportResolveResult]:
+    def _resolve_single_asset(filepath) -> ImportResolveResult:


Yea, we definitely bias more toward a functional style, although there are helper classes here and there. No preference in this case, and probably fine to only test the outer method.

stuhood · 2022-10-28T03:11:24Z

src/python/pants/backend/python/dependency_inference/rules.py

+
+
+@dataclass(frozen=True)
+class ImportResolveResult:


Fine either way.

stuhood · 2022-10-28T03:13:56Z

src/python/pants/backend/python/goals/debug_goals.py

+from pants.option.option_types import EnumOption
+
+
+class AnalysisFlavor(Enum):


I'm not too concerned with the name, but a bit longer help string on the option would be good.

stuhood · 2022-10-28T03:19:46Z

src/python/pants/backend/python/register.py

+        # Test
+        *debug_goals.rules(),


These should probably go in their own backend / register.py, similar to the JVM and Go debug goals: https://github.com/pantsbuild/pants/tree/main/src/python/pants/backend/experimental/java/debug_goals

That's mostly because the goal names / flags are experimental / unstable: it would be amazing to eventually merge them.

lilatomic · 2022-11-16T19:13:27Z

Looks great! Only blocking comment is probably moving this into a dedicated experimental backend.

sure thing, done. I think we should be good for a re-review.

stuhood

Thanks!

stuhood added the category:new feature label Oct 18, 2022

stuhood reviewed Oct 18, 2022

lilatomic force-pushed the feature/debug-goals branch from 8ca59b9 to 787c0e6 Compare October 22, 2022 17:07

thejcannon self-requested a review October 26, 2022 15:40

lilatomic commented Oct 26, 2022

lilatomic marked this pull request as ready for review October 27, 2022 00:20

stuhood reviewed Oct 28, 2022

lilatomic force-pushed the feature/debug-goals branch from 3af65aa to 819a104 Compare October 29, 2022 23:13

lilatomic added 17 commits November 16, 2022 14:07

goal is resolving

86935dd

[ci skip-rust] [ci skip-build-wheels]

working version

8c8019e

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

use correct fieldset to properly instantiate iterpreter reqs

fb22a1e

[ci skip-rust] [ci skip-build-wheels]

refactor marshalling ParsePythonDependenciesRequest

ff781d0

[ci skip-rust] [ci skip-build-wheels]

properly use rules

cdac64b

wowowow [ci skip-rust] [ci skip-build-wheels]

better vectorisation

f1deda7

[ci skip-rust] [ci skip-build-wheels]

pull parsing inference results to helper rule

29d86b0

[ci skip-rust] [ci skip-build-wheels]

wire resolving parse into source analysis dump

69b5c2b

[ci skip-rust] [ci skip-build-wheels]

use peek's json encoder

1f87e1d

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

extract helper rule for resolving possible owners

acd07c4

[ci skip-rust] [ci skip-build-wheels]

wire possible other ownership into analysis

6b62b23

[ci skip-rust] [ci skip-build-wheels]

defaultdict isn't serialisable by asdict

260bb7f

[ci skip-rust] [ci skip-build-wheels]

add stage to determine importownerstatus

cb3071c

[ci skip-rust] [ci skip-build-wheels]

honour multiple unambiguous owners

44bdc28

memo to me: read the docs [ci skip-rust] [ci skip-build-wheels]

separate collection from reduction of imports information

be5f07f

[ci skip-rust] [ci skip-build-wheels]

push gathering one level down

6cde50a

exposes the ungathered results for the analysis easily [ci skip-rust] [ci skip-build-wheels]

wire in raw resolve results to analysis

22965f6

[ci skip-rust] [ci skip-build-wheels]

lilatomic added 24 commits November 16, 2022 14:07

evaporate ExecParseDepsRequest

5caa77c

[ci skip-rust] [ci skip-build-wheels]

evaporate ExecParseDepsResponse

f84667c

[ci skip-rust] [ci skip-build-wheels]

comments

8561c9f

[ci skip-rust] [ci skip-build-wheels]

have asset resolution return ImportResolveResult

d8c72b7

[ci skip-rust] [ci skip-build-wheels]

mark assets with inferred targets

ac227da

[ci skip-rust] [ci skip-build-wheels]

mark unambiguous assets as such

4ba01fb

[ci skip-rust] [ci skip-build-wheels]

flag ambiguous assets as such

a1fc482

[ci skip-rust] [ci skip-build-wheels]

wire assets into collected output

f00ac69

[ci skip-rust] [ci skip-build-wheels]

add option for flavour of analysis requested

00c41ef

[ci skip-rust] [ci skip-build-wheels]

test _get_imports_info

704e4ea

[ci skip-rust] [ci skip-build-wheels]

refactor test cases to all use helper

9d9dc0d

[ci skip-rust] [ci skip-build-wheels]

tests for finding other imports

9755415

[ci skip-rust] [ci skip-build-wheels]

add smoke test for debug goal

94a4eba

[ci skip-rust] [ci skip-build-wheels]

freeze dataclass

b322e72

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

simplify control flow

1fefb7e

- split vectorisation and logic - logic always returns a value # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

extract rule finding owners for single import

800bee2

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

rebuild as rule_helper

dff9184

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

rebuild as rule_helper

a6561af

remove now-redundant check

b345000

it was necessary to filter out cases where dep.address was `[]` before we iterated through the addresses. Now that we iterate them, though, `[]` contributes 0 elements

fix: test has proper precondition

490974b

- included dependency is actually included - added missing dependency

lint: proper name, remove print

d30a883

move to experimental

bbbc2de

better help text

ece4589

include python imports rules in debug rules backend

8e44026

so it can be independently imported

lilatomic force-pushed the feature/debug-goals branch from 72f6ae9 to 8e44026 Compare November 16, 2022 19:07

stuhood approved these changes Nov 17, 2022

stuhood enabled auto-merge (squash) November 17, 2022 17:31

stuhood merged commit c446133 into pantsbuild:main Nov 17, 2022

tdyas mentioned this pull request Nov 20, 2022

deprecation warning for python-dump-source-analysis goal #17597

Closed
