Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add debug goals to python #17057

Merged
merged 43 commits into from
Nov 17, 2022
Merged

Conversation

lilatomic
Copy link
Contributor

@lilatomic lilatomic commented Sep 29, 2022

rel #17039
We're ready for review! Given a testbed of

input
# Copyright 2022 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

import json  # unownable, root level
import os.path  # unownable, not root level

import watchdog  # dependency not included
import yaml  # dependency included
import yamlpath  # owned by other resolve

try:
    import weakimport  # weakimport missing
except ImportError:
    ...

open("src/python/configs/prod.json")  # asset
open("testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py")

we get

output
{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {
        "name": "weakimport",
        "reference": {
          "lineno": 12,
          "weak": true
        },
        "resolved": {
          "status": "ImportOwnerStatus.weak_ignore",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "json",
        "reference": {
          "lineno": 4,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "os.path",
        "reference": {
          "lineno": 5,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "watchdog",
        "reference": {
          "lineno": 7,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "yaml",
        "reference": {
          "lineno": 8,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "3rdparty/python#PyYAML",
            "3rdparty/python#types-PyYAML"
          ]
        },
        "possible_resolve": null
      },
      {
        "name": "yamlpath",
        "reference": {
          "lineno": 9,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": [
          [
            "src/python/pants/backend/helm/subsystems:yamlpath",
            "helm-post-renderer"
          ]
        ]
      }
    ],
    "assets": [
      {
        "name": "src/python/configs/prod.json",
        "reference": "src/python/configs/prod.json",
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "reference": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py:../../../../pants_plugins_directory"
          ]
        },
        "possible_resolve": null
      }
    ]
  }
}

Telling you, for each file, for each import, what dependencies pants thought it could have, and what it decided to do with them.
This uses almost all the same code as the main dependency inference code, with the exception of the top-level orchestration of it. I think that's pretty close, there's about 100 lines of semi-duplicate code.

There's also a more advanced mode that dumps information about each stage of the process. I think this might be useful for people digging through the dependency inference process but not really for end-users. we get it for free, though.

Fixes #17039.


this is fairly critical for performance, so here are benchmarks (with comparison-of-means t-test)

main this difference P-score
hyperfine --runs=10 './pants --no-pantsd dependencies --transitive ::' 21.839 s ± 0.326 s 22.142 s ± 0.283 s 1.38% 0.0395
hyperfine --warmup=1 --runs=10 './pants dependencies --transitive ::' 1.798 s ± 0.074 s 1.811 s ± 0.076 s 0.72% 0.7029
hyperfine --runs=10 './pants --no-pantsd dependencies ::' 21.547 s ± 0.640 s 21.863 s ± 1.072 s 1.47% 0.4339
hyperfine --warmup=1 --runs=10 './pants dependencies ::' 1.828 s ± 0.091 s 1.844 s ± 0.105 s 0.88% 0.7200

So it looks like this MR might impact performance, by about 1%, although those p-values are mighty unconvincing. LMK if we want to increase runs and get more statistics, I've run the stats a few times throughout and this looks about right, so I think we can proceed with the review under the assumption that there is currently a 1% performance overhead. I'm open to suggestions on improving performance.

@lilatomic
Copy link
Contributor Author

ok, thanks to folks in the Pants Slack, this is looking much better. Code duplication is minimal. I've also hooked a few levels deeper into inferrence, it now looks like:

{
  "address": "build-support/bin/_release_helper.py:py_scripts",
  "identified": {
    "imports": {
      "__future__.annotations": {
        "lineno": 4,
        "weak": false
      },
      "argparse": {
        "lineno": 6,
        "weak": false
      },
...
      "pants.util.contextutil.temporary_dir": {
        "lineno": 34,
        "weak": false
      },
      "pants.util.memo.memoized_property": {
        "lineno": 35,
        "weak": false
      },
      "pants.util.strutil.softwrap": {
        "lineno": 36,
        "weak": false
      },
      "pants.util.strutil.strip_prefix": {
        "lineno": 36,
        "weak": false
      }
    },
    "assets": [
      "3rdparty/python/requirements.txt",
      "build-support/bin/get_os.sh"
    ]
  },
  "resolved": {
    "imports": [
      "3rdparty/python#types-requests",
      "src/python/pants/util/strutil.py",
      "build-support/bin/common.py:py_scripts",
      "3rdparty/python#requests",
      "src/python/pants/util/contextutil.py",
      "src/python/pants/util/memo.py",
      "build-support/bin/reversion.py:py_scripts",
      "3rdparty/python#packaging"
    ],
    "unowned": [],
    "assets": [],
    "explicit": {
      "address": "build-support/bin/_release_helper.py:py_scripts",
      "includes": [],
      "ignores": []
    }
  }
}

This is pretty good! But with the hinting in _handle_unowned_imports, I think I can make it more useful by putting that into the export as well. There's also the fact that this isn't super usable in itself. It would be nice to see the resolved imports placed alongside the actual import statements.

@lilatomic
Copy link
Contributor Author

lilatomic commented Oct 1, 2022

hey now that's looking good, and now with assets wired in too:

{
  "src/python/pants/backend/python/dependency_inference/t.py": {
    "imports": [
      {
        "name": "weakimport",
        "reference": {
          "lineno": 12,
          "weak": true
        },
        "resolved": {
          "status": "ImportOwnerStatus.weak_ignore",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "json",
        "reference": {
          "lineno": 4,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unownable",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "watchdog",
        "reference": {
          "lineno": 6,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": [
          [
            "src/python/pants/backend/helm/subsystems:yamlpath",
            "helm-post-renderer"
          ]
        ]
      },
      {
        "name": "yaml",
        "reference": {
          "lineno": 7,
          "weak": false
        },
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "3rdparty/python#PyYAML",
            "3rdparty/python#types-PyYAML"
          ]
        },
        "possible_resolve": null
      }
    ],
    "assets": [
      {
        "name": "src/python/configs/prod.json",
        "reference": "src/python/configs/prod.json",
        "resolved": {
          "status": "ImportOwnerStatus.unowned",
          "address": []
        },
        "possible_resolve": null
      },
      {
        "name": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "reference": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
        "resolved": {
          "status": "ImportOwnerStatus.unambiguous",
          "address": [
            "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py:../../../../pants_plugins_directory"
          ]
        },
        "possible_resolve": null
      }
    ]
  }
}

(don't ask about any of the data, it's 100% shenanigans.) Code needs a cleanup and tests, and then it'll be ready to go

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

I didn't look a whole lot at the debug goal itself: mostly at the impact on the runtime, because a general principle that I think will be important in landing this is that we don't want the debug goals to cost us any runtime performance in the hot path. So it would be good to ensure that any indirection that is introduced to allow for consuming the output in the debug goals does not impact runtime.

unownable = "unownable"


@dataclass()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@dataclass()
@dataclass(frozen=True)

resolve_results[filepath] = ImportResolveResult(
ImportOwnerStatus.unambiguous, possible_addresses
)
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this cluster of if-elses might be clearer if every branch sets a local variable ImportOwnerStatus, and then has a single line to update resolve_results with that value. Otherwise the type checker isn't actually helping to guarantee that all branches mutate the dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. I've refactored to use a function and comprehension.

Comment on lines 362 to 363
@rule
async def find_other_owners_for_unowned_imports(
req: UnownedImportsPossibleOwnersRequest,
python_setup: PythonSetup,
) -> UnownedImportsPossibleOwners:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a @rule, rather than a @_rule_helper? Is it likely to be called multiple times with the same inputs (such that it gains benefit from memoization)? If not, maybe @_rule_helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think that memoising individual possible owners would be useful (since it might be imported a few times), but not as much the whole bundle. I'll try rebuilding it.

Comment on lines 444 to 460
@rule
async def _exec_parse_deps(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is purely an internal API called to produce ResolvedParsedPythonDependencies, then ditto potentially @_rule_helper?

@stuhood
Copy link
Member

stuhood commented Oct 18, 2022

It looks like you might need to rebase this to make CI happy... not sure why though.

@lilatomic
Copy link
Contributor Author

Yes, impacting performance has always been looming over this MR. I thought of having separate "optimal" and "debug" pathways, but divergence between the 2 is a source of bugs. I don't have a good intuition for what would impact performance: I don't know the relative overhead of a rule invocation vs a function invocation, or of passing larger datastructures across calls. Do we have a way to performance-test Pants?

@lilatomic lilatomic force-pushed the feature/debug-goals branch from 8ca59b9 to 787c0e6 Compare October 22, 2022 17:07
@lilatomic
Copy link
Contributor Author

performance seems fine? I'm not sure what a good benchmark to test is

main this
hyperfine --runs=10 './pants --no-pantsd dependencies --transitive ::' 19.271 s ± 1.163 s 19.263 s ± 1.181 s
hyperfine --warmup=1 --runs=10 './pants dependencies --transitive ::' 1.687 s ± 0.048 s 1.694 s ± 0.072 s

@thejcannon thejcannon self-requested a review October 26, 2022 15:40
) -> Iterator[Address]:
for filepath in assets:
) -> dict[str, ImportResolveResult]:
def _resolve_single_asset(filepath) -> ImportResolveResult:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about this being an internal method. Extracting it would make it a bit easier to test, but would also mean that we have to pass all the other parameters through. I'd normally make a class (AssetDependencyResolver or something) to hold on to those, but that doesn't seem to be the style with Pants. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, we definitely bias more toward a functional style, although there are helper classes here and there. No preference in this case, and probably fine to only test the outer method.



@dataclass(frozen=True)
class ImportResolveResult:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on whether I should add smart constructors? It's would help ensure that we always have the expected field combinations (ex every unambiguous result has an owner), but also it's more boilerplate.
ex:

@staticmethod
def disambiguated(maybe_disambiguated):
    return ImportResolveResult(ImportOwnerStatus.disambiguated, (maybe_disambiguated,))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to skip it, save's on a layer of indirection


@dataclass(frozen=True)
class UnownedImportsPossibleOwners:
value: Dict[str, list[tuple[Address, ResolveName]]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add a dataclass for tuple[Address, ResolveName]?

from pants.option.option_types import EnumOption


class AnalysisFlavor(Enum):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to better names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too concerned with the name, but a bit longer help string on the option would be good.

@lilatomic lilatomic marked this pull request as ready for review October 27, 2022 00:20
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only blocking comment is probably moving this into a dedicated experimental backend.

) -> Iterator[Address]:
for filepath in assets:
) -> dict[str, ImportResolveResult]:
def _resolve_single_asset(filepath) -> ImportResolveResult:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, we definitely bias more toward a functional style, although there are helper classes here and there. No preference in this case, and probably fine to only test the outer method.



@dataclass(frozen=True)
class ImportResolveResult:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine either way.

from pants.option.option_types import EnumOption


class AnalysisFlavor(Enum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too concerned with the name, but a bit longer help string on the option would be good.

Comment on lines 87 to 88
# Test
*debug_goals.rules(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably go in their own backend / register.py, similar to the JVM and Go debug goals: https://github.com/pantsbuild/pants/tree/main/src/python/pants/backend/experimental/java/debug_goals

That's mostly because the goal names / flags are experimental / unstable: it would be amazing to eventually merge them.

@lilatomic lilatomic force-pushed the feature/debug-goals branch from 3af65aa to 819a104 Compare October 29, 2022 23:13
[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
wowowow

[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
memo to me: read the docs

[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
exposes the ungathered results for the analysis easily

[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
[ci skip-rust]

[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
- split vectorisation and logic
- logic always returns a value

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
it was necessary to filter out cases where dep.address was `[]`
before we iterated through the addresses.
Now that we iterate them, though, `[]` contributes 0 elements
- included dependency is actually included
- added missing dependency
so it can be independently imported
@lilatomic
Copy link
Contributor Author

Looks great! Only blocking comment is probably moving this into a dedicated experimental backend.

sure thing, done. I think we should be good for a re-review.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@stuhood stuhood enabled auto-merge (squash) November 17, 2022 17:31
@stuhood stuhood merged commit c446133 into pantsbuild:main Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python dependency inference: add/emit debugging information
2 participants