-
-
Notifications
You must be signed in to change notification settings - Fork 643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add debug goals to python #17057
Add debug goals to python #17057
Conversation
ok, thanks to folks in the Pants Slack, this is looking much better. Code duplication is minimal. I've also hooked a few levels deeper into inferrence, it now looks like: {
"address": "build-support/bin/_release_helper.py:py_scripts",
"identified": {
"imports": {
"__future__.annotations": {
"lineno": 4,
"weak": false
},
"argparse": {
"lineno": 6,
"weak": false
},
...
"pants.util.contextutil.temporary_dir": {
"lineno": 34,
"weak": false
},
"pants.util.memo.memoized_property": {
"lineno": 35,
"weak": false
},
"pants.util.strutil.softwrap": {
"lineno": 36,
"weak": false
},
"pants.util.strutil.strip_prefix": {
"lineno": 36,
"weak": false
}
},
"assets": [
"3rdparty/python/requirements.txt",
"build-support/bin/get_os.sh"
]
},
"resolved": {
"imports": [
"3rdparty/python#types-requests",
"src/python/pants/util/strutil.py",
"build-support/bin/common.py:py_scripts",
"3rdparty/python#requests",
"src/python/pants/util/contextutil.py",
"src/python/pants/util/memo.py",
"build-support/bin/reversion.py:py_scripts",
"3rdparty/python#packaging"
],
"unowned": [],
"assets": [],
"explicit": {
"address": "build-support/bin/_release_helper.py:py_scripts",
"includes": [],
"ignores": []
}
}
} This is pretty good! But with the hinting in |
hey now that's looking good, and now with assets wired in too: {
"src/python/pants/backend/python/dependency_inference/t.py": {
"imports": [
{
"name": "weakimport",
"reference": {
"lineno": 12,
"weak": true
},
"resolved": {
"status": "ImportOwnerStatus.weak_ignore",
"address": []
},
"possible_resolve": null
},
{
"name": "json",
"reference": {
"lineno": 4,
"weak": false
},
"resolved": {
"status": "ImportOwnerStatus.unownable",
"address": []
},
"possible_resolve": null
},
{
"name": "watchdog",
"reference": {
"lineno": 6,
"weak": false
},
"resolved": {
"status": "ImportOwnerStatus.unowned",
"address": []
},
"possible_resolve": [
[
"src/python/pants/backend/helm/subsystems:yamlpath",
"helm-post-renderer"
]
]
},
{
"name": "yaml",
"reference": {
"lineno": 7,
"weak": false
},
"resolved": {
"status": "ImportOwnerStatus.unambiguous",
"address": [
"3rdparty/python#PyYAML",
"3rdparty/python#types-PyYAML"
]
},
"possible_resolve": null
}
],
"assets": [
{
"name": "src/python/configs/prod.json",
"reference": "src/python/configs/prod.json",
"resolved": {
"status": "ImportOwnerStatus.unowned",
"address": []
},
"possible_resolve": null
},
{
"name": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
"reference": "testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py",
"resolved": {
"status": "ImportOwnerStatus.unambiguous",
"address": [
"testprojects/pants-plugins/src/python/test_pants_plugin/__init__.py:../../../../pants_plugins_directory"
]
},
"possible_resolve": null
}
]
}
} (don't ask about any of the data, it's 100% shenanigans.) Code needs a cleanup and tests, and then it'll be ready to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
I didn't look a whole lot at the debug goal itself: mostly at the impact on the runtime, because a general principle that I think will be important in landing this is that we don't want the debug goals to cost us any runtime performance in the hot path. So it would be good to ensure that any indirection that is introduced to allow for consuming the output in the debug goals does not impact runtime.
unownable = "unownable" | ||
|
||
|
||
@dataclass() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dataclass() | |
@dataclass(frozen=True) |
resolve_results[filepath] = ImportResolveResult( | ||
ImportOwnerStatus.unambiguous, possible_addresses | ||
) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this cluster of if-elses might be clearer if every branch sets a local variable ImportOwnerStatus
, and then has a single line to update resolve_results
with that value. Otherwise the type checker isn't actually helping to guarantee that all branches mutate the dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. I've refactored to use a function and comprehension.
@rule | ||
async def find_other_owners_for_unowned_imports( | ||
req: UnownedImportsPossibleOwnersRequest, | ||
python_setup: PythonSetup, | ||
) -> UnownedImportsPossibleOwners: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a @rule
, rather than a @_rule_helper
? Is it likely to be called multiple times with the same inputs (such that it gains benefit from memoization)? If not, maybe @_rule_helper
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I think that memoising individual possible owners would be useful (since it might be imported a few times), but not as much the whole bundle. I'll try rebuilding it.
@rule | ||
async def _exec_parse_deps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is purely an internal API called to produce ResolvedParsedPythonDependencies
, then ditto potentially @_rule_helper
?
It looks like you might need to rebase this to make CI happy... not sure why though. |
Yes, impacting performance has always been looming over this MR. I thought of having separate "optimal" and "debug" pathways, but divergence between the 2 is a source of bugs. I don't have a good intuition for what would impact performance: I don't know the relative overhead of a rule invocation vs a function invocation, or of passing larger datastructures across calls. Do we have a way to performance-test Pants? |
8ca59b9
to
787c0e6
Compare
performance seems fine? I'm not sure what a good benchmark to test is
|
) -> Iterator[Address]: | ||
for filepath in assets: | ||
) -> dict[str, ImportResolveResult]: | ||
def _resolve_single_asset(filepath) -> ImportResolveResult: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about this being an internal method. Extracting it would make it a bit easier to test, but would also mean that we have to pass all the other parameters through. I'd normally make a class (AssetDependencyResolver or something) to hold on to those, but that doesn't seem to be the style with Pants. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we definitely bias more toward a functional style, although there are helper classes here and there. No preference in this case, and probably fine to only test the outer method.
|
||
|
||
@dataclass(frozen=True) | ||
class ImportResolveResult: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on whether I should add smart constructors? It's would help ensure that we always have the expected field combinations (ex every unambiguous result has an owner), but also it's more boilerplate.
ex:
@staticmethod
def disambiguated(maybe_disambiguated):
return ImportResolveResult(ImportOwnerStatus.disambiguated, (maybe_disambiguated,))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm going to skip it, save's on a layer of indirection
|
||
@dataclass(frozen=True) | ||
class UnownedImportsPossibleOwners: | ||
value: Dict[str, list[tuple[Address, ResolveName]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I add a dataclass for tuple[Address, ResolveName]
?
from pants.option.option_types import EnumOption | ||
|
||
|
||
class AnalysisFlavor(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to better names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too concerned with the name, but a bit longer help string on the option would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Only blocking comment is probably moving this into a dedicated experimental
backend.
) -> Iterator[Address]: | ||
for filepath in assets: | ||
) -> dict[str, ImportResolveResult]: | ||
def _resolve_single_asset(filepath) -> ImportResolveResult: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we definitely bias more toward a functional style, although there are helper classes here and there. No preference in this case, and probably fine to only test the outer method.
|
||
|
||
@dataclass(frozen=True) | ||
class ImportResolveResult: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine either way.
from pants.option.option_types import EnumOption | ||
|
||
|
||
class AnalysisFlavor(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too concerned with the name, but a bit longer help string on the option would be good.
# Test | ||
*debug_goals.rules(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably go in their own backend / register.py
, similar to the JVM and Go debug goals: https://github.com/pantsbuild/pants/tree/main/src/python/pants/backend/experimental/java/debug_goals
That's mostly because the goal names / flags are experimental / unstable: it would be amazing to eventually merge them.
3af65aa
to
819a104
Compare
[ci skip-rust] [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
wowowow [ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
memo to me: read the docs [ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
exposes the ungathered results for the analysis easily [ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
[ci skip-rust] [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
- split vectorisation and logic - logic always returns a value # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
it was necessary to filter out cases where dep.address was `[]` before we iterated through the addresses. Now that we iterate them, though, `[]` contributes 0 elements
- included dependency is actually included - added missing dependency
so it can be independently imported
72f6ae9
to
8e44026
Compare
sure thing, done. I think we should be good for a re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
rel #17039
We're ready for review! Given a testbed of
input
we get
output
Telling you, for each file, for each import, what dependencies pants thought it could have, and what it decided to do with them.
This uses almost all the same code as the main dependency inference code, with the exception of the top-level orchestration of it. I think that's pretty close, there's about 100 lines of semi-duplicate code.
There's also a more advanced mode that dumps information about each stage of the process. I think this might be useful for people digging through the dependency inference process but not really for end-users. we get it for free, though.
Fixes #17039.
this is fairly critical for performance, so here are benchmarks (with comparison-of-means t-test)
hyperfine --runs=10 './pants --no-pantsd dependencies --transitive ::'
hyperfine --warmup=1 --runs=10 './pants dependencies --transitive ::'
hyperfine --runs=10 './pants --no-pantsd dependencies ::'
hyperfine --warmup=1 --runs=10 './pants dependencies ::'
So it looks like this MR might impact performance, by about 1%, although those p-values are mighty unconvincing. LMK if we want to increase runs and get more statistics, I've run the stats a few times throughout and this looks about right, so I think we can proceed with the review under the assumption that there is currently a 1% performance overhead. I'm open to suggestions on improving performance.