Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FullyQualifiedNameProvider: Optionally consider pyproject.toml files when determining a file's module name and package #1148

Merged
merged 15 commits into from
Jun 12, 2024

Conversation

camillol
Copy link
Contributor

When determining the module name and package for a source file, LibCST currently takes the file name relative to the repository root. With this change, it will instead look for the closest directory containing a pyproject.toml file in the file's ancestors, and consider the file name relative to that directory.

This gives correct module and package names when processing a repository containing multiple packages. For files that are not covered by a pyproject.toml, the search ends with the given repository root, thus retaining the old behavior.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 14, 2024
Copy link
Member

@zsol zsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few concerns with this approach:

  • calculate_module_and_package shouldn't do any IO.
  • the location of a pyproject.toml is a heuristic that shouldn't be part of the core logic, it should be behind a switch that can be enabled/disabled
  • I'd argue that the default behavior should be the old one

Maybe it's better if the passed in repo_root has already applied this heuristic if it's desirable.

@camillol
Copy link
Contributor Author

@zsol thanks for taking a look! For context, my use case is a single repository containing many interdependent packages. Because of the interdependencies, it needs to be analyzed as a whole (e.g. I need information on the entire class hierarchy, which spans multiple packages).

AFAICT, the main user of calculate_module_and_package (and the one I'm interested in) is FullyQualifiedNameProvider. It works with FullRepoManager, and the gen_cache method receives only the repo root path and a list of file paths to process.

So, it's not feasible for me to have a separate repo root per package, and I don't see a way to feed the package information into FQNP.gen_cache given the current signature.

AFAICT, gen_cache is meant to do I/O, and does I/O in other providers (e.g. TypeInferenceProvider). I assume the requirement for calculate_module_and_package not to do I/O is for testing purposes. I could make a variant of this function and have FQNP call it, although it could cause problems down the road if something else decides to call calculate_module_and_package and gets inconsistent results with FQNP.

So I think the best approach would be to have a global switch. Or perhaps an option on the FullRepoManager, but in that case I'd still have to change the provider interface a bit to be able to wire it through.

Regarding the default behavior, I'd argue that the existing behavior of inferring the package from the file path relative to the repo root is also a heuristic, and for a repo that contains pyproject.toml files it is less likely to give the intended results than what I am proposing. But I'm fine with pyproject.toml parsing not being the default.

@zsol
Copy link
Member

zsol commented May 28, 2024

AFAICT, gen_cache is meant to do I/O, and does I/O in other providers (e.g. TypeInferenceProvider).

That's right. I propose to do the pyproject.toml finding in FQNP's gen_cache, and then pass in a root_path to calculate_module_and_package based on that.

My main concern is that calculate_module_and_package is not just an internal implementation detail, it's exposed in libcst.helpers and other code (outside of libcst) might be using it.

Regarding the default behavior, I'd argue that the existing behavior of inferring the package from the file path relative to the repo root is also a heuristic, and for a repo that contains pyproject.toml files it is less likely to give the intended results than what I am proposing. But I'm fine with pyproject.toml parsing not being the default.

Agreed that the existing behavior is also just a heuristic, but I'm not convinced that the pyproject.toml-based heuristic is strictly better (or worse, it's just different), and so I don't think it warrants a backwards incompatible change. Let's make it configurable.

@camillol
Copy link
Contributor Author

Sounds good. I don't see an existing API for this kind of global configuration. Is it ok if I keep it simple and just make it a global variable?

@zsol
Copy link
Member

zsol commented May 28, 2024

I don't see an existing API for this kind of global configuration.

Ugh, that's annoying :( Ideally this would be stored in the libcst config file (that's read here) and then passed into FullRepoManager here to be passed into gen_cache.

Maybe we should change the API for gen_cache to accept a configuration object instead of just root_path and timeout. Would you be up for that?

@camillol
Copy link
Contributor Author

Oh, I'm not using the CLI at all. I'm using LibCST as a library from another program, so that config file doesn't apply.

gen_cache already has an optional argument that most implementations ignore (timeout). I could generalize that as kwargs, and add a keyword argument use_pyproject, provided by FullRepoManager.

@zsol
Copy link
Member

zsol commented May 28, 2024

Sounds good, then!

Copy link

codecov bot commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 3 lines in your changes missing coverage. Please review.

Project coverage is 91.15%. Comparing base (9f54920) to head (452868c).
Report is 17 commits behind head on main.

Files Patch % Lines
libcst/helpers/module.py 92.30% 1 Missing ⚠️
libcst/metadata/base_provider.py 75.00% 1 Missing ⚠️
libcst/metadata/tests/test_metadata_wrapper.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
+ Coverage   91.12%   91.15%   +0.03%     
==========================================
  Files         256      261       +5     
  Lines       26567    26864     +297     
==========================================
+ Hits        24210    24489     +279     
- Misses       2357     2375      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@camillol
Copy link
Contributor Author

camillol commented Jun 3, 2024

@zsol I think it's done, PTAL!

Copy link
Member

@zsol zsol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zsol zsol changed the title Consider pyproject.toml files when determining a file's module name and package FullyQualifiedNameProvider: Optionally consider pyproject.toml files when determining a file's module name and package Jun 12, 2024
@zsol zsol merged commit 9f6e276 into Instagram:main Jun 12, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants