Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce InferContext #14956

Merged
merged 2 commits into from
Dec 18, 2024
Merged

Introduce InferContext #14956

merged 2 commits into from
Dec 18, 2024

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Dec 13, 2024

Summary

I'm currently on the fence about landing the #14760 PR because it's unclear how we'd support tracking used and unused suppression comments in a performant way:

  • Salsa adds an "untracked" dependency to every query reading accumulated values. This has the effect that the query re-runs on every revision. For example, a possible future query unused_suppression_comments(db, file) would re-run on every incremental change and for every file. I don't expect the operation itself to be expensive, but it all adds up in a project with 100k+ files
  • Salsa collects the accumulated values by traversing the entire query dependency graph. It can skip over sub-graphs if it is known that they contain no accumulated values. This makes accumulators a great tool for when they are rare; diagnostics are a good example. Unfortunately, suppressions are more common, and they often appear in many different files, making the "skip over subgraphs" optimization less effective.

Because of that, I want to wait to adopt salsa accumulators for type check diagnostics (we could start using them for other diagnostics) until we have very specific reasons that justify regressing incremental check performance.

This PR does a "small" refactor that brings us closer to what I have in #14760 but without using accumulators. To emit a diagnostic, a method needs:

  • Access to the db
  • Access to the currently checked file

This PR introduces a new InferContext that holds on to the db, the current file, and the reported diagnostics. It replaces the TypeCheckDiagnosticsBuilder. We pass the InferContext instead of the db to methods that might emit diagnostics. This simplifies some of the Outcome methods, which can now be called with a context instead of a db and the diagnostics builder. Having the db and the file on a single type like this would also be useful when using accumulators.

This PR doesn't solve the issue that the Outcome types feel somewhat complicated nor that it can be annoying when you need to report a Diagnostic, but you don't have access to an InferContext (or the file). However, I also believe that accumulators won't solve these problems because:

  • Even with accumulators, it's necessary to have a reference to the file that's being checked. The struggle would be to get a reference to that file rather than getting a reference to InferContext.
  • Users of the HasTy trait (e.g., a linter) don't want to bother getting the File when calling Type::return_ty because they aren't interested in the created diagnostics. They just want to know what calling the current expression would return (and if it even is a callable). This is what the different methods of Outcome enable today. I can ask for the return type without needing extra data that's only relevant for emitting a diagnostic.

A shortcoming of this approach is that it is now a bit confusing when to pass db and when an InferContext. An option is that we'd make the file on InferContext optional (it won't collect any diagnostics if None) and change all methods on Type to take InferContext as the first argument instead of a db. I'm interested in your opinion on this.

Accumulators are definitely harder to use incorrectly because they remove the need to merge the diagnostics explicitly and there's no risk that we accidentally merge the diagnostics twice, resulting in duplicated diagnostics. I still value performance more over making our life slightly easier.

Copy link
Contributor

github-actions bot commented Dec 13, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Dec 13, 2024
@MichaReiser MichaReiser changed the title Introduce TyContext Introduce InferContext Dec 13, 2024
@MichaReiser MichaReiser marked this pull request as ready for review December 13, 2024 13:03
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks great to me! Even though it doesn't solve the problem of reporting diagnostics across the Type / TypeInferenceBuilder boundary, because of needing a node/location, it still looks like a clear improvement in API, and the DropBomb will help prevent dropping diagnostics by accident.

crates/red_knot_python_semantic/src/types/context.rs Outdated Show resolved Hide resolved
db: &'db dyn Db,
file: File,
diagnostics: std::cell::RefCell<TypeCheckDiagnostics>,
defused: DropBomb,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: the name defused confused me at first, because initially this is not defused! It's only defused after finish(). Would have been clearer to me if it were named bomb or drop_bomb

@carljm
Copy link
Contributor

carljm commented Dec 13, 2024

A shortcoming of this approach is that it is now a bit confusing when to pass db and when an InferContext. An option is that we'd make the file on InferContext optional (it won't collect any diagnostics if None) and change all methods on Type to take InferContext as the first argument instead of a db. I'm interested in your opinion on this.

I feel like the direction we were heading in the discussion this morning is to continue returning outcome/result types from Type operations, and keep nodes/locations in TypeInferenceBuilder (and sometimes passed into methods on those outcome/result types, which can then emit diagnostics.)

I think this direction makes sense. I don't want to be passing, or to require, nodes/locations everywhere in Type methods. This means we can't emit diagnostics in arbitrary places in Type methods, either.

Given that approach, I don't think it's too hard to understand where to pass Db vs InferContext. Type methods should only require a Db. InferContext should only be used where we need to emit diagnostics, which will generally just be in a few methods on result/outcome types.

@MichaReiser MichaReiser enabled auto-merge (squash) December 18, 2024 12:17
@MichaReiser MichaReiser merged commit 0fc4e8f into main Dec 18, 2024
20 checks passed
@MichaReiser MichaReiser deleted the micha/type-context branch December 18, 2024 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants