
Type-based completion in the new completer #1577

Merged · 18 commits · Sep 15, 2023

Conversation

@jneem (Member) commented Sep 5, 2023

Updates the new completer to get completions from types also.

I also added a (hopefully short-lived) feature flag to turn off the old completer. This makes it easier to test the new completer.

@jneem (Member Author) commented Sep 5, 2023

(By the way, no particular need to get this into 1.2. I'm happy to hold off on merging until we get that out)

@yannham (Member) left a comment:

I must say I'm not sure I understand the overall strategy very well, in particular why we need multiple linearizers and to linearize several times. Also, it seems like we do completion either from the term definition or from the type when it's just a type, but not both. If that is indeed the case, I think it doesn't follow the philosophy laid out in the notes about the semantics of completion. Completion should take information from every possible source: if there is a defined value, a type annotation, and multiple contract annotations including a dict annotation with a record as a sub-contract (think {foo | {_ : {bar : Number, ..}} = {baz = {other = 2}}}), then we should compute completion information from all of those sources at the same time, and propose both bar and other when trying to complete foo.baz.[.]. Dictionaries might be trickier, but that's not really the point; we can do that later.

/// `{ foo: Number }`
/// ^-----------------------------------------------------------|
/// then in the expression `x.foo`, the definition of `foo` is here | and the
/// value of its definition is `Number`.
@yannham (Member):

I might understand as I go deeper into the code, but this part tickles me. If you have {foo : Number}, then x.foo is undefined, it's not Number (although the LSP knows that it has the type Number). Here it seems like we conflate {foo : Number} with {foo = Number} or something? And how is {foo : Number = 1} represented? What if you have something like {foo | {bar, baz, ..} = {blorg}}, where the completion information should be bar, baz, blorg?

@jneem (Member Author):

The idea here is that Value (maybe not well named) is a thing that might have fields for recursing into when resolving paths. It could be a term (like { foo = 1 }) or a type (like { foo : Number }). For completion (and eventually goto definition) we need to be able to recurse through both of these cases. So for example, if x has type { foo : { bar : { baz : Number }}} and we ask for the fields on x.foo.bar, then in the first call to FieldDefs::resolve we'll find that x has a field foo with "value" { bar : { baz : Number }}, and then on the next call to FieldDefs::resolve we'll find the subfield bar with "value" { baz: Number }.

The recursion can look at both terms and types/contracts. For example, if you have
let x | { bar, baz, .. } = { blorg } in x.
then in the Term::Annotated branch we collect the completions from both the annotation and the contract. We also accumulate fields from inferred types in FieldDefs::resolve, and combine those fields with the ones from the term.

Thanks to this question, I realized that I'd forgotten to accumulate fields from contracts and types in record field metadata; this is fixed in the latest commit. With the fix, if you call resolve_path on

{ foo : { bar : Number } = { bar = 1 }}

with the path foo:

  • on the first iteration, we'll see that the overall expression has an ident foo, and its Def has value { bar = 1 } and metadata { bar : Number }.
  • on the second iteration, we'll recurse into the previous Def for foo. We'll visit the value, which has a field bar (whose Def has value 1, and no metadata). And we'll visit the metadata, which has a field bar (whose Def has value Number and no metadata). So we'll return two Defs, one for each occurrence of bar.
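The two-iteration walk described above can be sketched in a few lines of Rust. This is a toy model under stated assumptions, not the actual nls code: Value, Def, resolve_step, and foo_def here are hypothetical stand-ins for the richer types in lsp/nls/src/field_walker.rs.

```rust
use std::collections::HashMap;

// Toy model: a "value" is either a record of defs, a number literal,
// or a type name (standing in for a real Nickel term or type).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Record(HashMap<String, Def>),
    Num(i64),
    Type(&'static str),
}

// A def pairs an optional value with optional metadata (the annotation).
#[derive(Clone, Debug, PartialEq)]
struct Def {
    value: Option<Value>,
    metadata: Option<Value>,
}

/// One resolution step: collect every `Def` for `field`, looking through
/// both the value and the metadata of each current `Def`.
fn resolve_step(defs: &[Def], field: &str) -> Vec<Def> {
    let mut out = Vec::new();
    for def in defs {
        for source in [&def.value, &def.metadata] {
            if let Some(Value::Record(fields)) = source {
                if let Some(d) = fields.get(field) {
                    out.push(d.clone());
                }
            }
        }
    }
    out
}

/// The example from the comment, { foo : { bar : Number } = { bar = 1 } },
/// encoded as the `Def` for `foo`.
fn foo_def() -> Def {
    let record = |k: &str, d: Def| Value::Record(HashMap::from([(k.to_string(), d)]));
    Def {
        value: Some(record("bar", Def { value: Some(Value::Num(1)), metadata: None })),
        metadata: Some(record("bar", Def { value: Some(Value::Type("Number")), metadata: None })),
    }
}

fn main() {
    // Resolving the path `foo.bar` visits both sources and returns two
    // `Def`s, one per occurrence of `bar`.
    let defs = resolve_step(&[foo_def()], "bar");
    assert_eq!(defs.len(), 2);
}
```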

@yannham (Member) commented Sep 8, 2023:

I see, thanks for the explanation. I guess it's about naming and code organization rather than semantics, but what I find strange is that metadata (a type annotation or a contract annotation) at the field level is treated differently depending on whether the field has a definite value. Correct me if I'm wrong, but:

# case 1
{ foo | {..1..} = {..2..}}

# case 2
{ foo : {..1..}}

In both cases foo has the same contract attached. But in the first case, the contract will go in the "metadata" for the completer, and the definition will be {..2..}, while, in the other case, the definition will be {..1..} (and then, will foo have "metadata" in the sense of the completer?)

It seems, from a distance, that we shouldn't actually make a distinction between what's on the right of the equals sign, what's in metadata.annotation.contracts, and what's in metadata.annotation.typ. All of them are equal sources of completion information, and we might have to recurse into all of them (in fact, for something whose definition is a record, having this record on the right of the equals sign or as a contract annotation doesn't make any difference for the completer, I believe). That's why I find the notion of definition confusing: it seems to arbitrarily pick one source of completion as the "definition" when there are several. You say that the completer uses all of the information sources, so it's fine operationally, but I'm wondering about the modeling. For example, would it make sense to (very schematically; there might be other considerations):

  • rename Defn -> CompletionSource
  • make it possible to attach multiple completion sources to each symbol/field/whatever, instead of only one definition

@yannham (Member):

Sorry, I used definition/Defn a lot, but I believe I'm really talking about Value: each field should have one definition, but instead of having one value, it would make more sense to me for them to have multiple completion sources. Or, if you don't want to duplicate metadata and want to mimic the original field structure, keep the metadata separate and set value to None for {foo : {bar : Number}}. I think what's confusing me is that we do something in between: when there's a definite value we keep the same structure as the original field (value inside value and the rest in the metadata), but for some reason, when there's no definition, we put the type annotation in the value?

@jneem (Member Author):

Ok thanks, I'll see if I can find a better model for this. A few considerations:

  • the name Def was chosen because the same infrastructure is supposed to eventually drive "goto definition/goto declaration"
  • we want to keep (at least part of) the metadata attached to fields: it's used for generating documentation, and not just for finding subfields

Maybe not relevant for much longer, but the current determinant for whether Value is a type or a term is that when visiting a type in resolve_type, the value is a type. When visiting a term in resolve_term, the value is a term.

@yannham (Member):

Ok, I think I better see your point of "when resolving a path, we might end up on a subpart either of a type or of a term" now. Then do you use metadata only for documentation? Can Def.metadata be set if value is a type, for example?

During resolution, what happens if you end up on a value that has a value definition, contract annotations, and type annotations all at once (say on a field such as {foo | {..1..} : {..2..} = {..3..}})? Does it give rise to as many other values that you explore (one for each item), with the path copied?

Sorry for all the questions 😅

@jneem (Member Author):

Not at all, questions are great!

For your first question, we're currently using it only for documentation. There are potentially other metadata fields in the completion response that we could be filling out; I haven't looked in much detail yet. Right now metadata is only set for fields in a Term::Record, so I guess that means it's only paired with value being a term.

The answer depends on what "end up on a value" means:

  • if you ask for all the fields on {foo | {..1..} : {..2..} = {..3..}} then you will get back a single Def, with identifier foo, value {..3..}, and the rest in metadata
  • but if you take {foo | {..1..} : {..2..} = {..3..}} and ask "what are the fields on the foo field?" (i.e. you call resolve_path with path [foo]) then there will be three recursive calls to FieldDefs::resolve: one for the value, one for the contract, and one for the type. The results of those will be merged together.

@jneem (Member Author) commented Sep 8, 2023:

Ok, I think this modelling issue might be resolved by a refactor that I was thinking of doing anyway: making resolve one step lazier than it currently is. That is, currently resolve returns a collection of Idents and Defs. Instead, I want resolve to return a collection of things that have fields. So basically, the old way can be obtained from the new way by flat-mapping over the things that have fields and getting all their fields.

"Things that have fields" are

#[derive(Clone, Debug, PartialEq)]
pub enum FieldHaver { // I'd welcome a better name suggestion!
    RecordTerm(RecordData),
    Dict(Type),
    RecordType(RecordRows),
}

There are three advantages of this:

  • efficiency: when resolving a long path, we aren't interested in most fields of the intermediate path elements, and so it was wasteful to collect them all
  • avoids some special-casing to handle the annoyance that we can't collect all fields of a Dict
  • avoids the need to have a Def type describing the "value" of a field.

I have a draft of this and it seems to work ok. If it sounds reasonable, I'll finish it up on Monday.
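As a rough illustration of the efficiency and special-casing points, here is a toy version of the idea. The payload types are stand-ins (the real variants carry RecordData, Type, and RecordRows): a Dict can't enumerate its fields, but it can answer any single lookup, so returning field havers instead of eagerly collected fields sidesteps that asymmetry.

```rust
// Toy sketch of the lazier `resolve`: return "things that have fields"
// and only look fields up when path resolution actually needs them.
// Payloads are simplified stand-ins for RecordData / Type / RecordRows.
#[derive(Clone, Debug, PartialEq)]
enum FieldHaver {
    RecordTerm(Vec<(String, String)>), // field name -> value
    Dict(String),                      // element type
    RecordType(Vec<(String, String)>), // field name -> type
}

impl FieldHaver {
    /// Look up a single field. A `Dict` answers any name with its element
    /// type, even though it could never enumerate all of its fields.
    fn get(&self, name: &str) -> Option<String> {
        match self {
            FieldHaver::RecordTerm(fs) | FieldHaver::RecordType(fs) => fs
                .iter()
                .find(|(f, _)| f.as_str() == name)
                .map(|(_, v)| v.clone()),
            FieldHaver::Dict(elem) => Some(elem.clone()),
        }
    }
}

fn main() {
    let record = FieldHaver::RecordTerm(vec![("foo".into(), "1".into())]);
    let dict = FieldHaver::Dict("Number".to_string());
    assert_eq!(record.get("foo"), Some("1".to_string()));
    assert_eq!(record.get("bar"), None);
    // The dict case that forced special-casing in the eager version:
    assert_eq!(dict.get("anything"), Some("Number".to_string()));
}
```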

@yannham (Member):

FWIW, from a conceptual point of view, I do find the FieldHaver modeling/naming idea simpler and more natural, indeed.

pub struct Uninit;

/// Marker trait for possible states of the linearization
pub trait LinearizationState {}
@yannham (Member):

I think the idea here was to avoid having nonsensical things like Linearization<Ident>. Of course you could still do it, but you'd need the extra step of impl LinearizationState for Ident. But I'm not convinced this was really useful, so it's probably fine to remove it.
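For the record, the type-state pattern being discussed looks roughly like this; Completed, complete, and the struct body are hypothetical simplifications of the actual Linearization machinery, kept only to show what the marker-trait bound buys.

```rust
use std::marker::PhantomData;

// Zero-sized state types, as in the snippet above.
pub struct Uninit;
pub struct Completed;

/// Marker trait for possible states of the linearization.
pub trait LinearizationState {}
impl LinearizationState for Uninit {}
impl LinearizationState for Completed {}

// The bound rules out `Linearization<Ident>` unless someone explicitly
// writes `impl LinearizationState for Ident`, which is the whole point.
pub struct Linearization<S: LinearizationState> {
    state: PhantomData<S>,
}

impl Linearization<Uninit> {
    pub fn new() -> Self {
        Linearization { state: PhantomData }
    }

    /// State transitions are encoded in the types: only an uninitialized
    /// linearization can be completed.
    pub fn complete(self) -> Linearization<Completed> {
        Linearization { state: PhantomData }
    }
}

fn main() {
    let lin: Linearization<Completed> = Linearization::new().complete();
    let _ = lin;
    // The marker carries no data: the whole struct is zero-sized.
    assert_eq!(std::mem::size_of::<Linearization<Completed>>(), 0);
}
```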

@jneem (Member Author) commented Sep 7, 2023

We discussed this synchronously, but just so there's a record of it: I've replaced the double linearization pass by a single linearization pass that records two things. The eventual goal is to replace the AnalysisHost linearizer with the much simpler TypeCollector linearizer. But there's a substantial migration path, so in the meantime I've hopefully found a nicer way for them to coexist.

As for accumulating both the type and term information, I believe this is done correctly now. There are three cases here:

  • a Term::Annotated merges the defs from the contracts, type, and term
  • every term with an inferred type merges in the defs from its type
  • defs with metadata (which come from record fields with metadata) merge in the defs from their metadata.

(This last one was missing until just recently)
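Schematically, all three cases reduce to taking the union of the defs from each source. The following is an illustrative sketch, not the real nls code; defs are reduced to plain field names and the per-source traversals are elided.

```rust
use std::collections::BTreeSet;

// Candidate completions visible through a single source (a term, a
// contract, a type annotation, an inferred type, or field metadata).
type Defs = BTreeSet<&'static str>;

/// Merging is just the union of whatever each source contributes.
fn merge(sources: impl IntoIterator<Item = Defs>) -> Defs {
    sources.into_iter().flatten().collect()
}

fn main() {
    // Case 1: a Term::Annotated merges contracts, type, and term.
    let annotated = merge([
        Defs::from(["from_contract"]),
        Defs::from(["from_type"]),
        Defs::from(["from_term"]),
    ]);
    assert_eq!(annotated.len(), 3);

    // Case 2: a term with an inferred type merges in the type's defs.
    let with_inferred = merge([Defs::from(["from_term"]), Defs::from(["from_inferred_type"])]);
    assert!(with_inferred.contains("from_inferred_type"));

    // Case 3: defs with metadata merge in the defs from their metadata.
    let with_metadata = merge([Defs::from(["from_value"]), Defs::from(["from_metadata"])]);
    assert_eq!(with_metadata.len(), 2);
}
```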

@yannham (Member) left a comment:

The plan discussed in the comments is compelling, and this PR is totally functional, so it might be easier review-wise to merge this first and improve it afterwards.

@jneem (Member Author) commented Sep 11, 2023

Ah sorry, while you were typing that I finished and pushed the refactor...

@yannham (Member) commented Sep 11, 2023

It's not very important 🙂 just ping me when it's ready to review

@jneem (Member Author) commented Sep 14, 2023

@yannham This has been rebased and it's ready for another look. The main changes compared to last time are in field_walker.rs.

@yannham (Member) left a comment:

Didn't look into all the details again, but I found the field_walker part easier to follow, and much less confusing 👍

///
/// So, for `term.x.y.z`, this will be `vec!["x", "y", "z"]`.
pub path: Vec<Ident>,
pub enum FieldHaver {
@yannham (Member):

FieldHolder? FieldContainer? Recordoid (ok, maybe not this one)? I'm just randomly spitting out suggestions, feel free to ignore.

@jneem (Member Author):

🤷 I stuck with FieldHaver for now; we can always rename it later.

pub value: Option<RichTerm>,
/// Field metadata.
pub metadata: Option<FieldMetadata>,
enum FieldValue {
@yannham (Member):

What about FieldContent or FieldData (or something else)? All in all I believe value is generally a reasonable name for it, but it's already quite overloaded in Nickel (values in the sense of the lambda calculus, and Field already has a member named value, which is rather a RichTerm).

@jneem (Member Author):

FieldContent it is.

Comment on lines 123 to 125
pub fn ident(&self) -> LocIdent {
self.ident
}
@yannham (Member):

What is the expected usage of self.ident() instead of just doing self.ident?

@jneem (Member Author):

I think I had a reason for it originally, but I can't figure out what 😕 It's gone now.

@jneem jneem added this pull request to the merge queue Sep 15, 2023
Merged via the queue into master with commit b2459d3 Sep 15, 2023
4 checks passed
@jneem jneem deleted the nls-types branch September 15, 2023 19:25
suimong pushed a commit to suimong/nickel that referenced this pull request Sep 17, 2023
* Collect types into a table

* Type-based completion

This also adds a feature flag for turning off the old completer.

* Remove bogus file

* Explicitly visit type annotations

* Add a test for annotated types

* Only do one linearization traversal

* Fix docs, and avoid some clones

* Combine, and resolve metadata

* Add a test

* Slight improvements to the lsp spec

* Refactor

* Remove commented-out code

* Remove the confusing initial-env thing

* Complete the refactor, and document a little better

* Fix some rebase issues

* Clean up some unused code

* Remove obsolete doc

* Review comments