Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add field names #106

Merged
merged 12 commits into from
Nov 9, 2023
Merged

Add field names #106

merged 12 commits into from
Nov 9, 2023

Conversation

nicolas-guichard
Copy link
Contributor

Hi,

This is a continuation of /pull/74 given Vladimir Reshetnikov said they would not be pursuing it anymore.

In Searchfox, we leverage tree-sitter to add some information on where symbols are found. For instance if you search for the C++ symbol mozilla::dom::MediaControlKeySource::IsOpened, and you find 9 results in the file dom/mediacontrol/MediaControlKeyManager.cpp, it's super useful to know in which function each result was found (tree-sitter is what enables those // found in comments). We are adding Kotlin support to Searchfox, hence this PR.

For now we only use the name and body fields of class_definition and function_declaration, but since there was a pull request hanging around to add named fields everywhere I just rebased it on main.
If you think it's too much (this adds almost 2M lines to parser.c!), I can reduce the number of named fields. They could be useful to someone else though.

@github-actions github-actions bot added the grammar Related to the grammar label Nov 6, 2023
@VladimirMakaev
Copy link
Collaborator

@nicolas-guichard tests are failing on CI

@nicolas-guichard
Copy link
Contributor Author

Indeed, I found the reason for the test failure:

When _type is defined as:

_type: $ => seq(
  optional($.type_modifiers),
  choice(
    $.parenthesized_type,
    $.nullable_type,
    $._type_reference,
    $.function_type,
    $.not_nullable_type
  )
)

then @Foo T & F is parsed as:

_type                @Foo T & F
  type_modifiers     @Foo
    annotation       @
      user_type       Foo
  not_nullable_type       T & F
    user_type             T
    user_type                 F

But when _type uses a named field around the choice:

_type: $ => seq(
  optional($.type_modifiers),
  field('type', choice(
    $.parenthesized_type,
    $.nullable_type,
    $._type_reference,
    $.function_type,
    $.not_nullable_type
  ))
)

then @Foo T & F is parsed as:

_type                @Foo T & F
  not_nullable_type  @Foo T & F
    type_modifiers   @Foo
      annotation     @
        user_type     Foo
    user_type             T
    user_type                 F

I don't really understand why adding a named field changes the precedence here.

types.txt explicitly tests for the first one, so I assume that's what we want, although the Kotlin grammar is quite surprising here: not_nullable_type is only ever used by type, and type already has an optional type_modifiers, so why does not_nullable_type need one too if it doesn't even have precedence?

I've removed the named field as a workaround for now, but I'm wondering if it would make sense to remove the optional type_modifiers from not_nullable_type to make the grammar less ambiguous.

@VladimirMakaev
Copy link
Collaborator

@nicolas-guichard This is a bit concerning as we might potentially miss a similar test case. I guess we can't name fields of different types without creating a wrapping node.

I've looked at this issue where field names were introduced. tree-sitter/tree-sitter#271
One thing to note is that tree-sitter parser <file> will return a tree with all field names.

What I would suggest doing is instead of adding all possible fields right now. Start smaller instead with only a couple of fields you need BUT additionally updating unit tests to verify fields are correctly named in every single place.

@nicolas-guichard
Copy link
Contributor Author

I'm mainly interested with elements that can have an inner block, so I've done those (+ type_alias because every other child of _declaration had named fields so I thought it made sense for “symmetry”).

This is split into commits that should each pass the tests for easier reviewing, but, given that parser.c is committed into the repo I think it would be wise to squash everything before merging, or extracting the parser.c update into a separate commit, to avoid needlessly increasing the repo size.

@VladimirMakaev
Copy link
Collaborator

@nicolas-guichard I'm not concerned about splitting this in multiple commits since it'll get squashed anyway. Thanks for adding the tests! If you can rebase on master I'll re-run tests and merge

@VladimirMakaev VladimirMakaev merged commit 0ef8789 into fwcd:main Nov 9, 2023
2 checks passed
itsaky added a commit to AndroidIDEOfficial/tree-sitter-kotlin that referenced this pull request Nov 25, 2023
@fwcd
Copy link
Owner

fwcd commented Mar 13, 2024

We really need to get the parser size under control again... this PR bumped it from 22 MB to 70 MB. The Git CLI warns about it too:

remote: warning: File src/parser.c is 70.42 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB        
remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./        

fwcd added a commit that referenced this pull request Mar 13, 2024
@fwcd
Copy link
Owner

fwcd commented Mar 13, 2024

I've reverted this PR for now. Feel free to rebase and open a new PR, but please make sure to keep the parser at a reasonable size, ideally < 30 MB, and also that the CI checks pass (specifically #92).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
grammar Related to the grammar
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants