Add field names #106

nicolas-guichard · 2023-11-06T11:23:35Z

Hi,

This is a continuation of /pull/74, given that Vladimir Reshetnikov said they would not be pursuing it anymore.

In Searchfox, we leverage tree-sitter to add some information on where symbols are found. For instance, if you search for the C++ symbol mozilla::dom::MediaControlKeySource::IsOpened and find 9 results in the file dom/mediacontrol/MediaControlKeyManager.cpp, it's super useful to know in which function each result was found (tree-sitter is what enables those "// found in" comments). We are adding Kotlin support to Searchfox, hence this PR.

For now we only use the name and body fields of class_declaration and function_declaration, but since there was a pull request hanging around to add named fields everywhere, I just rebased it on main.
If you think it's too much (this adds almost 2M lines to parser.c!), I can reduce the number of named fields. They could be useful to someone else though.
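For illustration, a minimal sketch of how a name field could be used to label a search hit with its enclosing declarations. This is not Searchfox's actual code: makeNode and enclosingContext are hypothetical helpers, and the nodes are plain-object stand-ins that mimic only the type, text, parent and childForFieldName members of a tree-sitter node. The toy tree and the body node types are assumptions as well.

```javascript
// Hypothetical stand-in for a syntax node; it mimics only the members used below.
function makeNode(type, text, fields = {}, children = []) {
  const node = {
    type,
    text,
    parent: null,
    children,
    childForFieldName: (name) => fields[name] ?? null,
  };
  for (const child of children) child.parent = node;
  return node;
}

// Walk up from a hit and collect the names of the enclosing declarations,
// outermost first, using the "name" field of each declaration node.
function enclosingContext(node) {
  const names = [];
  for (let cur = node.parent; cur !== null; cur = cur.parent) {
    if (cur.type === "class_declaration" || cur.type === "function_declaration") {
      const name = cur.childForFieldName("name");
      if (name !== null) names.unshift(name.text);
    }
  }
  return names.join(".");
}

// Toy tree for: class Player { fun isOpened() { open } }
const hit = makeNode("simple_identifier", "open");
const fnName = makeNode("simple_identifier", "isOpened");
const fnBody = makeNode("function_body", "{ open }", {}, [hit]);
const fn = makeNode("function_declaration", "fun isOpened() { open }",
  { name: fnName, body: fnBody }, [fnName, fnBody]);
const clsName = makeNode("type_identifier", "Player");
const clsBody = makeNode("class_body", "{ ... }", {}, [fn]);
const cls = makeNode("class_declaration", "class Player { ... }",
  { name: clsName, body: clsBody }, [clsName, clsBody]);

console.log(enclosingContext(hit)); // prints "Player.isOpened"
```

Without named fields, the same lookup would have to guess which child of a declaration is its name by position or node type, which is why only name and body are needed for this use case.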

VladimirMakaev · 2023-11-06T11:57:12Z

@nicolas-guichard tests are failing on CI

nicolas-guichard · 2023-11-06T18:04:18Z

Indeed, I found the reason for the test failure:

When _type is defined as:

_type: $ => seq(
  optional($.type_modifiers),
  choice(
    $.parenthesized_type,
    $.nullable_type,
    $._type_reference,
    $.function_type,
    $.not_nullable_type
  )
)

then @Foo T & F is parsed as:

_type                @Foo T & F
  type_modifiers     @Foo
    annotation       @
      user_type       Foo
  not_nullable_type       T & F
    user_type             T
    user_type                 F

But when _type uses a named field around the choice:

_type: $ => seq(
  optional($.type_modifiers),
  field('type', choice(
    $.parenthesized_type,
    $.nullable_type,
    $._type_reference,
    $.function_type,
    $.not_nullable_type
  ))
)

then @Foo T & F is parsed as:

_type                @Foo T & F
  not_nullable_type  @Foo T & F
    type_modifiers   @Foo
      annotation     @
        user_type     Foo
    user_type             T
    user_type                 F

I don't really understand why adding a named field changes the precedence here.

types.txt explicitly tests for the first one, so I assume that's what we want. The Kotlin grammar is quite surprising here, though: not_nullable_type is only ever used by type, and type already has an optional type_modifiers, so why does not_nullable_type need one too if it doesn't even have precedence?

I've removed the named field as a workaround for now, but I'm wondering if it would make sense to remove the optional type_modifiers from not_nullable_type to make the grammar less ambiguous.

VladimirMakaev · 2023-11-06T22:19:51Z

@nicolas-guichard This is a bit concerning as we might potentially miss a similar test case. I guess we can't name fields of different types without creating a wrapping node.

I've looked at this issue where field names were introduced. tree-sitter/tree-sitter#271
One thing to note is that tree-sitter parse <file> will return a tree with all field names.

Instead of adding all possible fields right now, I would suggest starting smaller, with only the couple of fields you need, BUT additionally updating the unit tests to verify that fields are correctly named in every single place.
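As a rough sketch of what verifying field names could look like outside the corpus tests, the snippet below pulls "field: (node_type" pairs out of an S-expression string and compares them with an expected list. The helper fieldsIn and the sample tree are hypothetical; the sample only assumes that fields appear in the printed tree as "name: (node ...)".

```javascript
// Collect "field:node_type" pairs from an S-expression string.
// fieldsIn is a hypothetical helper, not part of tree-sitter.
function fieldsIn(sexp) {
  const pairs = [];
  const re = /([A-Za-z_]\w*):\s*\(([A-Za-z_]\w*)/g;
  for (const m of sexp.matchAll(re)) pairs.push(`${m[1]}:${m[2]}`);
  return pairs;
}

// Hypothetical tree for a single function declaration.
const sample = `
(source_file
  (function_declaration
    name: (simple_identifier)
    body: (function_body)))`;

const expected = ["name:simple_identifier", "body:function_body"];
const actual = fieldsIn(sample);
const ok = JSON.stringify(actual) === JSON.stringify(expected);
console.log(ok ? "fields match" : `mismatch: ${actual}`); // prints "fields match"
```

A check like this only confirms which fields are present and what they point at; it would not by itself catch a change in nesting such as the not_nullable_type case above, which is why asserting on the full expected tree in the corpus tests is the stronger safeguard.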

nicolas-guichard · 2023-11-07T14:58:57Z

I'm mainly interested in elements that can have an inner block, so I've done those (+ type_alias because every other child of _declaration had named fields, so I thought it made sense for “symmetry”).

This is split into commits that should each pass the tests, for easier reviewing. However, given that parser.c is committed into the repo, I think it would be wise to squash everything before merging, or to extract the parser.c update into a separate commit, to avoid needlessly increasing the repo size.

VladimirMakaev · 2023-11-08T11:36:37Z

@nicolas-guichard I'm not concerned about splitting this in multiple commits since it'll get squashed anyway. Thanks for adding the tests! If you can rebase on master I'll re-run tests and merge

fwcd · 2024-03-13T02:19:01Z

We really need to get the parser size under control again... this PR bumped it from 22 MB to 70 MB. The Git CLI warns about it too:

remote: warning: File src/parser.c is 70.42 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB        
remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./

fwcd · 2024-03-13T02:42:05Z

I've reverted this PR for now. Feel free to rebase and open a new PR, but please make sure to keep the parser at a reasonable size, ideally < 30 MB, and also that the CI checks pass (specifically #92).
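One way to keep an eye on that limit would be a small size check run before committing or in CI. The sketch below is only an illustration: fileSizeMB and withinBudget are hypothetical helpers, the 30 MB default mirrors the figure mentioned above, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```javascript
const fs = require("fs");

// Size of a file in MB (1 MB taken as 1024 * 1024 bytes here; an assumption).
function fileSizeMB(path) {
  return fs.statSync(path).size / (1024 * 1024);
}

// True when the size is strictly below the budget.
function withinBudget(sizeMB, limitMB = 30) {
  return sizeMB < limitMB;
}

const parserPath = "src/parser.c";
if (fs.existsSync(parserPath)) {
  const size = fileSizeMB(parserPath);
  console.log(`${parserPath}: ${size.toFixed(2)} MB`);
  // Signal failure to the caller without throwing.
  if (!withinBudget(size)) process.exitCode = 1;
} else {
  console.log(`${parserPath} not found, skipping size check`);
}
```

Run from the repository root, this exits with a non-zero status when the generated parser is at or above the budget, so it could gate a commit hook or a CI step.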

github-actions bot added the grammar label Nov 6, 2023

VladimirMakaev mentioned this pull request Nov 6, 2023

[Draft PR] Add field names to the grammar #74

Closed

nicolas-guichard force-pushed the field-names branch from d4ea76a to ed1f640 Compare November 7, 2023 14:50

nicolas-guichard added 12 commits November 8, 2023 13:09

Add named fields to function_declaration

d893811

Add named fields to class_declaration

16a938b

Add named fields to object_declaration

63376b5

Add named fields to property_declaration

6b4188e

Add named fields to type_alias

c69f345

Add named fields to constructors

a1cc737

Add named fields to anonymous_initializer

f7a89e4

Add named fields to companion_object

1937295

Add named fields to lambda_literal

ed868ec

Add named fields to anonymous_function

49f84e1

Add named fields to object_literal

d3e4ae0

Add named fields to control statements/expressions

81db7e0

nicolas-guichard force-pushed the field-names branch from ed1f640 to 81db7e0 Compare November 8, 2023 12:15

nicolas-guichard mentioned this pull request Nov 8, 2023

Java and Kotlin support via scip-java mozsearch/mozsearch#667

Merged

VladimirMakaev merged commit 0ef8789 into fwcd:main Nov 9, 2023
2 checks passed

VladimirMakaev mentioned this pull request Nov 9, 2023

A PR to add field names to the grammar #72

Closed

itsaky added a commit to AndroidIDEOfficial/tree-sitter-kotlin that referenced this pull request Nov 25, 2023

Revert "Add field names (fwcd#106)"

8edfc2f

This reverts commit 0ef8789.

fwcd added a commit that referenced this pull request Mar 13, 2024

Revert "Add field names (#106)"

a133581

This reverts commit 0ef8789.

This was referenced Mar 13, 2024

New parser for kotlin leads to performance degradation #107

Closed

Can we get a new release for crates.io? #112

Closed
