Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix grammar when character class is only a range #32

Merged
merged 1 commit into from
Oct 19, 2024

Conversation

cdacamar
Copy link
Contributor

Prior to this change a character class such as [0-1] was parsed as:

(character_class)
  ([)
  (character_class)
  (character_class)
  (character_class)
  (])

So the range was completely dropped. This happened for two reasons:

  1. The grammar had a reduction for class_range which dispatched directly to _class_atom, which did more work than was strictly necessary to match a class character.
  2. I'm unsure if this made a major difference, but the order of the conflict resolution array preferred character_class over class_range.

After fixing the issues above I now get:

(character_class)
  ([)
  (class_range)
    (class_character)
    (-)
    (class_character)
  (])

While still accepting expressions such as [-0-1-] (optional - on each end).

Note: this approach does generate slightly larger tables due to hoisting the - optional check to the character_class production, but it seems like a reasonable tradeoff to get correct behavior.

@amaanq
Copy link
Member

amaanq commented Oct 19, 2024

This is great, thank you! I rebased your conflicts for you, and one thing to note is that this would break when a character class escape was used for the start/end of the range, e.g. [\u1111-\u2222], as well as [---], so I allowed a hyphen and escapes in class_range

Note that for your 2nd reason, that's incorrect, the conflicts array tells tree-sitter that these tokens in an array do conflict with each other, order doesn't matter. Order would only matter if you're sorting these conflicts out in the precedences array, in which you list rules out that conflict with each other in descending order of precedence.

@amaanq amaanq merged commit 05c04f4 into tree-sitter:master Oct 19, 2024
4 checks passed
@cdacamar cdacamar deleted the update-parser branch October 22, 2024 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants