Identify newline separated segments #15

sungshik · 2024-08-23T15:31:30Z

This PR improves support for "gobbling up" tokens in begin/end patterns by making it segment-based. To motivate the change, suppose we have the following grammar:

lexical String = @category="string" "\"" Char* "\"";

lexical Char
    = [a-z]
    | "\\" [\"]
    ;

For instance, "asdf" and "foo\"bar" should be fully highlighted, but "foo"bar" should not be.

Before this PR, the following TextMate rule is generated (simplified):

{
  "begin": "\"",
  "end": "\"",
  "name": "string",
  "patterns": [
    { "match": "[a-z]" },
    { "match": "[\\]" },
    { "match": "[\"]" }
  ]
}

That is, roughly, for each symbol s in Char, a nested pattern is included that gobbles up content between begin and end matching s on its own. However, gobbling up individual symbols is too fine-grained. For instance, with this rule, "foo\"bar" is not fully highlighted: only "foo\" is, because the \ is not considered in combination with the second ", which therefore matches end.

Instead of gobbling up individual symbols, the tokenizer should gobble up segments of symbols that belong together. That is what this PR adds.
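As a rough illustration of the idea (not the Rascal implementation in this PR), the sketch below is written in Python and treats each alternative of Char as a list of regex fragments. It assumes, based on the PR title, that a segment is a maximal run of symbols containing no newline, which fits the fact that TextMate applies regexes one line at a time. The names `segments`, `to_regex`, and `CHAR_ALTERNATIVES` are hypothetical.

```python
import re

NEWLINE = "\n"  # assumption: stand-in for any symbol that can match a line break


def segments(symbols):
    """Split a list of symbols into maximal runs without a newline symbol."""
    result, current = [], []
    for symbol in symbols:
        if symbol == NEWLINE:
            if current:
                result.append(current)
            current = []
        else:
            current.append(symbol)
    if current:
        result.append(current)
    return result


def to_regex(segment):
    """Concatenate the regex fragments of one segment into one pattern."""
    return "".join(segment)


# The two alternatives of Char from the grammar above, as regex fragments.
CHAR_ALTERNATIVES = [[r"[a-z]"], [r"[\\]", r'["]']]

# One nested pattern per segment, instead of one per individual symbol.
patterns = [to_regex(seg) for alt in CHAR_ALTERNATIVES for seg in segments(alt)]
```

Under these assumptions, `patterns` holds two regexes: one for a lowercase letter and one for a backslash immediately followed by a double quote, so the escape sequence is matched as a unit.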

After this PR, the following TextMate rule is generated (simplified):

{
  "begin": "\"",
  "end": "\"",
  "name": "string",
  "patterns": [
    { "match": "[a-z]" },
    { "match": "[\\][\"]" }
  ]
}
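To compare the two rules, here is a small Python simulation of a begin/end rule. It is a simplified model, not a real TextMate engine: at each position it tries the end pattern first (TextMate's default when end and a nested pattern match at the same position), then the nested patterns in order, and otherwise skips one character. The function name `highlighted_prefix` and the constants `BEFORE` and `AFTER` are hypothetical.

```python
import re


def highlighted_prefix(text, nested, begin=r'"', end=r'"'):
    """Return the part of `text` covered by the first begin/end scope."""
    end_re = re.compile(end)
    nested_res = [re.compile(p) for p in nested]
    match = re.compile(begin).match(text)
    if match is None:
        return ""
    pos = match.end()
    while pos < len(text):
        # The end pattern wins ties against nested patterns.
        match = end_re.match(text, pos)
        if match:
            return text[:match.end()]
        for pattern in nested_res:
            match = pattern.match(text, pos)
            if match and match.end() > pos:
                pos = match.end()  # gobble up the matched content
                break
        else:
            pos += 1  # nothing matched here; stay in scope and move on
    return text  # no end found: the scope runs to the end of the input


# Nested patterns before and after this PR (as Python regexes).
BEFORE = [r"[a-z]", r"\\", r'"']
AFTER = [r"[a-z]", r'\\"']

escaped = '"foo\\"bar"'  # the text "foo\"bar"
print(highlighted_prefix(escaped, BEFORE))
print(highlighted_prefix(escaped, AFTER))
```

In this model, the rule from before the PR stops at the escaped quote, while the segment-based rule consumes the backslash and the quote together and covers the whole string. Both rules treat "asdf" and "foo"bar" identically.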

sungshik

A few additional comments

rascal-textmate-core/src/main/rascal/lang/rascal/grammar/Util.rsc

rascal-textmate-core/src/main/rascal/lang/rascal/grammar/analyze/Symbols.rsc

rascal-textmate-core/src/main/rascal/util/MaybeUtil.rsc

DavyLandman

This looks a lot better indeed 👍 Also, the TextMate regexes look more like you would expect them to.

rascal-textmate-core/src/main/rascal/lang/rascal/grammar/analyze/Symbols.rsc

rascal-textmate-core/src/main/rascal/util/MaybeUtil.rsc

vscode-extension/syntaxes/pico.tmLanguage.json

sungshik

Thank you for the quick review, @DavyLandman 🙂

rascal-textmate-core/src/main/rascal/lang/rascal/grammar/analyze/Symbols.rsc

rascal-textmate-core/src/main/rascal/util/MaybeUtil.rsc

vscode-extension/syntaxes/pico.tmLanguage.json

sungshik added 7 commits August 23, 2024 16:09

Extend destar with cases for \seq and \alt

dd9d24a

Add module MaybeUtil

8155fb6

Use new module MaybeUtil in existing code

689545c

Add function to compute the newline-separated segments of a list of symbols

c464799

Use segments (instead of terminals) in the generation of begin/end patterns

0c3b83b

Update generated TextMate grammar for Rascal/Pico

91b7ca9

Merge branch 'main' into identify-newline-separated-segments

5d91dcc

sungshik commented Sep 2, 2024

sungshik added 3 commits September 2, 2024 11:27

Add a few clarifying comments

7951a7e

Add tests

d805b5a

Add another test and fix a bug to make the test pass

40a4712

sungshik marked this pull request as ready for review September 2, 2024 12:29

Update generated TextMate grammar for Rascal/Pico

49547a4

sungshik marked this pull request as draft September 6, 2024 07:48

sungshik added 5 commits September 6, 2024 11:41

Add utility functions to compute the expected min/max length of terminals

05eb324

Add sorting for terminals-to-gobble (based on segments)

e869ffd

Move removeBeginEnd to a separate private function to improve readability

5603cc8

Fix small issue in the escaping rules for strings in `PicoWithCategories`

dafaa08

Update generated TextMate grammar for Rascal/Pico

ed8b6b2

sungshik marked this pull request as ready for review September 6, 2024 10:39

DavyLandman approved these changes Sep 6, 2024

Improve documentation

27c26ef

sungshik commented Sep 6, 2024

sungshik merged commit 17df6e6 into main Sep 6, 2024
2 checks passed

sungshik deleted the identify-newline-separated-segments branch September 6, 2024 13:19
