Add modern tree-sitter support behind an experimental flag #472
…folds Implementing folds for new tree-sitter
…nto fix-ruby-tree-sitter-scopes
Get indents working better. Move things around a bit.
`LanguageLayer` is re-introduced. All tree parsing and querying will eventually need to live inside an instance of `LanguageLayer`. Right now, just the syntax highlighting is doing so.
A quick explanation of this: the old code matched only when the full scope was identical. That is fine when we want to test a single grammar, but TextMate grammars, for example, always add `.ruby` as the last segment of the scope, and the Tree-sitter grammars did not do the same thing, which caused problems. So now it matches against a carefully crafted RegExp that checks whether the full scope matches from the beginning, OR whether part of the scope (up to just before a `.`) matches. For example, for `constant.other.ruby` it will match `constant`, `constant.other`, and `constant.other.ruby`, but it will NOT match `constant.oth`.
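The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `escapeRegExp`, `scopeMatcher`, and `scopeMatches` are hypothetical names, and the real RegExp may differ.

```javascript
// Hypothetical helper: escape characters that are special inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a RegExp that matches `expected` at the start
// of a scope name, but only if it ends at a `.` boundary or at the end.
function scopeMatcher(expected) {
  return new RegExp(`^${escapeRegExp(expected)}(\\.|$)`);
}

// True when `expected` is the full scope or a whole-segment prefix of it.
function scopeMatches(expected, actualScope) {
  return scopeMatcher(expected).test(actualScope);
}
```

With this sketch, `scopeMatches('constant.other', 'constant.other.ruby')` succeeds because the prefix ends exactly at a `.`, while `scopeMatches('constant.oth', 'constant.other.ruby')` fails because the prefix stops in the middle of a segment.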
…nto fix-ruby-tree-sitter-scopes
…uby-tree-sitter-scopes-with-right-tokens
Preliminary. Folds and indents are broken at the moment.
…and add a `tags.scm`.
This is the latest `master` — it works fine now that we build our own version of `web-tree-sitter`. It's twice as big as the last stable release. I don't know why, but I've been using it for about a week without any trouble.
…the same ones we applied to ERB earlier.
…when they haven't been re-parsed yet. Say we've got an injection layer on rows 100-150. If a user inserts a newline on line 1, we know for certain that that doesn't affect the parsed tree of that injection layer, except that it needs to be edited to incorporate that change so that its node positions are correct. So we wait to re-parse it because we don't need to, even though we know the tree is technically dirty. We only need to re-parse it when an edit occurs within its layer extent. But each node has a `text` property that's actually just a getter. When `text` is read, the node looks up its text using its `startIndex` and `endIndex`. If we called `parse` with a string originally, then it's going to do its lookups against a string that we know to be stale. Most `#match?` predicates would be doomed to fail, and if the display layer needed to highlight any part of that layer, it'd miss a lot of stuff. Luckily, web-tree-sitter envisions that your buffer might be represented by a strange data structure, and lets you specify a callback that is used when the parser needs to get the contents of various buffer ranges. We can have that callback do its lookups against a string copy of the buffer that we know to be fresh, because we're updating it on every buffer change. This is a big deal, because the alternative is having to re-parse every injection layer on every transaction. Now we can safely run captures against dirty trees when we know that the changes cannot have affected the structure of the tree.
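A rough sketch of the idea above, assuming the parser is handed a callback rather than a string. `FreshTextSource` is a hypothetical name and the chunk size is arbitrary; the actual parser call is shown only in a comment so the snippet stays self-contained.

```javascript
// Hypothetical wrapper that always holds the newest copy of the buffer text.
class FreshTextSource {
  constructor(text) {
    this.text = text;
  }

  // Call this on every buffer change so lookups never read stale text.
  update(newText) {
    this.text = newText;
  }

  // A callback in the general shape web-tree-sitter accepts in place of a
  // string: given a start index, return a chunk of text from that index,
  // or null once the index is past the end of the text.
  get inputCallback() {
    return (index) =>
      index < this.text.length ? this.text.slice(index, index + 10240) : null;
  }
}

const source = new FreshTextSource('let a = 1;');
const input = source.inputCallback;

// The user inserts a newline on line 1; the callback now reads fresh text.
source.update('\nlet a = 1;');

// With a real parser, the callback would be passed instead of a string,
// along the lines of: parser.parse(input, oldTree)
```

Because the callback closes over the wrapper rather than over a fixed string, any later text lookup that goes through it sees the latest buffer contents without the layer having to be re-parsed.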
…returns its expected object signature.
…where such a parse could fail because the source text changes in the middle of the parse. The change I introduced yesterday made it possible for an async parse job to get confused because its source text has changed in between async jobs. To prevent this, the buffer text must remain constant during the parse, but should use the most current buffer text _after_ the parse.
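One way to picture this constraint is a text source that freezes a snapshot while a parse is in flight. This is only a sketch under that assumption; `ParseTextSource`, `beginParse`, and `endParse` are hypothetical names, not the ones used in the commit.

```javascript
// Hypothetical text source that holds the text constant during a parse.
class ParseTextSource {
  constructor(text) {
    this.current = text; // always the newest buffer text
    this.frozen = null;  // snapshot used while a parse is running
  }

  // Called on every buffer change, even in the middle of an async parse.
  update(text) {
    this.current = text;
  }

  // Take a snapshot so every read during the parse sees the same text.
  beginParse() {
    this.frozen = this.current;
  }

  // Drop the snapshot so later reads see the most current text again.
  endParse() {
    this.frozen = null;
  }

  // Read callback: snapshot text during a parse, current text otherwise.
  read(index) {
    const text = this.frozen !== null ? this.frozen : this.current;
    return index < text.length ? text.slice(index) : null;
  }
}

const parseSource = new ParseTextSource('a = 1');
parseSource.beginParse();
parseSource.update('a = 12');              // edit arrives between async jobs
const duringParse = parseSource.read(0);   // still the snapshot
parseSource.endParse();
const afterParse = parseSource.read(0);    // now the newest text
```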
…to accept multiple node types.
…for consumption by other packages. This is nearly the simplest possible interface around the `bookmarks` package, except that it will fire events when the number of bookmarks has changed.
…when there's no indents query to help us out. This seems to be the default behavior for plaintext and for other language modes. I've got a Tree-sitter Markdown grammar locally and I noticed it wasn't preserving indentation when I was two levels deep into a bulleted list and hit `Enter` — I had to re-indent each time. (The Markdown grammar doesn't include an indents query because there's practically no way to predict indentation levels in Markdown.)
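As an illustration of this fallback, here is a minimal sketch that assumes "preserving indentation" means reusing the leading whitespace of the nearest non-blank row above. `fallbackIndentForRow` is a hypothetical name and the real logic may differ.

```javascript
// Hypothetical fallback for when no indents query exists: reuse the leading
// whitespace of the nearest non-blank row above the given row.
function fallbackIndentForRow(lines, row) {
  for (let r = row - 1; r >= 0; r--) {
    if (lines[r].trim() !== '') {
      return lines[r].match(/^[ \t]*/)[0];
    }
  }
  return ''; // no earlier content, so no indentation
}

// Two levels deep into a bulleted list, then `Enter` creates row 2.
const markdownLines = ['- first', '  - nested', ''];
const newRowIndent = fallbackIndentForRow(markdownLines, 2);
```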
…to consider content ranges instead of extents. If you want to know which layers are operative at a given point, it makes no sense to check a layer's extent. A `LanguageLayer` could extend over the entire buffer and still only apply highlighting to one very small range in the buffer. Thus it makes no sense for `controllingLayerAtPoint` not to check a layer's content ranges, nor does it make sense for the `injectionLayersAtPoint` method that it relies on. If I felt like there was any use in knowing that a language layer extended over a range that included the given point but wasn't operative at that point, I'd be hesitant to change these methods. But I don't. This bug was discovered when I realized that the wrong indents query was being consulted in the following HTML… <style> #foo { …when I moved the cursor to the end of the `<style>` element and hit `Enter`.
Forgot to sort deeper layers first.
…as overly broad. Turns out there are some scenarios where you'd want to know about a layer whose extent may include a given point, even if its content ranges _don't_ include that point. I'd have realized this if I'd done something as simple as run the test suite. Silly me. But `controllingLayerAtPoint` will still enforce the content range constraint because we use it to pick a winner for indent queries, fold queries, and so on.
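The distinction between the two methods can be sketched like this. The method names come from the commit messages, but the bodies and the layer shape (`depth`, `extent`, `contentRanges`) are assumptions made for illustration.

```javascript
// Compare two {row, column} points; negative when `a` comes before `b`.
function comparePoints(a, b) {
  return a.row - b.row || a.column - b.column;
}

function rangeContainsPoint(range, point) {
  return comparePoints(range.start, point) <= 0 &&
    comparePoints(point, range.end) <= 0;
}

// Every layer whose extent includes the point, deepest layers first.
function injectionLayersAtPoint(layers, point) {
  return layers
    .filter((layer) => rangeContainsPoint(layer.extent, point))
    .sort((a, b) => b.depth - a.depth);
}

// The deepest layer whose content ranges actually include the point.
function controllingLayerAtPoint(layers, point) {
  const match = injectionLayersAtPoint(layers, point).find((layer) =>
    layer.contentRanges.some((range) => rangeContainsPoint(range, point))
  );
  return match || null;
}

// Hypothetical layers: a CSS injection with two separate content ranges.
const range = (r1, c1, r2, c2) => ({
  start: { row: r1, column: c1 },
  end: { row: r2, column: c2 }
});
const htmlLayer = {
  name: 'html', depth: 0,
  extent: range(0, 0, 10, 0), contentRanges: [range(0, 0, 10, 0)]
};
const cssLayer = {
  name: 'css', depth: 1,
  extent: range(1, 7, 8, 0),
  contentRanges: [range(1, 7, 3, 0), range(6, 7, 8, 0)]
};
const layers = [htmlLayer, cssLayer];
```

In this sketch, a point on row 4 falls inside the CSS layer's extent but between its content ranges, so `injectionLayersAtPoint` still reports the CSS layer while `controllingLayerAtPoint` falls back to the HTML layer.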
So: the Ruby test is failing, and that is expected. I wrote it against the scopes that the queries from Neovim were returning at the time, and we have since changed them. Autocomplete-CSS is also failing, and for that one I have no idea whether the failure is correct or not...
…when straddling an injection boundary. When we do this one row at a time, the controlling layer for an indent query is the comparison (previous) row, because that's where the query starts. The batch version should be no different. The new spec describes the exact scenario that revealed this bug.
…not to expect empty tokens. I made a change in 29cfcad that neglected to apply any scopes for a range that was zero characters long. My instincts tell me that this is a safe change to make, but it does affect any tests that used `tokensForScreenRow` and expected it to report information about zero-length tokens. So the expected results needed to be updated.
This got put into a conditional, but I neglected to remove the original line.
File that one under “how did this ever work?”: a spec is failing because it's expecting a scope name that is present in the TextMate-style CSS grammar but missing from the tree-sitter CSS grammar. This drives me nuts. They shipped tree-sitter back in the day without even checking for stuff like this.
This is just a band-aid. There are deeper issues here, like lack of compatibility with tree-sitter grammars and lack of SCSS (new syntax) support.
Yeah, my inclination right now is to skip those specs. I haven't written any for the other grammars because I expect there to be a lot of flux as we adjust…
Temporarily skipping the tree-sitter specs; we can formalize them later.
Let's GOOOOO!!!
Let's do this!!
Depends on #454 (I think).
Description of the Change
This PR brings the `master` branch up-to-date with my `tree-sitter-hell` branch (which has been the de facto branch for iterating on the modern-tree-sitter language mode).
This will add an experimental feature that is explicitly marked as experimental, but which should not affect anyone who doesn’t opt into it. So here’s how I’d like this to work, if folks are OK with it:
- … `master`.
- … `core.useExperimentalModernTreeSitter` setting.
- … `core.useExperimentalModernTreeSitter` is set to `true` and make sure it can do simple things, like editing and saving files, without breaking the editor.
I know this seems like a low bar to set, but don’t get me wrong: in practice, the new mode is working great for me. My aim is just to define a realistic standard for getting an experimental feature into the editor, rather than wait until everything is at 100% feature parity.
How does this work?
The `core.useExperimentalModernTreeSitter` setting will be the opt-in setting for using experimental `web-tree-sitter` grammars.
If `core.useExperimentalModernTreeSitter` is set to `false`:
If `core.useExperimentalModernTreeSitter` is set to `true`:
- … `grammar-selector` is unchecked; otherwise the user will choose between legacy-TS and modern-TS grammars when all three exist, and will see TM grammars only when neither kind of TS grammar exists for a language
No, I mean, like, the whole system. How does it work now?
Here's most of the documentation I've written as I've gone. I'm open to any suggestions or critiques about the design choices themselves, though perhaps they're a bit big of a subject for this PR and could take place in Discord.
Which grammars have been updated?
This will add new `modern-tree-sitter` versions of every grammar that had a legacy `tree-sitter` equivalent:
- `language-c`: `source.c` and `source.cpp`
- `language-css`: `source.css`
- `language-go`: `source.go`
- `language-html`: `text.html.basic`, `text.html.erb`, `text.html.ejs`
- `language-hyperlink`: A hyperlink-highlighting parser for injecting into strings and comments (not yet injected into all grammars)
- `language-java`: `source.java`
- `language-javascript`: `source.javascript`, plus an injection for highlighting regular expressions and one for highlighting JSDoc (JSX has built-in support)
- `language-json`: `source.json`
- `language-python`: `source.python`
- `language-ruby`: `source.ruby`, plus an injection for highlighting regular expressions
- `language-rust-bundled`: `source.rust`
- `language-shellscript`: `source.shell`
- `language-todo`: A parser for highlighting TODOs and FIXMEs and whatnot for injecting into code comments (not yet injected into all grammars)
- `language-typescript`: `source.ts` and `source.tsx` (two different tree-sitter parsers)
There are a few other tree-sitter parsers that are good enough to be added to one of our built-in packages (Markdown, PHP, YAML), but those don’t have legacy tree-sitter grammars, so there’s no urgency to replace them.
Alternate Designs
Oh, that ship has sailed.
Possible Drawbacks
The infrastructure for modern tree-sitter grammars will still be loaded for everyone, including the `web-tree-sitter` package and all WASM files. We can mitigate this somewhat by ignoring modern tree-sitter grammars during window load unless the user has opted in, but at a cost: changing `core.useExperimentalModernTreeSitter` to opt into the experimental mode would also require that the user reload the window. That’s why I chose not to do it here, but I’m open to the idea.
This mode will require some documentation. It’s mostly written on my end, but amid the overall docs rewrite, it’s not clear to me where that text should live.
Verification Process
Here are some things to try after checking out this PR branch:
Then, after changing `core.useExperimentalModernTreeSitter` to `true`:
Here are some things that may be different, but which I think will make you happy:
- … `TODO`s in comments, and URLs in comments and strings, the way that TM-style grammars can do, but which legacy TS grammars never could. The integration requires work from a grammar author and isn’t yet as simple as I’d like, but it’s present in the new JavaScript grammar.
Here are some things that may be different, and may take you by surprise:
- Your syntax highlighting will be different. Some of that is accidental, but some of it is the result of intentional renaming of some scopes. The bundled syntax themes will require some tweaks; I plan to have that discussion once I get some feedback.
- Indentation rules might behave a bit differently. In general, if it works better, great; if it’s not as good as what you were used to, please do file an issue once this lands on `master`.
- Code folding will behave differently. In general, we are now preferring to exclude a closing delimiter from the fold, so that most folds will look like… …but otherwise should be very similar to code folding in a legacy grammar. If you notice otherwise, please do file an issue once this lands on `master`.
- I’ve noticed some occasional odd behaviors around highlighting of injected layers, but it’s been very hard to reproduce consistently. If you do notice any of these, please do file an issue once this lands on `master`. If you can reproduce it consistently, I will send you gifts. (I think these are all fixed, but absolutely let me know if I'm wrong.)
Release Notes
Added an experimental new tree-sitter mode that will eventually replace Pulsar’s legacy support for tree-sitter grammars. You can opt into this mode via the Core → Use Experimental Modern Tree-Sitter setting.