
Incremental reparsing #5

Open
c42f opened this issue Feb 7, 2022 · 4 comments
Labels
enhancement, parser

Comments

@c42f
Member

c42f commented Feb 7, 2022

@davidanthoff asked on Zulip about incremental reparsing.

is there support for partial reparses, i.e. some sort of incremental parsing? Basic idea is that user presses one key in the editor, and we don't want to reparse the whole document on every key press, but only a subset, based on the precise range of the doc that was edited

To capture my thoughts on this somewhere more permanent: I think this should work fine, but there are a couple of tricky things to work out.

First, how are the changed bytes supplied to the parser system? I haven't looked into LanguageServer yet, but presumably it's something like "insert this byte here" or "change line 10 to such-and-such string". Those might require a representation of the source which isn't a String (or Vector{UInt8} buffer); it might be a rope data structure or something. Should we extend the SourceFile abstraction to allow different AbstractString types? Or perhaps this state should be managed outside the parser completely? Internally, I feel the lexer and parser should always operate on Vector{UInt8} as a concrete, efficient data structure for UTF-8 encoded text, so the subrange of text being parsed should probably be copied into one of these for use by the tokenizer (see the sketch below).
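
A minimal sketch of what that could look like (hypothetical names, not JuliaSyntax API: TextEdit and apply_edit are made up for illustration):

struct TextEdit
    first_byte::Int            # first byte replaced (1-based, inclusive)
    last_byte::Int             # last byte replaced; last_byte == first_byte - 1 encodes a pure insertion
    replacement::Vector{UInt8}
end

function apply_edit(buf::Vector{UInt8}, e::TextEdit)
    # Splice the replacement into a fresh buffer. A rope would avoid this O(n)
    # copy, but the lexer still wants a contiguous Vector{UInt8} for the
    # subrange actually being reparsed.
    vcat(buf[1:e.first_byte-1], e.replacement, buf[e.last_byte+1:end])
end

julia> buf = Vector{UInt8}("f(x) = x + 1");

julia> String(copy(apply_edit(buf, TextEdit(10, 10, Vector{UInt8}("-")))))
"f(x) = x - 1"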

Second, the new source text intersects with the existing parse tree node(s) which cover some range of bytes. There can be several such nodes nested together; which one do we choose? Equivalently, which production (JuliaSyntax.parse_* function) do we start reparsing from? Starting deeper in the tree is good because it implies a smaller span, but the parser may have nontrivial state which isn't explicit in the parse tree. For example, space-sensitive parsing within [] or macro calls, or the special parsing of in as = within the iterator specification of a for loop. So we'd need a list of rules specifying which productions we can restart parsing from, and a way to correctly reconstruct the ParseState for those cases (see the node-selection sketch below). To start with, toplevel/module scope is probably fine and we could throw something together quickly for that, I think.
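
To illustrate just the node-selection part, here's a hedged sketch (Node, covers and restart_node are made-up illustration types, not the JuliaSyntax tree API), treating only direct children of toplevel as safe restart points, per the suggestion above:

struct Node
    first_byte::Int
    last_byte::Int
    is_toplevel_child::Bool   # assumed flag: direct child of module/toplevel scope
    children::Vector{Node}
end

covers(n::Node, lo, hi) = n.first_byte <= lo && hi <= n.last_byte

function restart_node(root::Node, lo::Int, hi::Int)
    # Descend to the innermost node covering the edited bytes [lo, hi],
    # remembering the deepest ancestor we know how to restart parsing from.
    safe = root
    node = root
    while true
        i = findfirst(c -> covers(c, lo, hi), node.children)
        i === nothing && return safe
        node = node.children[i]
        node.is_toplevel_child && (safe = node)
    end
end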

@pfitzseb
Member

pfitzseb commented Feb 7, 2022

To start with, toplevel/module scope is probably fine and we could throw something together quickly for that, I think.

Agreed. I also think it's not hugely important for performance, though (the current LS ecosystem has incremental parsing disabled, I think, and edit performance isn't typically bad).

@c42f
Member Author

c42f commented Feb 7, 2022

To measure current performance a little...

The largest Julia source file in JuliaLang/julia seems to be test/core.jl, and we've got

julia> text = read(joinpath(Sys.BINDIR, "..", "share", "julia", "test", "core.jl"), String);

julia> sizeof(text)
198113

The current time to parse this on my machine and convert it into a SyntaxNode:

julia> @time JuliaSyntax.parse(JuliaSyntax.SyntaxNode, text);
  0.060513 seconds (549.17 k allocations: 33.554 MiB, 19.23% gc time)

With some optimization to tree creation we might approach the current time to parse this without constructing the AST:

julia> @time JuliaSyntax.parse(JuliaSyntax.ParseStream(text))
  0.022842 seconds (111.16 k allocations: 10.286 MiB)
ParseStream at position 198114

60 ms is probably detectable latency and annoying to some people. 20 ms should be low enough to feel close to "instant".

But I can imagine people might

  • Generate much larger source files than the 200 kB here (inadvisable, perhaps, but rather probable)
  • Have a much slower machine than my Intel i7 laptop, which is several years old but by no means slow.

Conversely, the hot token-handling loops and data structures in JuliaSyntax.jl haven't been optimized at all, so the parser might become quite a bit faster in the future.

Anyway, with all that in mind, I don't think we need incremental parsing immediately, but we should probably have it in the future, and there will likely be use cases where it matters. I think the current design accommodates it, and that's the most important thing at this early stage.

@c42f added the enhancement and parser labels on May 18, 2022
@LilithHafner
Member

Agree that it isn't needed now, and also that we should keep incremental parsing in mind when defining public API.

@gafter

gafter commented Jul 8, 2023

In Roslyn, any nontrivial parser state is recorded in the green nodes. If the state isn't the same during reparsing, the node is "crumbled" to its constituent parts (one level at a time) and reparsed.
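
A minimal sketch of that scheme (hypothetical names, not Roslyn's or JuliaSyntax's actual API; intersection with the edited byte range is omitted for brevity): each green node records the parser state it was produced under, and reuse falls back to the children, one level at a time, when the state doesn't match.

struct GreenNode
    width::Int             # byte width of this node's text
    parse_state::Symbol    # assumed encoding of relevant parser state, e.g. :normal, :space_sensitive
    children::Vector{GreenNode}
end

# Return the subtrees of `node` reusable under the current parser `state`,
# "crumbling" a node into its children when its recorded state doesn't match.
function reuse_or_crumble(node::GreenNode, state::Symbol)
    node.parse_state == state && return [node]       # whole subtree reusable as-is
    isempty(node.children) && return GreenNode[]     # bare token: must relex
    reduce(vcat, (reuse_or_crumble(c, state) for c in node.children);
           init=GreenNode[])
end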
