Incremental reparsing #5
Agreed. I also think that it's not hugely important for performance, though (the current LS ecosystem has incremental parsing disabled, I think, and edit performance isn't typically bad).
To measure current performance a little... The largest Julia source file in JuliaLang/julia seems to be
The current time to parse this on my machine and convert it into a SyntaxNode:
With some optimization to tree creation we might approach the current time to parse this without constructing the AST:
60 ms is probably detectable latency and annoying to some people. 20 ms should be getting low enough to be close to "instant". But I can imagine people might…
Conversely, the hot token-handling loops and data structures in JuliaSyntax.jl haven't been optimized at all, so the parser might become quite a bit faster in the future. Anyway, with all that in mind I think we don't need incremental parsing immediately, but we should probably have it in the future and there will likely be use cases where it matters. I think the current design accommodates it and that's the most important thing at this early stage.
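A rough way to reproduce this kind of measurement (a sketch, assuming the `JuliaSyntax.parseall` and `ParseStream` APIs; the synthetic text here is a stand-in for a large real file):

```julia
using JuliaSyntax

# Synthetic source text standing in for a large file; adjust the repeat
# count to approximate the size of the file being measured.
text = repeat("f(x, y) = x + y * 2\n", 10_000)

# Warm up the compiler, then time a full parse including SyntaxNode construction.
JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, text)
t_tree = @elapsed JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, text)

# Time parsing alone: emit tokens/events into a ParseStream without
# constructing any tree.
JuliaSyntax.parse!(JuliaSyntax.ParseStream(text), rule=:all)
stream = JuliaSyntax.ParseStream(text)
t_raw = @elapsed JuliaSyntax.parse!(stream, rule=:all)

println("with tree: ", t_tree, "s  without tree: ", t_raw, "s")
```

The gap between `t_tree` and `t_raw` gives a sense of how much of the total is tree construction rather than parsing proper.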
Agree that it isn't needed now, and also that we should keep incremental parsing in mind when defining public API.
In Roslyn, any nontrivial parser state is recorded in the green nodes. If the state isn't the same during reparsing, the node is "crumbled" to its constituent parts (one level at a time) and reparsed.
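The Roslyn scheme described above can be sketched like this (purely illustrative, not JuliaSyntax or Roslyn API; all type and field names here are invented):

```julia
# Hypothetical parser state recorded on each green node.
struct ParserFlags
    space_sensitive::Bool   # e.g. inside [ ] where whitespace separates elements
    for_loop_iter::Bool     # e.g. `in` parsed like `=` in a for-loop header
end

# Hypothetical green node: kind, byte width, saved state, children.
struct GreenNode
    kind::Symbol
    span::Int
    state::ParserFlags
    children::Vector{GreenNode}
end

# A node can be reused during reparsing only if the current parser state
# matches the state it was originally parsed under; otherwise "crumble"
# it into its children (one level) and try to reuse those instead.
function try_reuse(node::GreenNode, current::ParserFlags)
    node.state == current && return [node]   # state matches: reuse whole subtree
    return node.children                     # mismatch: crumble one level
end

leaf = GreenNode(:call, 5, ParserFlags(false, false), GreenNode[])
vect = GreenNode(:vect, 12, ParserFlags(true, false), [leaf])
try_reuse(vect, ParserFlags(true, false))    # reuse [vect] intact
try_reuse(vect, ParserFlags(false, false))   # crumble to [leaf]
```

Crumbling one level at a time means reuse degrades gracefully: only the subtrees whose saved state genuinely disagrees with the new context get reparsed.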
@davidanthoff asked on Zulip about incremental reparsing.
To capture my thoughts on this somewhere more permanent, I think this should work fine but there's a couple of tricky things to work out:
First, how are the changed bytes supplied to the parser system? I haven't looked into LanguageServer yet, but presumably it's "insert this byte here" or "change line 10 to 'such-and-such' string". Those might require a representation of the source which isn't a `String` (or `Vector{UInt8}` buffer). It might be a rope data structure or something? Should we extend the `SourceFile` abstraction to allow different `AbstractString` types? Or perhaps this state should be managed outside the parser completely? Internally, I feel the lexer and parser should always operate on `Vector{UInt8}` as a concrete, efficient data structure for UTF-8 encoded text, so the subrange of text being reparsed should probably be copied into one of these for use by the tokenizer.

Second, the new source text intersects with the existing parse tree node(s) covering some range of bytes. There can be several such nodes nested together; which one do we choose? Equivalently, which production (`JuliaSyntax.parse_*` function) do we restart parsing from? Starting deeper in the tree is good because it implies a smaller span, but the parser may have nontrivial state which isn't explicit in the parse tree: for example, space-sensitive parsing within `[]` or macro calls, or the special parsing of `in` as `=` within the iterator specification of a `for` loop. So we'd need a list of rules specifying which productions we can restart parsing from, and how to correctly reconstruct the `ParseState` in each case. To start with, toplevel/module scope is probably fine, and we could throw something together quickly for that, I think.