Incremental Parsing #114
@sirinath, thank you for the information. I have heard of it. It is in my queue to borrow some features for parboiled2.
I don't think the incremental parsing approach from papa-carlo applies to pb2. I think what we want is #42, i.e. error recovery.
@sirthias Well, if you reparse multiple times (like parsing Scala syntax in an IDE context), you get constant-time parsing vs. linear.
Ok. The question is what is faster in a real-world setting.
Hello, guys. Thanks for mentioning my project here. First of all, I have to say that I myself am a big fan of Parboiled. I have been using it for years, and Mathias does a really great job by implementing a parsing library that, on the one hand, has a parser generator's performance characteristics and, on the other hand, provides an API that is not a pain in the head when you use it in practice, in contrast to many other parser libraries.

In Papa Carlo my primary goal was reaching incrementality first. And it's true that on startup it is slow compared to well-optimized linear-time parsers such as Parboiled. That is one of the drawbacks of Papa Carlo that, I believe, will sooner or later be fixed... maybe by integrating Parboiled's backend into Papa Carlo - why not? That way we could win by combining the good parts of both projects.

Talking about the JSON parser: I have to admit that for JSON it doesn't really matter whether we have incrementality or not, because in that case any well-optimized linear-time parser is extremely fast. But I think for something less synthetic, let's say Scala syntax, @MasseGuillaume's argument holds: constant-time parser performance may save much more time in practice than a linear-time parser does, maybe even for files that are not so large.

Advocating the incremental parsing approach, I would also like to add that Papa Carlo rewrites the relevant fragments of the parse tree rather than rewriting the whole tree. This is very useful for the subsequent compilation phases, such as semantic analysis, since it makes it possible to implement those phases in an incremental/caching fashion too. And I think this is a very important part, because in fact the semantic phase is usually the slowest phase of a compiler.
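A toy sketch (my illustration, not Papa Carlo's actual implementation) of the caching idea described above: parse per top-level fragment, reuse cached subtrees for unchanged fragments, and an edit then only reparses the fragment it touches. Here a "fragment" is simply a line, and `Node` stands in for a real parse subtree:

```scala
// Toy illustration of incremental reparsing via fragment caching.
// Fragments are top-level lines here; a real incremental parser
// would use proper syntactic anchors instead.
object IncrementalSketch {
  final case class Node(text: String)                 // stand-in for a parse subtree
  private val cache = collection.mutable.Map.empty[String, Node]
  var parsedFragments = 0                             // counts actual (non-cached) parses

  private def parseFragment(src: String): Node = {
    parsedFragments += 1                              // pretend this is the expensive part
    Node(src.trim)
  }

  def parse(source: String): Vector[Node] =
    source.linesIterator.map { line =>
      cache.getOrElseUpdate(line, parseFragment(line))
    }.toVector

  def main(args: Array[String]): Unit = {
    parse("a = 1\nb = 2\nc = 3")
    val before = parsedFragments                      // three fragments parsed so far
    parse("a = 1\nb = 42\nc = 3")                     // only one line changed
    println(parsedFragments - before)                 // → 1: only the changed fragment is reparsed
  }
}
```

The point of the sketch is the cost model, not the parser: after an edit, work is proportional to the changed fragments, not to the whole input.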
@Eliah-Lakhin Naive question: is it possible to initialize the parser with Parboiled, then switch to incremental mode?
@MasseGuillaume I think it is possible. The code parsing loop runs in this procedure: https://github.com/Eliah-Lakhin/papa-carlo/blob/cb8722235227bf57873f3051754c537cc1e6bf74/src/main/scala/name.lakhin.eliah.projects/papacarlo/syntax/Session.scala#L65, using PEG rule interpreters on top of lexical tokens. So the thing is that we would need to replace these interpreters with Parboiled-based parsers. For each Papa Carlo rule, similar rules exist in Parboiled and work the same way.
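For illustration, here is what such a Parboiled-side rule could look like. The grammar (a comma-separated integer list) is invented for this example; only the parboiled2 API itself (`Parser`, `rule`, `capture`, `oneOrMore`, `separatedBy`, `EOI`) is real:

```scala
import org.parboiled2._

// A minimal parboiled2 parser of the kind that could back a Papa Carlo
// rule interpreter. Each `rule` pushes its result onto the value stack.
class IntList(val input: ParserInput) extends Parser {
  def InputLine = rule { oneOrMore(Number).separatedBy(',') ~ EOI }
  def Number    = rule { capture(Digits) ~> (_.toInt) }
  def Digits    = rule { oneOrMore(CharPredicate.Digit) }
}
```

Running `new IntList("1,2,3").InputLine.run()` yields a `Try[Seq[Int]]`, i.e. `Success(Vector(1, 2, 3))` here. The mapping Eliah describes would replace Papa Carlo's per-rule interpreter calls with invocations of rules like these.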
Perhaps something worth exploring further, maybe even merging the efforts of both projects.
And don't underestimate the importance and potentially huge benefits of understanding language semantics differentially. We should be designing programming languages with the primary ambition that it is easy to know the consequences of a change, because that is how we create all non-trivial programs: by changing existing ones. Instead, every language of which I am aware has been designed at least implicitly as if we would be writing every program from scratch.

Of course, one reason this isn't the way things are done is that it is difficult, and because "language engineering" is almost non-existent: programming languages are mostly developed as if they were one-off throwaways, not as modular foundations for exploring the interplay between the many design tradeoffs. The fact that Odersky thinks scalac's parser is stable and all finished says enough. In summary, I intend to have the benefits of both these approaches, one way or another!
Maybe this can be reopened?
I currently don't see incremental parsing as a priority for pb2. My take here is still that parsing is something that could (and should) be made very fast, so that reparsing a "usually sized" input unit is not a problem. What I would be more interested in is a "streamified parsing" approach, where pb2 can be used to parse potentially infinite streams of, say, JSON data in a nice fashion. This is somewhat related to incremental parsing, but less general.
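A sketch of that streamified idea (my illustration; this API is hypothetical and not part of pb2): instead of building a tree, the parser is fed chunks, keeps any unfinished value as internal state, and emits an event whenever a value completes, so an unbounded stream can be consumed in constant memory:

```scala
// Hypothetical sketch of event-based ("streamified") parsing over a
// comma-separated value stream. Values may be split across chunks, so
// the parser keeps the unfinished tail and emits only complete values.
final class StreamParser(emit: String => Unit) {
  private var pending = new StringBuilder

  def feed(chunk: String): Unit =
    chunk.foreach {
      case ',' => flush()               // a comma completes the current value
      case c   => pending += c          // otherwise accumulate
    }

  def finish(): Unit = flush()          // end of stream completes the last value

  private def flush(): Unit = {
    val v = pending.toString.trim
    if (v.nonEmpty) emit(v)
    pending = new StringBuilder
  }
}

object StreamParserDemo {
  def main(args: Array[String]): Unit = {
    val out = collection.mutable.Buffer.empty[String]
    val p = new StreamParser(out += _)
    p.feed("1, 2, 3")                   // "3" is incomplete: no comma seen after it yet
    p.feed("4, 5")                      // "34" completes when the next comma arrives
    p.finish()
    println(out.mkString(" "))          // → 1 2 34 5
  }
}
```

A real pb2 integration would of course drive actual grammar rules rather than a hand-rolled scanner, but the chunk-in/events-out shape would be the same.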
Oh, an event parser. That's a good idea.
The current parser does not need any change to accommodate this. We just have to efficiently 1) find where to start and stop parsing such that the change lies in between, and 2) update the tree accordingly.
The following is a similar project that supports incremental parsing: https://github.com/Eliah-Lakhin/papa-carlo. Perhaps some cool features from this project can be incorporated into parboiled2.