Incremental Parsing #114

Closed
sirinath opened this issue Dec 1, 2014 · 13 comments

Comments

@sirinath

sirinath commented Dec 1, 2014

The following is a similar project that supports incremental parsing: https://github.com/Eliah-Lakhin/papa-carlo. Perhaps some of its cool features could be incorporated into parboiled2.

@alexander-myltsev
Contributor

@sirinath, thank you for the information. I have heard of it; it is in my queue to borrow some features for parboiled2.

@sirthias
Owner

sirthias commented Dec 1, 2014

I don't think the incremental parsing approach from papa-carlo applies to pb2.
Also, I'm not sure it would help in any way to speed things up.
The README from papa-carlo says that it parses a 600-line JSON file in ca. 250 ms.
That is roughly 2,400 JSON lines per second, about three orders of magnitude (≈ 830×) slower than pb2, which does about 2 million JSON lines per second with the jsonBenchmark example.

I think what we want is #42, i.e. error recovery.
But not necessarily in an incremental fashion.

sirthias closed this as completed Dec 1, 2014
@MasseGuillaume

@sirthias well, if you reparse the same input multiple times (like Scala syntax in an IDE context), it becomes constant-time parsing per edit vs. linear.

@sirthias
Owner

sirthias commented Dec 2, 2014

Ok. The question is what is faster in a real-world setting.
If papa-carlo is 1000 times slower, the benefit of constant vs. linear runtime only manifests itself for extremely large files.

@Eliah-Lakhin

Hello, guys.

Thanks for mentioning my project here.

First of all I have to say that I myself am a big fan of Parboiled. I have been using it for years, and Mathias has done a really great job implementing a parsing library that, on the one hand, has a parser generator's performance characteristics and, on the other hand, provides an API that is not a pain to use in practice, in contrast to many other parser libraries.

In Papa Carlo my primary goal was reaching incrementality first. And it is true that on startup it is slow compared to well-optimized linear-time parsers such as Parboiled. That is one of the drawbacks of Papa Carlo that, I believe, sooner or later will be fixed... maybe by integrating Parboiled's backend into Papa Carlo - why not? That way we could win by combining the good parts of both projects.

Talking about the JSON parser: I have to admit that for JSON it doesn't really matter whether we have incrementality or not, because in that case any well-optimized linear-time parser is extremely fast. But I think for something less synthetic, say Scala syntax, @MasseGuillaume's argument holds: constant-time parser performance may save much more time than a linear-time parser does in practice, maybe even for not-so-large files.

Advocating the incremental parsing approach, I would also like to add that Papa Carlo rewrites only the affected fragments of the parse tree rather than rewriting the whole tree. This is very useful for the subsequent compilation phases, such as semantic analysis, since it makes it possible to implement those phases in an incremental/caching fashion too. And I think this is a very important part, because in practice the semantic phase is usually the slowest compiler phase.
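
To illustrate the idea, here is a rough sketch with made-up names (not Papa Carlo's actual API): parse-tree fragments are cached keyed by the source span they cover, and after an edit only the fragments whose span overlaps the edited range are reparsed, while everything else is reused as-is.

```scala
// Rough sketch of incremental fragment reuse; Span, Fragment and FragmentCache
// are illustrative names, not Papa Carlo's real types.
final case class Span(start: Int, end: Int) {
  def overlaps(other: Span): Boolean = start < other.end && other.start < end
}
final case class Fragment(span: Span, subtree: String) // subtree stands in for a parsed node

final class FragmentCache(initial: Vector[Fragment]) {
  private var fragments = initial

  /** Reparse only the fragments touched by the edit; untouched fragments are kept verbatim. */
  def applyEdit(edited: Span, reparse: Span => Fragment): Vector[Fragment] = {
    fragments = fragments.map(f => if (f.span.overlaps(edited)) reparse(f.span) else f)
    fragments
  }
}
```

The payoff is that later phases keyed on the same fragments can reuse their cached results for everything outside the edited span.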

@MasseGuillaume

@Eliah-Lakhin naive question: is it possible to initialize the parser with parboiled and then switch to incremental mode?

@Eliah-Lakhin

@MasseGuillaume I think it is possible. The code parsing loop runs in this procedure: https://github.com/Eliah-Lakhin/papa-carlo/blob/cb8722235227bf57873f3051754c537cc1e6bf74/src/main/scala/name.lakhin.eliah.projects/papacarlo/syntax/Session.scala#L65, using PEG rule interpreters on top of lexical tokens. So the thing is that we would need to replace these interpreters with Parboiled-based parsers. For each Papa Carlo rule there are similar rules in Parboiled that work the same way.
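
For illustration, here is a minimal parboiled2 rule sketch (a made-up KeyValueParser grammar, not Papa Carlo's or pb2's actual code) of the kind of PEG rules that such an interpreter could be swapped out for:

```scala
import org.parboiled2._
import scala.util.{Failure, Success}

// Minimal parboiled2 parser sketch; the grammar is illustrative only.
class KeyValueParser(val input: ParserInput) extends Parser {
  def WS    = rule { zeroOrMore(anyOf(" \t")) }
  def Ident = rule { capture(oneOrMore(CharPredicate.AlphaNum)) ~ WS }
  def Pair  = rule { Ident ~ ':' ~ WS ~ Ident ~> ((k: String, v: String) => k -> v) }
  def Line  = rule { WS ~ Pair ~ EOI }
}

object Demo extends App {
  new KeyValueParser("answer: 42").Line.run() match {
    case Success(pair) => println(pair)              // (answer,42)
    case Failure(e)    => println(s"parse failed: $e")
  }
}
```

Since each such rule has a clear Papa Carlo counterpart, driving the incremental machinery with compiled pb2 rules instead of interpreted ones seems plausible in principle.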

@sirinath
Author

sirinath commented Dec 8, 2014

Perhaps this is something worth exploring further, or even merging the efforts of both projects.

@paulp

paulp commented Dec 9, 2014

> Advocating the incremental parsing approach, I would also like to add that Papa Carlo rewrites only the affected fragments of the parse tree rather than rewriting the whole tree. This is very useful for the subsequent compilation phases, such as semantic analysis, since it makes it possible to implement those phases in an incremental/caching fashion too.

And don't underestimate the importance and potential huge benefits of understanding language semantics differentially. We should be designing programming languages with the primary ambition that it is easy to know the consequences of a change, because that is how we create all non-trivial programs: by changing existing ones. Instead, every language of which I am aware has been designed at least implicitly as if we would be writing every program from scratch.

Of course, one reason this isn't the way things are done is that it will be difficult, and because "language engineering" is almost non-existent - programming languages are mostly developed as if they are one-off throwaways, not as modular foundations for exploring the interplay between the many design tradeoffs. The fact that odersky thinks scalac's parser is stable and all finished says enough.

In summary, I intend to have the benefits of both these approaches, one way or another!

@sirinath
Author

Maybe this can be re-opened?

@sirthias
Owner

I currently don't see incremental parsing as a priority for pb2.
There are too many other focus points for us at this time, and I don't have a clear picture of how exactly incremental parsing could work in pb2 in a way that wouldn't deteriorate general parsing performance or introduce too much complexity.

My take here is still that parsing is something that could be (and should be) made very fast, so that reparsing a "usually sized" input unit is not a problem.

What I would be more interested in is a "streamified parsing" approach, where pb2 can be used to parse potentially infinite streams of, say, JSON data in a nice fashion. This is somewhat related to incremental parsing, but less general.
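
Just to sketch the direction (hypothetical types, nothing like this exists in pb2 today): such a parser would emit events as input chunks arrive instead of producing a single result value at the end.

```scala
// Hypothetical sketch of an event-based ("streamified") JSON interface;
// these types are illustrative and are not part of parboiled2.
sealed trait JsonEvent
case object StartObject                     extends JsonEvent
case object EndObject                       extends JsonEvent
case object StartArray                      extends JsonEvent
case object EndArray                        extends JsonEvent
final case class FieldName(name: String)    extends JsonEvent
final case class StringValue(value: String) extends JsonEvent
final case class NumberValue(value: Double) extends JsonEvent

trait StreamingJsonParser {
  /** Feed the next chunk of input; returns the events completed so far. */
  def push(chunk: String): Seq[JsonEvent]
  /** Signal end of input; returns remaining events or a parse error message. */
  def finish(): Either[String, Seq[JsonEvent]]
}
```

A real design would have to decide how partial matches at chunk boundaries are buffered, which is where most of the complexity would live.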

@paulp

paulp commented Dec 10, 2014

Oh, an event parser. That's a good idea.

@sirinath
Author

The current parser does not need any change to accommodate this. We just have to efficiently find 1) where to start and stop parsing so that the change lies in between, and 2) then update the tree.
