nom 5.0 internal design #878
Comments
Methods was marked as deprecated and reference to nonexistent |
I have not pushed the crate yet. This major version will be a big cleanup, and it's a good occasion to extract things that are not as closely maintained as the rest |
What about compatibility for macros and functions that are moved to a different crate? BTW, in my opinion |
My vote is definitely on having only one version. It keeps the amount of duplicate code to a minimum and encourages first building a parser and then feeding it some input later. |
I don't fully understand marking a macro as deprecated in the latest version when no proper replacement crate exists yet. I can just roll back to 4.1.1, but as someone who wants to port a decently sized hand-written parser to nom, it doesn't instill much confidence that this library will be maintained with some stability in the long run. I don't want this to sound overly critical, I'm a happy user of nom 😄 |
After working a while on nom 5 (cf https://github.com/Geal/nom/compare/5.0 ), I have some answers to these questions:
@0X1A if you've followed nom for a few versions, you should have seen that I care a lot about backward compatibility, and major version upgrades should always try to minimize breaks. But this is also a good occasion to clean things up, and there's also a limit to the quantity of code I can maintain. So I'd rather have nom's core be clean and stable, and have external crates with their own release cycles (they should be easier to maintain than nom itself) |
Will the macro syntax be kept for 5.0 or is it gonna be only the new function based combinators? Also, I'd really like to write a (procedural) macro for nom 5 that accepts PEG syntax and converts it to the regular nom combinators under the hood. I feel like that would add a really nice middle ground between parser combinators and parser generators and might speed up writing parsers for simple syntaxes a lot. Especially if you can use previously defined regular nom parsers as a component in the PEG parser. Is this something you'd be interested in? A quick example for parsing mathematical expressions:
|
the macros will stay in nom 5, but they will use the functions under the hood. A PEG generator sounds nice :) |
Great! |
I needed to do some stuff in a parser that would've been difficult or impossible to do with do_parse, and wrote it as a function. It was annoying to have to update the input every time I used a parser. This is untested, but something like this would've been useful:

    pub fn replace_buf<B, T>(input: &mut B, tuple: (B, T)) -> T {
        *input = tuple.0;
        tuple.1
    }

This would be called like:

    let num = replace_buf(&mut input, le_u8(input)?);

This might've been possible to do using pattern matching, but I wasn't able to figure out how. |
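A note on the call shown above: with `&mut input` as the first argument, the borrow checker will likely reject reading `input` in the second argument, because the mutable borrow is taken first. A minimal sketch of one workaround, assuming the arguments are swapped so the parser runs before the borrow. The `le_u8` here is a hand-written stand-in, not nom's function, and the `&'static str` error type is only for illustration:

```rust
/// Stand-in result type: (remaining input, output) or an error message.
type IResult<'a, O> = Result<(&'a [u8], O), &'static str>;

/// Hypothetical stand-in for nom's `le_u8`: reads a single byte.
fn le_u8(input: &[u8]) -> IResult<'_, u8> {
    match input.split_first() {
        Some((&byte, rest)) => Ok((rest, byte)),
        None => Err("incomplete"),
    }
}

/// Same helper as in the comment, with the arguments swapped so that the
/// parser result is evaluated before `input` is mutably borrowed.
fn replace_buf<B, T>(tuple: (B, T), input: &mut B) -> T {
    *input = tuple.0;
    tuple.1
}

/// Reads two bytes, updating `input` in place after each parser call.
fn read_two(mut input: &[u8]) -> Result<(u8, u8), &'static str> {
    let a = replace_buf(le_u8(input)?, &mut input);
    let b = replace_buf(le_u8(input)?, &mut input);
    Ok((a, b))
}
```

Because `input` is reassigned rather than shadowed, the same pattern also works inside a loop body.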
personally, I don't think it's too complex to do this:

    let (input, val1) = parser1(input)?;
    let (input, _) = parser2(input)?;
    let (input, val3) = parser3(input)?;
    Ok((input, MyStruct { val1, val3 }))

But I'm planning to add a tuple based version, so you could do it like that:

    let (input, (val1, _, val3)) = parse((parser1, parser2, parser3))(input)?;
    Ok((input, MyStruct { val1, val3 }))

|
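To make the tuple idea concrete, here is a rough, std-only sketch of how a tuple of parsers could be run in sequence. The `Sequence` trait, the `parse` function body, and the toy `digit` and `comma` parsers are all assumptions made for illustration; they are not taken from the nom code base, and only a 3-tuple is handled:

```rust
/// Stand-in for nom's result type: (remaining input, output) or an error.
type IResult<I, O, E> = Result<(I, O), E>;

/// Hypothetical trait: a tuple of parsers that can be run one after another.
trait Sequence<I, O, E> {
    fn run(&self, input: I) -> IResult<I, O, E>;
}

impl<I, A, B, C, E, FA, FB, FC> Sequence<I, (A, B, C), E> for (FA, FB, FC)
where
    FA: Fn(I) -> IResult<I, A, E>,
    FB: Fn(I) -> IResult<I, B, E>,
    FC: Fn(I) -> IResult<I, C, E>,
{
    fn run(&self, input: I) -> IResult<I, (A, B, C), E> {
        // Each parser receives what the previous one left over.
        let (input, a) = (self.0)(input)?;
        let (input, b) = (self.1)(input)?;
        let (input, c) = (self.2)(input)?;
        Ok((input, (a, b, c)))
    }
}

/// Turns a tuple of parsers into a single parser returning a tuple of outputs.
fn parse<I, O, E, T: Sequence<I, O, E>>(parsers: T) -> impl Fn(I) -> IResult<I, O, E> {
    move |input: I| parsers.run(input)
}

#[derive(Debug, PartialEq)]
struct MyStruct {
    val1: char,
    val3: char,
}

/// Toy parser: one ASCII digit.
fn digit(input: &str) -> IResult<&str, char, &'static str> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => Ok((chars.as_str(), c)),
        _ => Err("expected digit"),
    }
}

/// Toy parser: a single comma.
fn comma(input: &str) -> IResult<&str, char, &'static str> {
    let mut chars = input.chars();
    match chars.next() {
        Some(',') => Ok((chars.as_str(), ',')),
        _ => Err("expected comma"),
    }
}

fn my_struct(input: &str) -> IResult<&str, MyStruct, &'static str> {
    let (input, (val1, _, val3)) = parse((digit, comma, digit))(input)?;
    Ok((input, MyStruct { val1, val3 }))
}
```

A real implementation would presumably generate the tuple impls for several arities with a macro rather than writing each one by hand.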
That doesn't work in loops. |
Use `many0` then
|
My specific code was:

    let (mut buf, child_count) = le_u16(buf)?;
    for _ in 0..child_count {
        let (child_buf, child) = take_element(buf, &lookup)?;
        buf = child_buf;
        binel.insert(child);
    }

BinEl is a struct that gets returned from the parser and lookup is an argument to the parser. I suppose I could use a parser that returns a Vec and iterate over it, but that seems like it would have a performance penalty. |
That does not look very complex to me, I have a lot of code that looks like that, especially in nom. I guess you could use fold_many_m_n for this, to avoid allocations.
The switch to functions also means we can now integrate parsers more easily with iterators. I have not explored it in detail yet, but I guess some interesting patterns could appear.
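As an illustration of the fold idea, the sketch below runs a parser a fixed number of times and folds each output into an accumulator, so no intermediate `Vec` is built. `fold_count` is a hypothetical std-only helper written for this example, not nom's `fold_many_m_n`, and the one-byte `le_u8` and the summing closure stand in for the `take_element` and `binel.insert` calls from the comment above:

```rust
/// Stand-in for nom's result type: (remaining input, output) or an error.
type IResult<I, O, E> = Result<(I, O), E>;

/// Hypothetical helper: applies `parser` exactly `count` times, feeding each
/// output into `fold` instead of collecting the outputs into a Vec.
fn fold_count<I, O, E, R>(
    mut input: I,
    count: usize,
    parser: impl Fn(I) -> IResult<I, O, E>,
    init: R,
    mut fold: impl FnMut(R, O) -> R,
) -> IResult<I, R, E> {
    let mut acc = init;
    for _ in 0..count {
        let (rest, item) = parser(input)?;
        input = rest; // carry the remaining input into the next iteration
        acc = fold(acc, item);
    }
    Ok((input, acc))
}

/// Toy parser standing in for a real element parser: reads one byte.
fn le_u8(input: &[u8]) -> IResult<&[u8], u8, &'static str> {
    match input.split_first() {
        Some((&byte, rest)) => Ok((rest, byte)),
        None => Err("incomplete"),
    }
}

/// Reads a one-byte child count, then sums that many child bytes.
fn sum_children(buf: &[u8]) -> IResult<&[u8], u32, &'static str> {
    let (buf, child_count) = le_u8(buf)?;
    fold_count(buf, usize::from(child_count), le_u8, 0u32, |sum, b| {
        sum + u32::from(b)
    })
}
```

The accumulator could just as well be the struct being built, with the closure inserting each child into it and returning it.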
|
It doesn't matter too much, it was just that |
the nom 5 rewrite is now done |
After some thought, I reached a satisfying new design for nom 5.0, that I tried in the nomfun repository.
This design uses functions instead of macros, with the same signature as macros combinators, mostly having `I -> IResult<I,O,E>` functions as arguments, and returning other functions, or applying them directly on some input.
As an example, here is how the `pair` combinator would be written:
This way we have two versions, one that combines two parsers and makes another one, and another that can take some input.
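A minimal, std-only sketch of what such a `pair` could look like under this design. The local `IResult` alias, the name `pair_applied` for the variant that takes input directly, and the toy `digit` and `letter` parsers are assumptions for illustration rather than the code from the 5.0 branch:

```rust
/// Stand-in for nom's result type: (remaining input, output) or an error.
type IResult<I, O, E> = Result<(I, O), E>;

/// Combines two parsers and returns a new parser yielding both outputs.
fn pair<I, O1, O2, E, F, G>(first: F, second: G) -> impl Fn(I) -> IResult<I, (O1, O2), E>
where
    F: Fn(I) -> IResult<I, O1, E>,
    G: Fn(I) -> IResult<I, O2, E>,
{
    move |input: I| {
        let (input, o1) = first(input)?;
        let (input, o2) = second(input)?;
        Ok((input, (o1, o2)))
    }
}

/// Hypothetical second version: applies both parsers directly to some input.
fn pair_applied<I, O1, O2, E, F, G>(input: I, first: F, second: G) -> IResult<I, (O1, O2), E>
where
    F: Fn(I) -> IResult<I, O1, E>,
    G: Fn(I) -> IResult<I, O2, E>,
{
    pair(first, second)(input)
}

/// Toy parser: one ASCII digit.
fn digit(input: &str) -> IResult<&str, char, &'static str> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_digit() => Ok((chars.as_str(), c)),
        _ => Err("expected digit"),
    }
}

/// Toy parser: one ASCII letter.
fn letter(input: &str) -> IResult<&str, char, &'static str> {
    let mut chars = input.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => Ok((chars.as_str(), c)),
        _ => Err("expected letter"),
    }
}
```

With these definitions, `pair(digit, letter)` is itself a parser that can be passed to other combinators, while `pair_applied("1a", digit, letter)` runs immediately.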
The macro version can then be rewritten that way:
As we can see currently in the 5.0 branch, most combinators are easy to replace:
The resulting code is functionally equivalent, has fewer type inference issues and is much faster when built with link time optimization (I have seen results like 20% faster with the new design on some benchmarks).
Another benefit of this system is better import behaviour. Right now, even in edition 2018, macros that are exported are put at the top level (so the module-like import that macros use is actually a lie). So I cannot make variants of macros except by changing their name, to do stuff like `separated_list` and `separated_list_complete`.
This was an issue because we expect slightly different behaviour in streaming parsers (where we're not sure we'll get the whole data at once) or in complete parsers (where we're sure we have the whole input). In nom 4, I tried to solve this by introducing the `CompleteByteSlice` and `CompleteStr` input types, that would behave differently, so you could use the same macros but have different parsers depending on the input. This proved difficult to use, especially considering that we might want to switch back and forth between behaviours (streaming parsers using TLV will know that they have the complete data inside the TLV, complete parsers might want to use methods that work directly on `&[u8]` and `&str`). Also, most people did not bother reading the documentation about it and started directly using `&[u8]` and `&str` as input when they expected the other behaviour, which resulted in quite some time spent explaining it.
So with functions, we can actually make specialized versions of combinators. We could imagine having streaming and complete versions of `many0`, `tag`, etc. And we would let people use those versions by importing them directly (`use nom::multi::streaming::many0`, etc), and they could even use both versions in the same file.
The downside is that there's an enormous amount of work for this:
`CompleteByteSlice` and `CompleteStr`)

These functions will also require their own documentation and tests, and all of nom's documentation and examples should probably be adapted to this.
I'm making steady progress on converting the combinators, but there's still a lot to do. (TODO: make a checklist of which combinators were ported over or not)
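The idea of streaming and complete variants selected by import path, as described in this issue, could look roughly like the sketch below. The two modules, the simplified `tag` signature (taking the input directly instead of returning a parser), and the string error values are assumptions for illustration only, not nom's actual API:

```rust
/// Streaming flavour: running out of input while the prefix still matches
/// means "need more data", not a failure.
pub mod streaming {
    pub fn tag<'a>(
        expected: &[u8],
        input: &'a [u8],
    ) -> Result<(&'a [u8], &'a [u8]), &'static str> {
        if input.len() < expected.len() {
            if expected.starts_with(input) {
                Err("incomplete")
            } else {
                Err("mismatch")
            }
        } else if input.starts_with(expected) {
            let (matched, rest) = input.split_at(expected.len());
            Ok((rest, matched))
        } else {
            Err("mismatch")
        }
    }
}

/// Complete flavour: the whole input is known to be present, so a short
/// input is simply a mismatch.
pub mod complete {
    pub fn tag<'a>(
        expected: &[u8],
        input: &'a [u8],
    ) -> Result<(&'a [u8], &'a [u8]), &'static str> {
        if input.starts_with(expected) {
            let (matched, rest) = input.split_at(expected.len());
            Ok((rest, matched))
        } else {
            Err("mismatch")
        }
    }
}
```

Both variants share a name and differ only in their module path, so a file can call `streaming::tag` for the outer framing and `complete::tag` once it knows it holds a full record.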
Questions I have to solve now:
- `CompleteByteSlice` and `CompleteStr`?
- `pair(first, second)(input)` without any issues?
- `do_parse` (this one could probably be written directly with the `?` syntax like this: https://github.com/Geal/nomfun/blob/master/benches/http.rs#L93-L102 ), `tuple`, `permutation`, `alt`, `switch`?
- `ws`?
- `method` and `ws` in their own crates? They're not strictly necessary to nom and would make sense as separate libraries