Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Upgrade nom to version 4 #5

Closed
wants to merge 7 commits into from

Conversation

akubera
Copy link
Contributor

@akubera akubera commented Dec 20, 2018

I thought I'd update the root-io dependencies.

nom 4 changes the implementation of IResult enum from nom::IResult<I, O, E> to std::Result<(I, O), E>, so a few closures and matches had to be changed to accept tuples, and remove IResult::Done references.

It compiles but it doesn't pass the core::file_item::tests::open_simple & tests::high_level_io::root_file_methods tests due to reaching the end of file too early.

I don't think I changed any logic, so it must be a new nom behavior upon (I'm guessing) reading the last byte.

Any suggestions?

Also, I updated the versions to reflect an update in the dependencies.

Hope all is well,
Andrew

@akubera
Copy link
Contributor Author

akubera commented Dec 20, 2018

Yeah, the travis sees the same error. It comes from the unwrap located here: https://github.com/akubera/alice-rs/blob/upgrade-dep/nom/root-io/src/core/file.rs#L165

Copy link
Owner

@cbourjau cbourjau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! I am not sure, yet, what is going wrong here. Maybe you could insert at inspect before it goes south and dump the raw bytes to stdout or have a look with nom's dbg_dmp! macro?
I hope to find some time to look into this over the holidays. Would you mind rebasing this PR on master and force-pushing it so that it includes the changes from #4?

@@ -32,7 +32,7 @@ pub struct DatasetIntoIter {
impl DatasetIntoIter {
/// Create a new `DatasetIntoIter` from the given `root_io::Tree`. The `Tree` must be a so-called "ESD" tree.
pub fn new(t: &Tree) -> Result<DatasetIntoIter, Error> {
use nom::*;
use nom::{be_u8, be_u16, be_u32, be_u64, be_i8, be_i32, be_f32};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really was lazy here and those import should just be at the top of the file.

@@ -118,17 +117,17 @@ fn parse_trigger_classes(input: &[u8]) -> nom::IResult<&[u8], Vec<String>> {
/// This function reconstructs a float from the exponent and mantissa
/// TODO: Use ByteOrder crate to be cross-platform!
fn parse_custom_mantissa(input: &[u8], nbits: usize) -> nom::IResult<&[u8], f32> {
use nom::*; // cannot use module path in macro
pair!(input, be_u8, be_u16).map(|(exp, man)| {
use nom::{be_u8, be_u16};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Could you move it to the top of the file?


use ::core::*;

use crate::core::{*, Context};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you do the glob import and the Context explicitly? If reasonably possible, it would be better to avoid the glob imports. Also, is crate a new extension in 1.31 or is it backwards compatible? Would be nice to keep also support older versions, if its so little additional effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there was an error that Context was ambiguous, so there must be a new one exported by nom.

"crate"-scope may be a new feature. I was initially confused importing from "core" things that were not in rust's core, so I thought I'd make it a little more explicit.

It looks like this particular case I screwed up, because it was use ::core and not use core,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, crate scope is new in 1.30, invented to resolve ambiguities

@@ -3,7 +3,9 @@ use std::path::PathBuf;
use nom::*;

use core::parsers::*;
use core::types::*;
// use crate::core::types::*;
use crate::core::types::{Raw, Context};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to remove the old code :)

@akubera
Copy link
Contributor Author

akubera commented Dec 20, 2018

Stepping through debugger is didn't really tell me much, unfortunately the macros hide much of the relevant information from LLDB.

Removing the eof!() line here prints a bunch of output like this: (maybe this is helpful to you)

Failed to parse TStreamer for TStreamerBase:
00000000	00 03 40 00 00 69 00 04 40 00 00 3a 00 01 00 01 	..@..i..@..:....
00000010	00 00 00 00 03 00 00 00 05 54 4c 65 61 66 27 4c 	.........TLeaf'L
00000020	65 61 66 3a 20 64 65 73 63 72 69 70 74 69 6f 6e 	eaf: description
00000030	20 6f 66 20 61 20 42 72 61 6e 63 68 20 64 61 74 	 of a Branch dat
00000040	61 20 74 79 70 65 00 00 00 00 00 00 00 00 00 00 	a type..........
00000050	00 00 00 00 00 00 00 00 00 00 6d 1e 81 52 00 00 	..........m.�R..
00000060	00 00 00 00 00 00 00 00 00 00 04 42 41 53 45 00 	...........BASE.
00000070	00 00 02                                        	...

The nom4 upgrading guide says it's "stricter about the behaviour with partial data" and suggests using the CompleteByteSlice. I'll look into working on that sometime... later.

@cbourjau
Copy link
Owner

I think you are on to something with the end of file. Also looking at the error message, it indeed originates from the eof!()! Right now, it seems that we are indeed at the end of the slice, when we hit the end of the tstreamerinfo parser - as expected. So using CompleteByteSlice in the function signature as suggested in the upgrading to nom4 docs might solve the issue. By the way, if you want to make use of dbg_dmp! you would do it like this:

    // Old:
    .map(|i| tstreamerinfo(i &context).unwrap().1)
    // With dbg:
     .map(|i| dbg_dmp!(i, apply!(tstreamerinfo, &context)).unwrap().1)

@akubera
Copy link
Contributor Author

akubera commented Dec 27, 2018

It looks like the nom functions like be_u32 only accept &[u8], and I don't see any easy way to convert inputs of CompleteByteSlice to raw slices. There are not enough tutorials on the new nom version to suggest a recipe.

This fails to compile because it expects a &[u8]:

named!(
    file_header<CompleteByteSlice, FileHeader>,
    do_parse!(
        tag!("root") >>
        version: be_i32 >>
        begin:   be_i32 >>
        ...

@cbourjau
Copy link
Owner

Indeed, looking at the nom issue tracker it seems like that feature is not working as smoothly as it should. For the time being, we could just write a custom eof (or rather end of buffer) function / macro and not use the CompleteByteSlice type. Such a function would look something like this:

named!(
    #[doc="Succeeds if the input buffer is empty"],
    end_of_buf,
    verify!(rest, |s: &[u8]| s.is_empty())
);

This end_of_buf function can be used as a drop in replacement for the eof!() macros in the current code. Using this all the tests pass!

On the other hand, this actually might point to a design flaw. I added those eofs at places for debugging purposes when "reverse engineering" the root-format. So they are kind of like run time sanity checks. However, I don't see a reason (right now) why the parsers should not work on streaming data with arbitrarily large input buffers. It might be worth considering removing these all together?

Alternatively, If there are places where we depend on the parsers failing for too large slices we could use map! to error if the output of the parser is a non-emtpy buffer.

@cbourjau
Copy link
Owner

Just bumping this. Are you still looking into this or are you stuck with something?

@akubera
Copy link
Contributor Author

akubera commented Jan 24, 2019

Ah, no, sorry. I thought I'd work on this again over the holidays, but I didn't and I haven't looked at any of my Rust stuff since.

@cbourjau
Copy link
Owner

cbourjau commented Apr 11, 2019

It seems like nom is getting close to a version 5 release which comes with changes to CompleteByteSlice which according to the developer turned out to be too hard to use. Over all, a new release does not mean that we have to migrate, but might be an interesting endeavour anyways!

Edit: here is a link to the nom 5 roadmap: rust-bakery/nom#903

@akubera
Copy link
Contributor Author

akubera commented Apr 11, 2019

Ah, well I'm glad it wasn't just me. Sorry I haven't looked at this for months, there's been plenty to work on in AliPhysics/AliFemto.

@cbourjau
Copy link
Owner

No problem at all! This is meant to be for fun and learning primarily :)

@cbourjau
Copy link
Owner

Nom 5 is now released it it looks pretty great:
http://unhandledexpression.com/general/2019/06/17/nom-5-is-here.html

Of particular interest to this PR is that CompleteByteSlice is gone! As such that also means that this PR is somewhat obsolete now. I will close it for now but I still think that updating Nom is a great way to get to know this project! I'd be very happy to help out if you or somebody else wants to give it another stab!

@akubera
Copy link
Contributor Author

akubera commented Jun 30, 2019

Yeah, I saw that and thought about closing this. New version looks good with straightforward functions over macros.

Congratulations on your publication, by the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants