[WIP] Upgrade nom to version 4 #5

akubera · 2018-12-20T01:32:10Z

I thought I'd update the root-io dependencies.

nom 4 changes the implementation of IResult enum from nom::IResult<I, O, E> to std::Result<(I, O), E>, so a few closures and matches had to be changed to accept tuples, and remove IResult::Done references.

It compiles but it doesn't pass the core::file_item::tests::open_simple & tests::high_level_io::root_file_methods tests due to reaching the end of file too early.

I don't think I changed any logic, so it must be a new nom behavior upon (I'm guessing) reading the last byte.

Any suggestions?

Also, I updated the versions to reflect an update in the dependencies.

Hope all is well,
Andrew

akubera · 2018-12-20T05:11:51Z

Yeah, the travis sees the same error. It comes from the unwrap located here: https://github.com/akubera/alice-rs/blob/upgrade-dep/nom/root-io/src/core/file.rs#L165

cbourjau

Thanks again! I am not sure, yet, what is going wrong here. Maybe you could insert at inspect before it goes south and dump the raw bytes to stdout or have a look with nom's dbg_dmp! macro?
I hope to find some time to look into this over the holidays. Would you mind rebasing this PR on master and force-pushing it so that it includes the changes from #4?

cbourjau · 2018-12-20T15:46:21Z

malice/src/dataset_rust.rs

@@ -32,7 +32,7 @@ pub struct DatasetIntoIter {
 impl DatasetIntoIter {
    /// Create a new `DatasetIntoIter` from the given `root_io::Tree`. The `Tree` must be a so-called "ESD" tree.
    pub fn new(t: &Tree) -> Result<DatasetIntoIter, Error> {
-        use nom::*;
+        use nom::{be_u8, be_u16, be_u32, be_u64, be_i8, be_i32, be_f32};


I really was lazy here and those import should just be at the top of the file.

cbourjau · 2018-12-20T15:47:12Z

malice/src/dataset_rust.rs

@@ -118,17 +117,17 @@ fn parse_trigger_classes(input: &[u8]) -> nom::IResult<&[u8], Vec<String>> {
 /// This function reconstructs a float from the exponent and mantissa
 /// TODO: Use ByteOrder crate to be cross-platform!
 fn parse_custom_mantissa(input: &[u8], nbits: usize) -> nom::IResult<&[u8], f32> {
-    use nom::*; // cannot use module path in macro
-    pair!(input, be_u8, be_u16).map(|(exp, man)| {
+    use nom::{be_u8, be_u16};


Same here. Could you move it to the top of the file?

cbourjau · 2018-12-20T15:58:19Z

root-io/src/core/tstreamerinfo.rs


-use ::core::*;
+
+use crate::core::{*, Context};


Is there a reason why you do the glob import and the Context explicitly? If reasonably possible, it would be better to avoid the glob imports. Also, is crate a new extension in 1.31 or is it backwards compatible? Would be nice to keep also support older versions, if its so little additional effort.

Yeah there was an error that Context was ambiguous, so there must be a new one exported by nom.

"crate"-scope may be a new feature. I was initially confused importing from "core" things that were not in rust's core, so I thought I'd make it a little more explicit.

It looks like this particular case I screwed up, because it was use ::core and not use core,

yeah, crate scope is new in 1.30, invented to resolve ambiguities

cbourjau · 2018-12-20T15:59:07Z

root-io/src/tree_reader/branch.rs

@@ -3,7 +3,9 @@ use std::path::PathBuf;
 use nom::*;

 use core::parsers::*;
-use core::types::*;
+// use crate::core::types::*;
+use crate::core::types::{Raw, Context};


Feel free to remove the old code :)

A `Needed::Size(sz)` now only returns additional data needed, not total required

akubera · 2018-12-20T17:31:25Z

Stepping through debugger is didn't really tell me much, unfortunately the macros hide much of the relevant information from LLDB.

Removing the eof!() line here prints a bunch of output like this: (maybe this is helpful to you)

Failed to parse TStreamer for TStreamerBase:
00000000	00 03 40 00 00 69 00 04 40 00 00 3a 00 01 00 01 	..@..i..@..:....
00000010	00 00 00 00 03 00 00 00 05 54 4c 65 61 66 27 4c 	.........TLeaf'L
00000020	65 61 66 3a 20 64 65 73 63 72 69 70 74 69 6f 6e 	eaf: description
00000030	20 6f 66 20 61 20 42 72 61 6e 63 68 20 64 61 74 	 of a Branch dat
00000040	61 20 74 79 70 65 00 00 00 00 00 00 00 00 00 00 	a type..........
00000050	00 00 00 00 00 00 00 00 00 00 6d 1e 81 52 00 00 	..........m.�R..
00000060	00 00 00 00 00 00 00 00 00 00 04 42 41 53 45 00 	...........BASE.
00000070	00 00 02                                        	...

The nom4 upgrading guide says it's "stricter about the behaviour with partial data" and suggests using the CompleteByteSlice. I'll look into working on that sometime... later.

cbourjau · 2018-12-20T18:25:51Z

I think you are on to something with the end of file. Also looking at the error message, it indeed originates from the eof!()! Right now, it seems that we are indeed at the end of the slice, when we hit the end of the tstreamerinfo parser - as expected. So using CompleteByteSlice in the function signature as suggested in the upgrading to nom4 docs might solve the issue. By the way, if you want to make use of dbg_dmp! you would do it like this:

    // Old:
    .map(|i| tstreamerinfo(i &context).unwrap().1)
    // With dbg:
     .map(|i| dbg_dmp!(i, apply!(tstreamerinfo, &context)).unwrap().1)

akubera · 2018-12-27T17:37:14Z

It looks like the nom functions like be_u32 only accept &[u8], and I don't see any easy way to convert inputs of CompleteByteSlice to raw slices. There are not enough tutorials on the new nom version to suggest a recipe.

This fails to compile because it expects a &[u8]:

named!(
    file_header<CompleteByteSlice, FileHeader>,
    do_parse!(
        tag!("root") >>
        version: be_i32 >>
        begin:   be_i32 >>
        ...

cbourjau · 2018-12-27T19:11:01Z

Indeed, looking at the nom issue tracker it seems like that feature is not working as smoothly as it should. For the time being, we could just write a custom eof (or rather end of buffer) function / macro and not use the CompleteByteSlice type. Such a function would look something like this:

named!(
    #[doc="Succeeds if the input buffer is empty"],
    end_of_buf,
    verify!(rest, |s: &[u8]| s.is_empty())
);

This end_of_buf function can be used as a drop in replacement for the eof!() macros in the current code. Using this all the tests pass!

On the other hand, this actually might point to a design flaw. I added those eofs at places for debugging purposes when "reverse engineering" the root-format. So they are kind of like run time sanity checks. However, I don't see a reason (right now) why the parsers should not work on streaming data with arbitrarily large input buffers. It might be worth considering removing these all together?

Alternatively, If there are places where we depend on the parsers failing for too large slices we could use map! to error if the output of the parser is a non-emtpy buffer.

cbourjau · 2019-01-23T12:44:53Z

Just bumping this. Are you still looking into this or are you stuck with something?

akubera · 2019-01-24T03:38:43Z

Ah, no, sorry. I thought I'd work on this again over the holidays, but I didn't and I haven't looked at any of my Rust stuff since.

cbourjau · 2019-04-11T11:34:45Z

It seems like nom is getting close to a version 5 release which comes with changes to CompleteByteSlice which according to the developer turned out to be too hard to use. Over all, a new release does not mean that we have to migrate, but might be an interesting endeavour anyways!

Edit: here is a link to the nom 5 roadmap: rust-bakery/nom#903

akubera · 2019-04-11T15:41:47Z

Ah, well I'm glad it wasn't just me. Sorry I haven't looked at this for months, there's been plenty to work on in AliPhysics/AliFemto.

cbourjau · 2019-04-12T08:58:07Z

No problem at all! This is meant to be for fun and learning primarily :)

cbourjau · 2019-06-30T13:41:00Z

Nom 5 is now released it it looks pretty great:
http://unhandledexpression.com/general/2019/06/17/nom-5-is-here.html

Of particular interest to this PR is that CompleteByteSlice is gone! As such that also means that this PR is somewhat obsolete now. I will close it for now but I still think that updating Nom is a great way to get to know this project! I'd be very happy to help out if you or somebody else wants to give it another stab!

akubera · 2019-06-30T19:37:29Z

Yeah, I saw that and thought about closing this. New version looks good with straightforward functions over macros.

Congratulations on your publication, by the way.

akubera force-pushed the upgrade-dep/nom branch from 3dcf6d4 to 4fa8896 Compare December 20, 2018 01:39

cbourjau reviewed Dec 20, 2018

View reviewed changes

akubera added 7 commits December 20, 2018 11:24

root-io: Update nom version 4

9fa4230

root-io: Bump version to 0.1.2

fb3739b

root-io/tests - update format

c5974a2

malice: Update nom to version 4 and find root-io by relative path

a3737fe

Fix nom4 'nom::Needed' compatibility bug

d0ea116

A `Needed::Size(sz)` now only returns additional data needed, not total required

malice: Move use statements to top of file

e90f56f

Remove use of use crate::

d22ae0d

akubera force-pushed the upgrade-dep/nom branch from 5dd86c0 to d22ae0d Compare December 20, 2018 16:57

cbourjau closed this Jun 30, 2019

cbourjau mentioned this pull request Jun 30, 2019

Update binary parser to nom 5 #6

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Upgrade nom to version 4 #5

[WIP] Upgrade nom to version 4 #5

akubera commented Dec 20, 2018

akubera commented Dec 20, 2018

cbourjau left a comment

cbourjau Dec 20, 2018

cbourjau Dec 20, 2018

cbourjau Dec 20, 2018

akubera Dec 20, 2018

akubera Dec 20, 2018

cbourjau Dec 20, 2018

akubera commented Dec 20, 2018

cbourjau commented Dec 20, 2018

akubera commented Dec 27, 2018

cbourjau commented Dec 27, 2018

cbourjau commented Jan 23, 2019

akubera commented Jan 24, 2019

cbourjau commented Apr 11, 2019 •

edited

Loading

akubera commented Apr 11, 2019

cbourjau commented Apr 12, 2019

cbourjau commented Jun 30, 2019

akubera commented Jun 30, 2019

[WIP] Upgrade nom to version 4 #5

[WIP] Upgrade nom to version 4 #5

Conversation

akubera commented Dec 20, 2018

akubera commented Dec 20, 2018

cbourjau left a comment

Choose a reason for hiding this comment

cbourjau Dec 20, 2018

Choose a reason for hiding this comment

cbourjau Dec 20, 2018

Choose a reason for hiding this comment

cbourjau Dec 20, 2018

Choose a reason for hiding this comment

akubera Dec 20, 2018

Choose a reason for hiding this comment

akubera Dec 20, 2018

Choose a reason for hiding this comment

cbourjau Dec 20, 2018

Choose a reason for hiding this comment

akubera commented Dec 20, 2018

cbourjau commented Dec 20, 2018

akubera commented Dec 27, 2018

cbourjau commented Dec 27, 2018

cbourjau commented Jan 23, 2019

akubera commented Jan 24, 2019

cbourjau commented Apr 11, 2019 • edited Loading

akubera commented Apr 11, 2019

cbourjau commented Apr 12, 2019

cbourjau commented Jun 30, 2019

akubera commented Jun 30, 2019

cbourjau commented Apr 11, 2019 •

edited

Loading