Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Several changes #9

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

strangeglyph
Copy link

Hi @cbourjau, I found alice-rs when I was looking to write a little command-line TBrowser-like tool in Rust. Sadly it wasn't able to open my root files, and long story short, here's a big chunk of changes:

  • Make sure the projects compile on newer rust versions (in particular, this means not using rustfmt as a library, as it doesn't seem to compile on modern nightlies. We now assume that rustfmt is provided as a tool by the system)
  • Version bumps of several libraries, and get rid of several unnecessarily specific patch-level version pins.

More importantly, in root-io, where I did most my work:

  • Bump nom to v7
  • That means getting rid of the macros and using the function-based parsers instead
  • Include nom-supreme for its postfix combinators
  • Semantic error types
  • Pretty printing of errors! Look at it:
While trying to parse tbranch object:
00000180	00 01 0b 40 00 02 35 00 0a 40 00 01 e6 00 0c 40 	...@..5..@..�..@
        	                     ^~~~~~~~~~~~~~~~~~~~~~~~~~
00000190	00 00 1a 00 01 00 01 00 00 00 00 03 50 00 00 06 	............P...
000001a0	72 75 6e 4e 75 6d 06 72 75 6e 4e 75 6d 40 00 00 	runNum.runNum@..
000001b0	06 00 02 00 00 03 e9 00 00 00 01 00 00 a0 00 00 	......�......�..
000001c0	00 00 00 00 00 00 01 00 00 00 00 00 00 00 0a 00 	................
000001d0	00 00 00 00 00 00 0a 00 00 00 62 00 00 00 00 00 	..........b.....
000001e0	00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 	................
000001f0	00 00 73 00 00 00 00 00 00 00 63 40 00 00 15 00 	..s.......c@....
00000200	03 00 01 00 00 00 00 03 00 00 00 00 00 00 00 00 	................
00000210	00 00 00 00 40 00 00 6e 00 03 00 01 00 00 00 00 	....@..n........
00000220	03 00 00 00 00 00 00 00 01 00 00 00 00 40 00 00 	.............@..
00000230	55 ff ff ff ff 54 4c 65 61 66 45 6c 65 6d 65 6e 	U����TLeafElemen
00000240	74 00 40 00 00 40 00 01 40 00 00 32 00 02 40 00 	t.@..@..@..2..@.
00000250	00 1a 00 01 00 01 00 00 00 00 03 00 00 00 06 72 	...............r
00000260	75 6e 4e 75 6d 06 72 75 6e 4e 75 6d 00 00 00 01 	unNum.runNum....
00000270	00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 	................
00000280	00 01 00 00 00 0d 40 00 00 1d 00 03 00 01 00 00 	......@.........
        	[0x135 bytes omitted]

Expected EOF, but found excess data
00000370	00 00 00 14 62 61 63 6f 6e 68 65 70 3a 3a 54 45 	....baconhep::TE
        	         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00000380	76 65 6e 74 49 6e 66 6f 14 62 61 63 6f 6e 68 65 	ventInfo.baconhe
00000390	70 3a 3a 54 45 76 65 6e 74 49 6e 66 6f 00 cd 50 	p::TEventInfo.�P
000003a0	8e 09 00 07 00 00 00 01 00 00 00 00 00 00 00 0d 	�...............
000003b0	00 00 00 00 00 00 00 00 00 00 00 00 40 00 02 30 	............@..0
        	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is some errors in the test-suite I haven't managed to get rid of yet though, so this PR remains as a WIP for now. Could you point me at the actual documentation of the root file format? I haven't been able to find it online.

@cbourjau
Copy link
Owner

cbourjau commented Mar 2, 2022

Cool! I'll take a closer look in the next few days. Concerning documentation I had the same problem back in the days. I recommend:

@cbourjau
Copy link
Owner

cbourjau commented Mar 13, 2022

@strangeglyph I finally found the time to take a closer look at this PR. Thanks again! Clearly, there is a lot going on in this PR and I think it would be easier if it were broken up in smaller chunks that are known to work. Would it be possible to break it up into the following:

  • Replace rustfmt with prettyplease
  • Remove version pins
  • Update nom to v7 and remove macros (might be a smoother transition to convert macros to functions on the current version an then update)

Once that is done and known to work we can look at the next steps. If we touch the error handling of this project it, I think it would be nice to replace the failure crate completely with the current best practices in error handling (many of the failure features have made it to the standard library in the mean time). I have to admit that I am not completely up to date in this regard, but I believe thiserror fits that bill?

The postfix operators of nom-supreme seem nice, but I am hesitant to pull in a dependency with fairly small adoption in the wider ecosystem just for that. Is nom-supreme also needed for the pretty error messages?

Is it reasonable to break up this PR along those lines?

Edit:
I converted this to a bit of a tracking issue. Your comments about failure/thiserror make perfect sense!

@strangeglyph
Copy link
Author

Breaking up the PR absolutely makes sense, I'll get around to doing that soon. prettyplease looks like a good find, I'll see if it can be easily integrated.

I have tried to replace failure everywhere I could find it in root-io. I think it's reasonable to replace it in the other crates as well. thiserror is what I've used in root-io so far and I believe it's the current standard for library-level errors. Speaking of, might be worth splitting this whole repo into several, one per crate at some point?

nom-supreme is only needed for the postfix combinators. I was looking into its error types as well, but they aren't particularly suited to what we want (in particular, some error types are only implemented over string inputs, not bytes). However, I can only stress how much the postfix notation improves readability. In my opinion, it would be worth keeping it in.

@cbourjau
Copy link
Owner

Are you still interested in pushing this PR forward? I think it would be very nice to see these changes land and I am also warming up to nom-supreme :D

@strangeglyph
Copy link
Author

Ah, yes, sorry. I've been a bit busy with life the last couple of weeks but I should hopefully find more time for this project again soon.

@strangeglyph
Copy link
Author

Got around to investigating these bugs a bit more. Looks like we don't actually parse everything from a tbranch blob? I just verified this happened before the nom upgrades. Is this intended behavior? I had wrapped the call in tbranch_hdr with all_consuming, but if this isn't actually intended to be read in its entirety, I'll remove it.

@cbourjau
Copy link
Owner

I was actually just looking into this, too! I think it would be nice to use eof or all_consuming as much as possible, but as it happens, there are sometimes unexpected paddings or mystery bytes in the ROOT format which make this difficult...

@strangeglyph
Copy link
Author

I think I wrapped all_consuming everywhere where we pass around Raw buffers and the like. I'll remove it from tbranch_hdr for now and make a note on tbranch (and open an issue?) about the case of the mysterious bytes.

@strangeglyph
Copy link
Author

So, this version passes the test suite and as far as I can tell matches the behavior of the current master branch. No clue why the AppVeyor build is failing. The entire output just seems to be a big diff, but I can't see an error message.

@cbourjau
Copy link
Owner

Sorry for the late reply. sounds like a rustfmt issue. I'll take a look!

@cbourjau
Copy link
Owner

cbourjau commented Jul 4, 2022

Alright! things were indeed just a cargo fmt to get everything formatted. That said, I am afraid that this PR is still too large and has a few things where I am not quite sure about their value given their complexity. Would you be interested in spinning out the nom update thiserror error handling (without nom-locate and ouroboros) into a separate PR?

@strangeglyph
Copy link
Author

Glad to hear that it was just cargo fmt - I'll remember that in the future. Let me make sure I understand you correctly, you'd like two PRs:

  • PR 1 contains the nom v7 updates and replaces failure with thiserror
  • PR 2 has the improved error reporting with nom-locate?

@cbourjau
Copy link
Owner

cbourjau commented Jul 5, 2022

Yes, pretty much. I have to say that I am not quite sold on the merits of nom-local and ouroboros, though (given their added complexity and dependencies). This is one of the reasons why I think it would be nice to see them in isolation. Just by the way: are you aware of dbg_dmp in nom? It is what I usually use for debugging. Certainly, it is nicer to automatize printing such an output when something goes wrong, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants