Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not decompressing the full stream #57

Closed
Sorseg opened this issue Mar 24, 2024 · 9 comments · Fixed by #59
Closed

Not decompressing the full stream #57

Sorseg opened this issue Mar 24, 2024 · 9 comments · Fixed by #59

Comments

@Sorseg
Copy link
Contributor

Sorseg commented Mar 24, 2024

Inside this zip archive there is a zstandard compressed file, produced by blender. (I had to compress it into a zip archive, because github does not allow uploading random binaries)

$ file l0.blend 
Zstandard compressed data (v0.8+), Dictionary ID: None
$ zstdcat -l l0.blend
Frames  Skips  Compressed  Uncompressed  Ratio  Check  Filename
    18      1   102.38 KB     902.79 KB  8.818   None  l0.blend

zstd crate, when used to decompress, produces a buffer of length 924460, while ruzstd only decompresses 66804 bytes

The code to repro:

let mut decoded = ruzstd::StreamingDecoder::new(File::open(fname).unwrap()).unwrap();
println!("ruzstd {}", decoded.read_to_end(&mut vec![]).unwrap());

let mut decoded = zstd::Decoder::new(File::open(fname).unwrap()).unwrap();
println!("zstd {}", decoded.read_to_end(&mut vec![]).unwrap());

versions used

ruzstd = "=0.6.0"
zstd = "=0.13.0"

Am I using it wrong or is there a bug?

Thank you for maintaining this! ❤️

@KillingSpark
Copy link
Owner

Thanks for reporting this!

I'll try to reproduce this and pinpoint the error

@KillingSpark
Copy link
Owner

KillingSpark commented Mar 25, 2024

Ok so I can definitely reproduce this. Interestingly there are at least two issues here:

  1. Using the zstd_stream binary from ruzstd (wich uses the StreamingDecoder) ruzstd produces a 66k file without reporting any error
  2. Using the zstd binary from ruzstd (which directly uses the FrameDecoder) ruzstd produces a 903k file and gives an error print: called 'Result::unwrap()' on an 'Err' value: ReadFrameHeaderError(MagicNumberReadError(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }))

This suggests that

  1. There is something special with this file that ruzstd does not parse correctly
  2. The streaming decoder does not handle Errors particularly well

What's weird is that the streaming decoder apparently catches on to the error way earlier than the usage of the FrameDecoder.

Anyways just wanted to let you know this isn't your fault. I'll investigate this further and see what's going on here.

Update: Apparently ruzstd does decode the contents correctly but gets confused by something at the very end of the original file, checksumming the output of the zstd binary against the output of the original zstd implementation gives the same result

$ sha512sum *.blend
b0ea893072ffa59a8db0f280e66d2794ec394cae819b4fd368806e6de2a29af95119825ac8c9a6bbb8028e6eeac89830e50658486dcf7c301c62de8330a5f324  correct.blend
b0ea893072ffa59a8db0f280e66d2794ec394cae819b4fd368806e6de2a29af95119825ac8c9a6bbb8028e6eeac89830e50658486dcf7c301c62de8330a5f324  ruzstd.blend

@KillingSpark
Copy link
Owner

KillingSpark commented Mar 25, 2024

Okay so this is (thankfully) way more mundane than I thought.

The first error with the weird error was just a bookkeeping thing not in the libary code but in the binary code related to skippable frames that did not manifest for skippable frames in the middle of a file, but only when they are the last frame. I pushed a fix for that. (But that did not affect your usage of the library)

The second issue (that is actually related to your issue) is this: The StreamingDecoder is coded in a way that it expects the reader to only contain one frame, not a stream of frames. So it decodes only the first frame in the file and then stops. Which apparently results in 66k of data being decoded.

So you need to make a loop that continues to construct streaming decoders until the whole file has been processed.

The reason the StreamingDecoder does not do this is that a reader has no function on it that can tell you if it's going to return any more bytes or not.

@Sorseg
Copy link
Contributor Author

Sorseg commented Apr 1, 2024

Thanks a lot for the investigation! Will you accept a PR that adds this tidbit to the documentation?

@Sorseg
Copy link
Contributor Author

Sorseg commented Apr 1, 2024

I tried the latest master, and get a similar issue as you described, the file decompresses correctly, but I get an error, however a different one. SkipFrame(407710302, 153) Gist for repro:
https://gist.github.com/Sorseg/2d440274aca50487db21b8cec1a89dc5
and the file:
zipped.zip

@KillingSpark
Copy link
Owner

Will you accept a PR that adds this tidbit to the documentation?

Sure :)

SkipFrame(407710302, 153)

This error signals that a skippable frame has been encountered, the first number is the magic number in the frame header (used to identify what kind of content is in this frame, opaque to the zstd decoder) and the second one is the amount of bytes to jump forward.

Maybe this should have been solved a bit cleaner instead of using an error...

@Sorseg
Copy link
Contributor Author

Sorseg commented Apr 1, 2024

The error outcome in this case is not obvious, but I think it is fixable with a little bit of documentation. I don't think this makes a very ergonomic API, ideally I would expect the streaming reader to decode the full file, dealing with zstd peculiarities under the hood, but I guess the implementation requires a bit more thinking

@Sorseg
Copy link
Contributor Author

Sorseg commented Apr 1, 2024

I also understand that someone might need access to these skip frames, but I would expect this to be achievable through a different API maybe. Designing API's are difficult 🙃

@KillingSpark
Copy link
Owner

The error outcome in this case is not obvious, but I think it is fixable with a little bit of documentation. I don't think this makes a very ergonomic API, ideally I would expect the streaming reader to decode the full file, dealing with zstd peculiarities under the hood, but I guess the implementation requires a bit more thinking

The problem here is that the decoder just deals with a reader. This could also be a e.g. a tcp socket where you probably want to stop after each frame because the decoded stream is potentially endless. The StreamDecoder is still a pretty low-level abstraction.

I could maybe include a FileDecoder that only deals with files, which could actually go ahead an gloss over all those details. But then again that is kind of a feature creep and introduces more complexity because this would depend on stdlib whereas this libary is also available for no_std environments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants