Not decompressing the full stream #57

Sorseg · 2024-03-24T23:15:25Z

Inside this zip archive there is a Zstandard-compressed file produced by Blender. (I had to put it into a zip archive because GitHub does not allow uploading arbitrary binaries.)

$ file l0.blend 
Zstandard compressed data (v0.8+), Dictionary ID: None
$ zstdcat -l l0.blend
Frames  Skips  Compressed  Uncompressed  Ratio  Check  Filename
    18      1   102.38 KB     902.79 KB  8.818   None  l0.blend

The zstd crate decompresses this file into a buffer of 924460 bytes, while ruzstd decompresses only 66804 bytes.

The code to repro:

use std::{fs::File, io::Read};

// Both decoders read the same file; each line prints the number of decompressed bytes.
let mut decoded = ruzstd::StreamingDecoder::new(File::open(fname).unwrap()).unwrap();
println!("ruzstd {}", decoded.read_to_end(&mut vec![]).unwrap());

let mut decoded = zstd::Decoder::new(File::open(fname).unwrap()).unwrap();
println!("zstd {}", decoded.read_to_end(&mut vec![]).unwrap());

Versions used:

ruzstd = "=0.6.0"
zstd = "=0.13.0"

Am I using it wrong or is there a bug?

Thank you for maintaining this! ❤️


KillingSpark · 2024-03-25T07:25:43Z

Thanks for reporting this!

I'll try to reproduce this and pinpoint the error

KillingSpark · 2024-03-25T17:07:29Z

Ok so I can definitely reproduce this. Interestingly there are at least two issues here:

Using the zstd_stream binary from ruzstd (which uses the StreamingDecoder), ruzstd produces a 66k file without reporting any error.
Using the zstd binary from ruzstd (which uses the FrameDecoder directly), ruzstd produces a 903k file and prints an error: called 'Result::unwrap()' on an 'Err' value: ReadFrameHeaderError(MagicNumberReadError(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }))

This suggests that:

There is something special about this file that ruzstd does not parse correctly.
The streaming decoder does not handle errors particularly well.

What's weird is that the streaming decoder apparently stops much earlier than the code path that uses the FrameDecoder directly.

Anyways just wanted to let you know this isn't your fault. I'll investigate this further and see what's going on here.

Update: Apparently ruzstd does decode the contents correctly but gets confused by something at the very end of the original file. Checksumming the output of ruzstd's zstd binary against the output of the original zstd implementation gives the same result:

$ sha512sum *.blend
b0ea893072ffa59a8db0f280e66d2794ec394cae819b4fd368806e6de2a29af95119825ac8c9a6bbb8028e6eeac89830e50658486dcf7c301c62de8330a5f324  correct.blend
b0ea893072ffa59a8db0f280e66d2794ec394cae819b4fd368806e6de2a29af95119825ac8c9a6bbb8028e6eeac89830e50658486dcf7c301c62de8330a5f324  ruzstd.blend

KillingSpark · 2024-03-25T17:51:35Z

Okay so this is (thankfully) way more mundane than I thought.

The first issue, the one with the weird error, was just a bookkeeping bug, not in the library code but in the binary code. It is related to skippable frames and did not manifest for skippable frames in the middle of a file, only when a skippable frame is the last frame. I pushed a fix for that. (But that did not affect your usage of the library.)

The second issue (the one that is actually related to your report) is this: the StreamingDecoder is coded in a way that expects the reader to contain only one frame, not a stream of frames. So it decodes only the first frame in the file and then stops, which apparently results in 66k of data being decoded.

So you need to make a loop that continues to construct streaming decoders until the whole file has been processed.

The reason the StreamingDecoder does not do this itself is that a reader has no method that tells you whether it is going to return any more bytes.
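A rough sketch of such a loop, using only the standard library. A plain `Read` cannot say whether more bytes will follow, but wrapping the input in a `BufRead` gives access to `fill_buf`, which returns an empty slice only at end of input, so the caller can check for remaining data before starting on another frame. The names `for_each_frame` and `read_toy_frame` are made up for this illustration, and the toy frame format (one length byte followed by that many payload bytes) is only a stand-in for zstd frames. With ruzstd, the closure would presumably construct a new StreamingDecoder over the reader and read it to the end; that part is an assumption and is not shown here.

```rust
use std::io::{self, BufRead, Read};

/// Calls `decode_frame` once per frame until `reader` is exhausted and
/// returns the number of frames handled.
fn for_each_frame<R, F>(mut reader: R, mut decode_frame: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&mut R) -> io::Result<()>,
{
    let mut frames = 0;
    // `fill_buf` yields an empty slice only at end of input, so the loop
    // stops exactly when no bytes are left for another frame.
    while !reader.fill_buf()?.is_empty() {
        decode_frame(&mut reader)?;
        frames += 1;
    }
    Ok(frames)
}

/// Hypothetical stand-in for "decode exactly one frame": reads one length
/// byte, then that many payload bytes, and appends the payload to `out`.
fn read_toy_frame<R: Read>(reader: &mut R, out: &mut Vec<u8>) -> io::Result<()> {
    let mut len = [0u8; 1];
    reader.read_exact(&mut len)?;
    let mut payload = vec![0u8; len[0] as usize];
    reader.read_exact(&mut payload)?;
    out.extend_from_slice(&payload);
    Ok(())
}
```

For a file, the reader passed in could be a `std::io::BufReader` wrapped around `File::open(fname)`. For an endless source such as a socket, the caller would more likely stop after each frame instead of looping to end of input.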

Sorseg · 2024-04-01T10:42:56Z

Thanks a lot for the investigation! Will you accept a PR that adds this tidbit to the documentation?

Sorseg · 2024-04-01T11:26:35Z

I tried the latest master and get an issue similar to the one you described: the file decompresses correctly, but I get an error, although a different one: SkipFrame(407710302, 153). Gist for repro:
https://gist.github.com/Sorseg/2d440274aca50487db21b8cec1a89dc5
and the file:
zipped.zip

KillingSpark · 2024-04-01T12:13:24Z

> Will you accept a PR that adds this tidbit to the documentation?

Sure :)

> SkipFrame(407710302, 153)

This error signals that a skippable frame has been encountered. The first number is the magic number in the frame header (used to identify what kind of content is in this frame; it is opaque to the zstd decoder) and the second one is the number of bytes to jump forward.

Maybe this should have been solved a bit more cleanly instead of using an error...
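To make the two numbers concrete, here is a small sketch of how a skippable frame header is laid out in the Zstandard format: a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little-endian size of the data to skip. 407710302 is 0x184D2A5E, which falls inside that range. The function name `parse_skippable_header` is made up for this illustration and is not part of ruzstd.

```rust
/// Skippable-frame magic numbers span 0x184D2A50..=0x184D2A5F.
const SKIPPABLE_MAGIC_MIN: u32 = 0x184D_2A50;
const SKIPPABLE_MAGIC_MAX: u32 = 0x184D_2A5F;

/// Returns `(magic_number, bytes_to_skip)` if `bytes` starts with a
/// skippable frame header, and `None` if there are fewer than 8 bytes or
/// the magic number is outside the skippable range.
fn parse_skippable_header(bytes: &[u8]) -> Option<(u32, u32)> {
    match *bytes {
        [m0, m1, m2, m3, l0, l1, l2, l3, ..] => {
            // Both fields are stored in little-endian byte order.
            let magic = u32::from_le_bytes([m0, m1, m2, m3]);
            let length = u32::from_le_bytes([l0, l1, l2, l3]);
            if (SKIPPABLE_MAGIC_MIN..=SKIPPABLE_MAGIC_MAX).contains(&magic) {
                Some((magic, length))
            } else {
                None
            }
        }
        _ => None,
    }
}
```

For the bytes 5E 2A 4D 18 99 00 00 00 this returns `Some((407710302, 153))`, the same pair as in the error above. A regular zstd frame, whose magic number is 0xFD2FB528, yields `None`.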

Sorseg · 2024-04-01T12:43:56Z

The error outcome in this case is not obvious, but I think it is fixable with a little bit of documentation. I don't think this makes for a very ergonomic API. Ideally I would expect the streaming reader to decode the full file, dealing with zstd peculiarities under the hood, but I guess the implementation requires a bit more thinking.

Sorseg · 2024-04-01T12:47:09Z

I also understand that someone might need access to these skip frames, but I would expect this to be achievable through a different API, maybe. Designing APIs is difficult 🙃

KillingSpark · 2024-04-01T12:54:06Z

> The error outcome in this case is not obvious, but I think it is fixable with a little bit of documentation. I don't think this makes for a very ergonomic API. Ideally I would expect the streaming reader to decode the full file, dealing with zstd peculiarities under the hood, but I guess the implementation requires a bit more thinking.

The problem here is that the decoder just deals with a reader. This could also be, e.g., a TCP socket, where you probably want to stop after each frame because the decoded stream is potentially endless. The StreamingDecoder is still a pretty low-level abstraction.

I could maybe include a FileDecoder that only deals with files, which could actually go ahead and gloss over all those details. But then again, that is kind of feature creep and introduces more complexity, because it would depend on the standard library, whereas this library is also available for no_std environments.

Sorseg mentioned this issue Apr 1, 2024

Document how to deal with multi-segment streams #59

Merged

KillingSpark closed this as completed in #59 Apr 1, 2024

workingjubilee mentioned this issue Sep 13, 2024

Should StreamingDecoder::read_exact prefer Read::read_exact's contract over its own? #69

Closed

jvff mentioned this issue Oct 24, 2024

Bytecodes compressed into multiple frames can't be decompressed in the web wallet linera-io/linera-protocol#2710

Open
