-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add xerial snappy read/writer #838
Conversation
Forked from [github.com/eapache/go-xerial-snappy](https://github.com/eapache/go-xerial-snappy). Changes: * Uses [S2](https://github.com/klauspost/compress/tree/master/s2#snappy-compatibility) for better/faster compression and decompression. * Fixes 0-length roundtrips. * Adds `DecodeCapped`, which allows decompression with capped output size. * `DecodeInto` will decode directly into destination if there is space enough. * `Encode` will now encode directly into 'dst' if it has space enough. * Fixes short snappy buffers returning `ErrMalformed`. * Renames `EncodeStream` to `Encode`. * Adds `EncodeBetter` for better than default compression at ~half the speed. Comparison (before/after): ``` BenchmarkSnappyStreamEncode-32 959010 1170 ns/op 875.15 MB/s 1280 B/op 1 allocs/op BenchmarkSnappyStreamEncode-32 1000000 1107 ns/op 925.04 MB/s 0 B/op 0 allocs/op --> Output size: 913 -> 856 bytes BenchmarkSnappyStreamEncodeBetter-32 477739 2506 ns/op 408.62 MB/s 0 B/op 0 allocs/op --> Output size: 835 bytes BenchmarkSnappyStreamEncodeMassive-32 100 10596963 ns/op 966.31 MB/s 40977 B/op 1 allocs/op BenchmarkSnappyStreamEncodeMassive-32 100 10220236 ns/op 1001.93 MB/s 0 B/op 0 allocs/op --> Output size: 2365547 -> 2256991 bytes BenchmarkSnappyStreamEncodeBetterMassive-32 69 16983314 ns/op 602.94 MB/s 0 B/op 0 allocs/op --> Output size: 2011997 bytes BenchmarkSnappyStreamDecodeInto-32 1887378 639.5 ns/op 1673.19 MB/s 1088 B/op 3 allocs/op BenchmarkSnappyStreamDecodeInto-32 2707915 436.2 ns/op 2452.99 MB/s 0 B/op 0 allocs/op BenchmarkSnappyStreamDecodeIntoMassive-32 267 4559594 ns/op 2245.81 MB/s 71120 B/op 1 allocs/op BenchmarkSnappyStreamDecodeIntoMassive-32 282 4285844 ns/op 2389.26 MB/s 0 B/op 0 allocs/op ```
Hmm, that's a surprisingly good question :) It sounds like there are some bug fixes and some feature improvements mixed together here? Though it's not clear to me at a glance whether the bug fixes are just side-effects of the switch to S2, or whether the framing code in xerial.go actually changed meaningfully as well? The last time I touched the xerial-snappy package was January, in eapache/go-xerial-snappy#9 - @dnwe and I had a chat in that PR (which you are welcome to read for context) about the future of the package and its primary user (https://github.com/IBM/sarama/), given I'm not really involved in any of the related projects anymore.
To answer the original question though - I'm certainly happy to accept small bug fixes and improvements upstream if you'd like to contribute them there instead / as well / depending on the discussion with Dom. Swapping the underlying library for S2 is not something that I would feel comfortable accepting just because it's a major change and I don't have the setup or time to verify for myself that it actually maintains compatibility, wouldn't break Sarama, etc. |
@eapache I can send the fixes for the decoding edge cases. That'll give me an excuse to add the test cases outside of the fuzz tests. I understand that you aren't looking to test replacing the encoder - I had a hunch about that, which is why I forked it. It is of course safe, but I wouldn't take such a change without spending time investigating myself. Yes, it would be drop-in, except Sidenote: It took me a while to understand how the "append" mechanic of EncodeStream works. I think I will add some more documentation on that. I will send a PR with the fixes ASAP. You can then decide on what the fate of your package will be :) |
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | minor | `v1.16.7` -> `v1.17.0` | --- ### Release Notes <details> <summary>klauspost/compress (github.com/klauspost/compress)</summary> ### [`v1.17.0`](https://togithub.com/klauspost/compress/releases/tag/v1.17.0) [Compare Source](https://togithub.com/klauspost/compress/compare/v1.16.7...v1.17.0) #### What's Changed - Add dictionary builder by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/853](https://togithub.com/klauspost/compress/pull/853) - Add xerial snappy read/writer by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/838](https://togithub.com/klauspost/compress/pull/838) - flate: Add limited window compression by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/843](https://togithub.com/klauspost/compress/pull/843) - s2: Do 2 overlapping match checks by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/839](https://togithub.com/klauspost/compress/pull/839) - flate: Add amd64 assembly matchlen by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/837](https://togithub.com/klauspost/compress/pull/837) - gzip: Copy bufio.Reader on Reset by [@​thatguystone](https://togithub.com/thatguystone) in [https://github.com/klauspost/compress/pull/860](https://togithub.com/klauspost/compress/pull/860) - zstd: Remove offset from bitReader by [@​greatroar](https://togithub.com/greatroar) in [https://github.com/klauspost/compress/pull/854](https://togithub.com/klauspost/compress/pull/854) - fse, huff0, zstd: Remove always-nil error returns by [@​greatroar](https://togithub.com/greatroar) in [https://github.com/klauspost/compress/pull/857](https://togithub.com/klauspost/compress/pull/857) - tests: unnecessary use of fmt.Sprintf by [@​testwill](https://togithub.com/testwill) in [https://github.com/klauspost/compress/pull/836](https://togithub.com/klauspost/compress/pull/836) - tests: Fix OSS fuzzer t.Run by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/852](https://togithub.com/klauspost/compress/pull/852) - tests: Use Go 1.21.x by [@​klauspost](https://togithub.com/klauspost) in [https://github.com/klauspost/compress/pull/851](https://togithub.com/klauspost/compress/pull/851) #### New Contributors - [@​testwill](https://togithub.com/testwill) made their first contribution in [https://github.com/klauspost/compress/pull/836](https://togithub.com/klauspost/compress/pull/836) - [@​thatguystone](https://togithub.com/thatguystone) made their first contribution in [https://github.com/klauspost/compress/pull/860](https://togithub.com/klauspost/compress/pull/860) **Full Changelog**: klauspost/compress@v1.16.7...v1.17.0 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi4xMDkuNCIsInVwZGF0ZWRJblZlciI6IjM2LjEwOS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9-->
Forked from github.com/eapache/go-xerial-snappy.
Changes:
DecodeCapped
, which allows decompression with capped output size.DecodeInto
will decode directly into destination if there is space enough.Encode
will now encode directly into 'dst' if it has space enough.ErrMalformed
.EncodeStream
toEncode
.EncodeBetter
for better than default compression at ~half the speed.Comparison (before/after):
@eapache - I don't know if you are interested in having these changes upstream?