s2: Add AMD64 assembly for better mode (#315)
Blocks:
```
benchmark                              old ns/op     new ns/op     delta
BenchmarkTwainEncode1e1/better-32      10.7          10.5          -1.87%
BenchmarkTwainEncode1e2/better-32      2947          280           -90.50%
BenchmarkTwainEncode1e3/better-32      6664          2525          -62.11%
BenchmarkTwainEncode1e4/better-32      47401         25461         -46.29%
BenchmarkTwainEncode1e5/better-32      528060        417367        -20.96%
BenchmarkTwainEncode1e6/better-32      2137499       1554364       -27.28%

benchmark                                                  old ns/op     new ns/op     delta
BenchmarkRandomEncodeBetterBlock1MB-32                     39476         38241         -3.13%
BenchmarkEncodeS2Block/0-html/block-better-32              10140         6761          -33.32%
BenchmarkEncodeS2Block/1-urls/block-better-32              141170        90141         -36.15%
BenchmarkEncodeS2Block/2-jpg/block-better-32               1026          848           -17.35%
BenchmarkEncodeS2Block/3-jpg_200b/block-better-32          332           24.3          -92.68%
BenchmarkEncodeS2Block/4-pdf/block-better-32               12266         7164          -41.59%
BenchmarkEncodeS2Block/5-html4/block-better-32             14229         8134          -42.84%
BenchmarkEncodeS2Block/6-txt1/block-better-32              40537         27718         -31.62%
BenchmarkEncodeS2Block/7-txt2/block-better-32              35890         24783         -30.95%
BenchmarkEncodeS2Block/8-txt3/block-better-32              104525        77463         -25.89%
BenchmarkEncodeS2Block/9-txt4/block-better-32              144537        104121        -27.96%
BenchmarkEncodeS2Block/10-pb/block-better-32               9017          5427          -39.81%
BenchmarkEncodeS2Block/11-gaviota/block-better-32          31386         20973         -33.18%
BenchmarkEncodeS2Block/12-txt1_128b/block-better-32        312           16.4          -94.74%
BenchmarkEncodeS2Block/13-txt1_1000b/block-better-32       578           136           -76.47%
BenchmarkEncodeS2Block/14-txt1_10000b/block-better-32      3278          1293          -60.56%
BenchmarkEncodeS2Block/15-txt1_20000b/block-better-32      6469          3820          -40.95%

benchmark                                                  old MB/s      new MB/s      speedup
BenchmarkRandomEncodeBetterBlock1MB-32                     26562.09      27420.04      1.03x
BenchmarkEncodeS2Block/0-html/block-better-32              10098.47      15145.41      1.50x
BenchmarkEncodeS2Block/1-urls/block-better-32              4973.34       7788.75       1.57x
BenchmarkEncodeS2Block/2-jpg/block-better-32               119973.57     145200.76     1.21x
BenchmarkEncodeS2Block/3-jpg_200b/block-better-32          602.41        8241.97       13.68x
BenchmarkEncodeS2Block/4-pdf/block-better-32               8348.31       14293.26      1.71x
BenchmarkEncodeS2Block/5-html4/block-better-32             28786.61      50355.67      1.75x
BenchmarkEncodeS2Block/6-txt1/block-better-32              3751.82       5486.93       1.46x
BenchmarkEncodeS2Block/7-txt2/block-better-32              3487.81       5051.03       1.45x
BenchmarkEncodeS2Block/8-txt3/block-better-32              4082.81       5509.15       1.35x
BenchmarkEncodeS2Block/9-txt4/block-better-32              3333.82       4627.90       1.39x
BenchmarkEncodeS2Block/10-pb/block-better-32               13151.91      21850.98      1.66x
BenchmarkEncodeS2Block/11-gaviota/block-better-32          5872.67       8788.25       1.50x
BenchmarkEncodeS2Block/12-txt1_128b/block-better-32        410.38        7791.86       18.99x
BenchmarkEncodeS2Block/13-txt1_1000b/block-better-32       1729.19       7370.56       4.26x
BenchmarkEncodeS2Block/14-txt1_10000b/block-better-32      3050.66       7736.81       2.54x
BenchmarkEncodeS2Block/15-txt1_20000b/block-better-32      3091.47       5235.17       1.69x
```

Streams, with/without assembly (first line of each pair is with assembly), 16 cores:
```
github-june-2days-2019.json:
Compressing... 6273951764 -> 949146808 [15.13%]; 564ms, 10608.7MB/s
Compressing... 6273951764 -> 950079555 [15.14%]; 722ms, 8287.1MB/s

github-ranks-backup.bin:
Compressing... 1862623243 -> 555069246 [29.80%]; 261ms, 6805.8MB/s
Compressing... 1862623243 -> 555617002 [29.83%]; 384ms, 4625.9MB/s

enwik9:
Compressing... 1000000000 -> 426854233 [42.69%]; 229ms, 4164.5MB/s
Compressing... 1000000000 -> 427660256 [42.77%]; 333ms, 2863.9MB/s

nyc-taxi-data-10M.csv:
Compressing... 3325605752 -> 954776589 [28.71%]; 491ms, 6459.4MB/s
Compressing... 3325605752 -> 960330423 [28.88%]; 608ms, 5216.4MB/s

sharnd.out.2gb:
Compressing... 2147483647 -> 2147487753 [100.00%]; 174ms, 11770.0MB/s
Compressing... 2147483647 -> 2147487753 [100.00%]; 172ms, 11907.1MB/s
```
klauspost authored Feb 25, 2021
1 parent da0f8a3 commit 68c9310
Showing 10 changed files with 8,172 additions and 1,266 deletions.
8 changes: 7 additions & 1 deletion README.md
@@ -14,7 +14,13 @@ This package provides various compression algorithms.
[![Sourcegraph Badge](https://sourcegraph.com/github.com/klauspost/compress/-/badge.svg)](https://sourcegraph.com/github.com/klauspost/compress?badge)

# changelog

* Feb 25, 2021 (v1.11.8)
* s2: Fixed occasional out-of-bounds write on amd64. Upgrade recommended.
* s2: Add AMD64 assembly for better mode. 25-50% faster. [#315](https://github.com/klauspost/compress/pull/315)
* s2: Less upfront decoder allocation. [#322](https://github.com/klauspost/compress/pull/322)
* zstd: Faster "compression" of incompressible data. [#314](https://github.com/klauspost/compress/pull/314)
* zip: Fix zip64 headers. [#313](https://github.com/klauspost/compress/pull/313)

* Jan 14, 2021 (v1.11.7)
* Use Bytes() interface to get bytes across packages. [#309](https://github.com/klauspost/compress/pull/309)
* s2: Add 'best' compression option. [#310](https://github.com/klauspost/compress/pull/310)
24 changes: 12 additions & 12 deletions s2/README.md
@@ -8,7 +8,7 @@ Decoding is compatible with Snappy compressed content, but content compressed wi
This means that S2 can seamlessly replace Snappy without converting compressed content.

S2 is designed to have high throughput on content that cannot be compressed.
-This is important so you don't have to worry about spending CPU cycles on already compressed data.
+This is important, so you don't have to worry about spending CPU cycles on already compressed data.

## Benefits over Snappy

@@ -456,33 +456,33 @@ This will compress as much as possible with little regard to CPU usage.
Mainly for offline compression, but where decompression speed should still
be high and compatible with other S2 compressed data.

-Some examples compared on 16 core CPU:
+Some examples compared on 16 core CPU, amd64 assembly used:

```
* enwik10
Default... 10000000000 -> 4761467548 [47.61%]; 1.098s, 8685.6MB/s
-Better... 10000000000 -> 4225922984 [42.26%]; 2.817s, 3385.4MB/s
-Best... 10000000000 -> 3667646858 [36.68%]; 35.995s, 264.9MB/s
+Better... 10000000000 -> 4219438251 [42.19%]; 1.925s, 4954.2MB/s
+Best... 10000000000 -> 3667646858 [36.68%]; 35.995s, 264.9MB/s
* github-june-2days-2019.json
Default... 6273951764 -> 1043196283 [16.63%]; 431ms, 13882.3MB/s
-Better... 6273951764 -> 950079555 [15.14%]; 736ms, 8129.5MB/s
-Best... 6273951764 -> 846260870 [13.49%]; 8.125s, 736.4MB/s
+Better... 6273951764 -> 949146808 [15.13%]; 547ms, 10938.4MB/s
+Best... 6273951764 -> 846260870 [13.49%]; 8.125s, 736.4MB/s
* nyc-taxi-data-10M.csv
Default... 3325605752 -> 1095998837 [32.96%]; 324ms, 9788.7MB/s
-Better... 3325605752 -> 960330423 [28.88%]; 602ms, 5268.4MB/s
-Best... 3325605752 -> 794873295 [23.90%]; 6.619s, 479.1MB/s
+Better... 3325605752 -> 954776589 [28.71%]; 491ms, 6459.4MB/s
+Best... 3325605752 -> 794873295 [23.90%]; 6.619s, 479.1MB/s
* 10gb.tar
Default... 10065157632 -> 5916578242 [58.78%]; 1.028s, 9337.4MB/s
-Better... 10065157632 -> 5650133605 [56.14%]; 2.172s, 4419.4MB/s
-Best... 10065157632 -> 5246578570 [52.13%]; 25.696s, 373.6MB/s
+Better... 10065157632 -> 5649207485 [56.13%]; 1.597s, 6010.6MB/s
+Best... 10065157632 -> 5246578570 [52.13%]; 25.696s, 373.6MB/s
* consensus.db.10gb
Default... 10737418240 -> 4562648848 [42.49%]; 882ms, 11610.0MB/s
-Better... 10737418240 -> 4542443833 [42.30%]; 3.3s, 3103.5MB/s
-Best... 10737418240 -> 4272335558 [39.79%]; 38.955s, 262.9MB/s
+Better... 10737418240 -> 4542428129 [42.30%]; 1.533s, 6679.7MB/s
+Best... 10737418240 -> 4272335558 [39.79%]; 38.955s, 262.9MB/s
```

Decompression speed should be around the same as using the 'better' compression mode.