Reduce unnecessary Vec allocations and indirections #77

chyyran · 2022-08-06T07:38:26Z

Pull Request Overview

Changed literal_probs array from a Vec<Vec<u16>> to a Vec2D backed by a contiguous Vec<u16>.
BitTrees in LenDecoder and DecoderState are now stored inline as an array. The actual BitTree data is still on the heap.
DecoderState::reset now just recreates stack-allocated data, dropping the existing ones. I found this to be faster than using slice::fill. The heap-allocated arrays still use fill to keep their allocation.

This gives a small but free performance boost and reduces overall allocations.

6e1f0d7

running 8 tests
test compress_65536                  ... bench:   1,406,069 ns/iter (+/- 60,172)
test compress_empty                  ... bench:         734 ns/iter (+/- 36)
test compress_hello                  ... bench:       1,037 ns/iter (+/- 31)
test decompress_after_compress_65536 ... bench:   2,708,511 ns/iter (+/- 298,490)
test decompress_after_compress_empty ... bench:       2,243 ns/iter (+/- 1,853)
test decompress_after_compress_hello ... bench:       2,869 ns/iter (+/- 680)
test decompress_big_file             ... bench:   4,097,344 ns/iter (+/- 182,653)
test decompress_huge_dict            ... bench:       2,791 ns/iter (+/- 183)

After this PR (notice the tighter variance in particular)

running 8 tests
test compress_65536                  ... bench:   1,391,247 ns/iter (+/- 38,388)
test compress_empty                  ... bench:         725 ns/iter (+/- 17)
test compress_hello                  ... bench:       1,035 ns/iter (+/- 36)
test decompress_after_compress_65536 ... bench:   2,441,367 ns/iter (+/- 53,371)
test decompress_after_compress_empty ... bench:       1,974 ns/iter (+/- 66)
test decompress_after_compress_hello ... bench:       2,459 ns/iter (+/- 291)
test decompress_big_file             ... bench:   3,849,870 ns/iter (+/- 192,785)
test decompress_huge_dict            ... bench:       2,427 ns/iter (+/- 126)

Testing Strategy

This pull request was tested by...

Added relevant unit tests.

gendx

Nice to see some performance improvements here! Ultimately, const generics (#19) would go further into this direction.

src/decode/vec2d.rs

src/decode/lzma.rs

src/decode/vec2d.rs

src/macros.rs

src/decode/vec2d.rs

src/decode/lzma.rs

chyyran · 2022-08-09T06:39:02Z

I found that it was faster to just recreate stack-allocated arrays in DecoderState than bother with filling them. Heap-backed arrays still use fill to keep their allocation.

chyyran · 2022-08-17T00:57:26Z

b90b484 removes the explicit bounds checks in Index and IndexMut for Vec2D as well as the rows argument which isn't necessary afterwards. We can rely on the bounds checks for slice that can be better optimized by rustc instead on Release builds.

Looks like clippy is failing on an unrelated file with a new lint, so I'm going to leave that as is.

gendx

I've fixed the lints in #80, please rebase and re-run the workflows.

gendx · 2022-08-23T20:20:00Z

src/decode/lzma.rs

+        self.pos_decoders = [0x400; 115];
+        self.is_match = [0x400; 192];
+        self.is_rep = [0x400; 12];
+        self.is_rep_g0 = [0x400; 12];
+        self.is_rep_g1 = [0x400; 12];
+        self.is_rep_g2 = [0x400; 12];
+        self.is_rep_0long = [0x400; 192];
        self.state = 0;
-        self.rep.fill(0);
+        self.rep = [0; 4];


I don't think this brings any readability benefit over fill. The compiler should be able to optimize both versions of the code regardless.

I see you however observed otherwise:

I found that it was faster to just recreate stack-allocated arrays in DecoderState than bother with filling them. Heap-backed arrays still use fill to keep their allocation.

Can you add a comment here to explain that?

(IMO, this might also be worth filing a bug to the Rust compiler, to understand if fill being slower is intentional or not a missed optimization somewhere in the compiler.)

src/util/vec2d.rs

chyyran · 2022-08-24T02:13:58Z

Addressed nits, and reformatted imports as per #78 (comment) as well.

chyyran · 2022-08-26T22:32:32Z

Accidentally closed because of a faulty rebase, 444e68c should be properly rebased on current master now.

gendx

I've re-run the benchmarks, which show no significant change. However this is still a valuable cleanup - assuming the Vec2D implementation is correct with all the needed bounds checks!

src/util/vec2d.rs

* Changed literal_probs array from a Vec<Vec<u16>> to a Vec2D backed by a contiguous allocation * BitTrees in LenDecoder and DecoderState are now stored inline. The actual BitTree data still lives in a Vec but one level of indirection is reduced. * Don't bother with filling stack-allocated DecoderState arrays on reset, and just recreate the arrays dropping the existing ones.

chyyran · 2022-08-28T23:38:29Z

Fixed nits in 48b66fa

chyyran force-pushed the patch-reduce-allocs branch 3 times, most recently from c4a996a to 9f44242 Compare August 6, 2022 07:52

gendx requested changes Aug 6, 2022

View reviewed changes

chyyran force-pushed the patch-reduce-allocs branch 2 times, most recently from 70194d7 to 038d597 Compare August 6, 2022 16:11

chyyran requested a review from gendx August 6, 2022 16:12

chyyran force-pushed the patch-reduce-allocs branch 3 times, most recently from e2f0650 to b5c8f04 Compare August 6, 2022 16:20

chyyran mentioned this pull request Aug 9, 2022

Use const-generics #19

Closed

chyyran force-pushed the patch-reduce-allocs branch 2 times, most recently from b1a798a to a010cc0 Compare August 9, 2022 06:33

chyyran mentioned this pull request Aug 9, 2022

Use const generics to remove BitTree heap allocations #79

Merged

chyyran force-pushed the patch-reduce-allocs branch 2 times, most recently from 885a14f to b90b484 Compare August 17, 2022 00:50

gendx mentioned this pull request Aug 23, 2022

Fix Clippy lints. #80

Merged

1 task

gendx requested changes Aug 23, 2022

View reviewed changes

chyyran force-pushed the patch-reduce-allocs branch from b90b484 to 8e08e96 Compare August 24, 2022 02:13

chyyran requested a review from gendx August 24, 2022 02:14

chyyran force-pushed the patch-reduce-allocs branch 4 times, most recently from 583006a to 58360eb Compare August 26, 2022 21:31

chyyran closed this Aug 26, 2022

chyyran force-pushed the patch-reduce-allocs branch from 58360eb to c80e4aa Compare August 26, 2022 22:20

chyyran reopened this Aug 26, 2022

chyyran force-pushed the patch-reduce-allocs branch from 086d897 to 444e68c Compare August 26, 2022 22:32

gendx requested changes Aug 27, 2022

View reviewed changes

src/util/vec2d.rs Outdated Show resolved Hide resolved

src/util/vec2d.rs Outdated Show resolved Hide resolved

src/util/vec2d.rs Outdated Show resolved Hide resolved

src/util/vec2d.rs Show resolved Hide resolved

chyyran force-pushed the patch-reduce-allocs branch from 444e68c to 48b66fa Compare August 28, 2022 23:35

chyyran requested a review from gendx August 28, 2022 23:37

gendx approved these changes Sep 3, 2022

View reviewed changes

gendx merged commit 5ad34d7 into gendx:master Sep 3, 2022

chyyran deleted the patch-reduce-allocs branch September 3, 2022 21:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce unnecessary Vec allocations and indirections #77

Reduce unnecessary Vec allocations and indirections #77

chyyran commented Aug 6, 2022 •

edited

Loading

gendx left a comment

chyyran commented Aug 9, 2022

chyyran commented Aug 17, 2022 •

edited

Loading

gendx left a comment

gendx Aug 23, 2022

chyyran commented Aug 24, 2022

chyyran commented Aug 26, 2022

gendx left a comment

chyyran commented Aug 28, 2022

Reduce unnecessary Vec allocations and indirections #77

Reduce unnecessary Vec allocations and indirections #77

Conversation

chyyran commented Aug 6, 2022 • edited Loading

Pull Request Overview

Testing Strategy

gendx left a comment

Choose a reason for hiding this comment

chyyran commented Aug 9, 2022

chyyran commented Aug 17, 2022 • edited Loading

gendx left a comment

Choose a reason for hiding this comment

gendx Aug 23, 2022

Choose a reason for hiding this comment

chyyran commented Aug 24, 2022

chyyran commented Aug 26, 2022

gendx left a comment

Choose a reason for hiding this comment

chyyran commented Aug 28, 2022

chyyran commented Aug 6, 2022 •

edited

Loading

chyyran commented Aug 17, 2022 •

edited

Loading