Rewrite proof decoding code to not use a separate map #1462

tomaka · 2023-12-11T14:00:24Z

Before this PR, decoding a proof builds a BTreeMap that basically duplicates the trie structure.

Looking things up in the proof then consists in traversing the BTreeMap. It seems to me, however, that we would be better off traversing the proof itself rather than a BTreeMap.

This is what this PR does. Decoding a proof now builds a "proof layout" that is now traversed when looking things up in the proof.
As a side effect, the keys and prefixes are no longer passed as slices by as iterators. This should also slightly improve performances, as there are various locations where we builds Vecs just to pass them as parameter.

Since the objective of this PR is to improve performance, I might throw it away if it turns out to be slower than before (which is why I did #1460 beforehand).

tomaka · 2023-12-21T07:56:13Z

Some preliminary benchmarking show a 25% to 30% gain both when decoding and when accessing a proof content, and that's not accounting for the fact that we can optimize accessing the proof content further by storing the partial key on the side to avoid decoding trie nodes multiple times.

tomaka · 2023-12-21T16:35:56Z

Everything now seems to work fine.

tomaka · 2023-12-21T16:43:36Z

Here are the benchmarks compared to the main branch:

Benchmarking proof-decode/decode/1762: Collecting 100 samples in estimated 5.0073 s (858k iteratio
proof-decode/decode/1762
                        time:   [5.8289 µs 5.8367 µs 5.8439 µs]
                        thrpt:  [287.54 MiB/s 287.90 MiB/s 288.28 MiB/s]
                 change:
                        time:   [-24.676% -24.462% -24.255%] (p = 0.00 < 0.05)
                        thrpt:  [+32.022% +32.384% +32.760%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Benchmarking proof-decode/get-storage-value/1762: Collecting 100 samples in estimated 5.0013 s (14
proof-decode/get-storage-value/1762
                        time:   [112.08 ns 112.34 ns 112.61 ns]
                        thrpt:  [14.572 GiB/s 14.607 GiB/s 14.641 GiB/s]
                 change:
                        time:   [-52.757% -52.449% -52.130%] (p = 0.00 < 0.05)
                        thrpt:  [+108.90% +110.30% +111.67%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking proof-decode/next-key/1762: Collecting 100 samples in estimated 5.0003 s (14M iterati
proof-decode/next-key/1762
                        time:   [122.81 ns 122.92 ns 123.05 ns]
                        thrpt:  [13.336 GiB/s 13.350 GiB/s 13.362 GiB/s]
                 change:
                        time:   [-40.841% -40.518% -40.201%] (p = 0.00 < 0.05)
                        thrpt:  [+67.227% +68.117% +69.037%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
Benchmarking proof-decode/closest-descendant-merkle-value/1762: Collecting 100 samples in estimate
proof-decode/closest-descendant-merkle-value/1762
                        time:   [108.74 ns 108.94 ns 109.17 ns]
                        thrpt:  [15.032 GiB/s 15.063 GiB/s 15.091 GiB/s]
                 change:
                        time:   [-27.075% -26.565% -26.048%] (p = 0.00 < 0.05)
                        thrpt:  [+35.222% +36.176% +37.127%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking proof-decode/closest-ancestor/1762: Collecting 100 samples in estimated 5.0010 s (13M
proof-decode/closest-ancestor/1762
                        time:   [127.16 ns 127.42 ns 127.72 ns]
                        thrpt:  [12.848 GiB/s 12.879 GiB/s 12.904 GiB/s]
                 change:
                        time:   [+24.550% +25.246% +25.930%] (p = 0.00 < 0.05)
                        thrpt:  [-20.591% -20.157% -19.711%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
Benchmarking proof-decode/decode/100299: Collecting 100 samples in estimated 7.1923 s (15k iterati
proof-decode/decode/100299
                        time:   [472.63 µs 473.45 µs 474.24 µs]
                        thrpt:  [201.70 MiB/s 202.03 MiB/s 202.38 MiB/s]
                 change:
                        time:   [-40.275% -40.096% -39.891%] (p = 0.00 < 0.05)
                        thrpt:  [+66.366% +66.934% +67.433%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking proof-decode/get-storage-value/100299: Collecting 100 samples in estimated 5.0013 s (
proof-decode/get-storage-value/100299
                        time:   [127.21 ns 127.63 ns 128.04 ns]
                        thrpt:  [729.55 GiB/s 731.90 GiB/s 734.32 GiB/s]
                 change:
                        time:   [-63.839% -63.697% -63.569%] (p = 0.00 < 0.05)
                        thrpt:  [+174.49% +175.46% +176.54%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking proof-decode/next-key/100299: Collecting 100 samples in estimated 5.0013 s (13M itera
proof-decode/next-key/100299
                        time:   [140.20 ns 140.80 ns 141.57 ns]
                        thrpt:  [659.80 GiB/s 663.42 GiB/s 666.28 GiB/s]
                 change:
                        time:   [-53.641% -53.349% -53.023%] (p = 0.00 < 0.05)
                        thrpt:  [+112.87% +114.36% +115.71%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe
Benchmarking proof-decode/closest-descendant-merkle-value/100299: Collecting 100 samples in estima
proof-decode/closest-descendant-merkle-value/100299
                        time:   [122.21 ns 122.64 ns 123.10 ns]
                        thrpt:  [758.83 GiB/s 761.67 GiB/s 764.33 GiB/s]
                 change:
                        time:   [-49.598% -49.338% -49.056%] (p = 0.00 < 0.05)
                        thrpt:  [+96.293% +97.387% +98.404%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking proof-decode/closest-ancestor/100299: Collecting 100 samples in estimated 5.0013 s (1
proof-decode/closest-ancestor/100299
                        time:   [140.15 ns 140.31 ns 140.48 ns]
                        thrpt:  [664.92 GiB/s 665.77 GiB/s 666.52 GiB/s]
                 change:
                        time:   [-11.439% -11.000% -10.593%] (p = 0.00 < 0.05)
                        thrpt:  [+11.848% +12.359% +12.916%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

tomaka · 2023-12-21T16:50:00Z

I'm very happy with the direction of this change, plus it seems to fix some corner case situations that the missing tests didn't detect before.

While the code isn't perfect, I'm going to merge this so that I can start working on other changes that are blocked on this one.

tomaka added 6 commits December 11, 2023 14:55

Rewrite proof decoding code to not use a separate map

0b922bd

Work in progress

3f59d66

Merge branch 'main' into rewrite-proof-decoding

536d841

Adjust some APIs

ddb672c

Use strong typing for stack entries

e28159a

Don't decode trie nodes twice when decoding

ba75a25

tomaka added 4 commits December 21, 2023 11:02

Work on next_key

a0fbda1

next_key is expected to work now

37e6175

Fix benches

265a012

Give me a break

a46750f

tomaka mentioned this pull request Dec 21, 2023

Add tests to the proof decoding code #1500

Merged

tomaka added 18 commits December 21, 2023 13:51

Merge branch 'main' into rewrite-proof-decoding

683e3e0

Fix tests compilation

f39dc72

Uncomment now-working test

559f840

Implement trie_node_info

b61325a

Reimplement iter_ordered

f80e764

Don't require lifetime on trie_root_merkle_value

bd1ad45

Update light client

331d753

Inline and fix closest_ancestor_in_proof

0f89ccf

Remove empty proof special case

1254ae6

Improve fmt::Debug for EntryKeyIter

d781a46

Bugfix in next_key

9a7e1e5

Uncomment failing test

8f169af

CHANGELOG

ea7564f

Fix warnings

230042a

Typo

6278aa5

Fix children with nibble e being skipped

378ae19

Stupidest typo

08472b5

Add some TODOs

e14ad0f

tomaka marked this pull request as ready for review December 21, 2023 16:35

Small revert in runtime_service

2641095

tomaka added this pull request to the merge queue Dec 21, 2023

Merged via the queue into smol-dot:main with commit 3186bb4 Dec 21, 2023
22 checks passed

tomaka deleted the rewrite-proof-decoding branch December 21, 2023 18:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite proof decoding code to not use a separate map #1462

Rewrite proof decoding code to not use a separate map #1462

tomaka commented Dec 11, 2023

tomaka commented Dec 21, 2023 •

edited

Loading

tomaka commented Dec 21, 2023

tomaka commented Dec 21, 2023

tomaka commented Dec 21, 2023

Rewrite proof decoding code to not use a separate map #1462

Rewrite proof decoding code to not use a separate map #1462

Conversation

tomaka commented Dec 11, 2023

tomaka commented Dec 21, 2023 • edited Loading

tomaka commented Dec 21, 2023

tomaka commented Dec 21, 2023

tomaka commented Dec 21, 2023

tomaka commented Dec 21, 2023 •

edited

Loading