Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite proof decoding code to not use a separate map #1462

Merged
merged 29 commits into from
Dec 21, 2023

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Dec 11, 2023

Before this PR, decoding a proof builds a BTreeMap that basically duplicates the trie structure.

Looking things up in the proof then consists in traversing the BTreeMap. It seems to me, however, that we would be better off traversing the proof itself rather than a BTreeMap.

This is what this PR does. Decoding a proof now builds a "proof layout" that is now traversed when looking things up in the proof.
As a side effect, the keys and prefixes are no longer passed as slices by as iterators. This should also slightly improve performances, as there are various locations where we builds Vecs just to pass them as parameter.

Since the objective of this PR is to improve performance, I might throw it away if it turns out to be slower than before (which is why I did #1460 beforehand).

@tomaka
Copy link
Contributor Author

tomaka commented Dec 21, 2023

Some preliminary benchmarking show a 25% to 30% gain both when decoding and when accessing a proof content, and that's not accounting for the fact that we can optimize accessing the proof content further by storing the partial key on the side to avoid decoding trie nodes multiple times.

@tomaka
Copy link
Contributor Author

tomaka commented Dec 21, 2023

Everything now seems to work fine.

@tomaka tomaka marked this pull request as ready for review December 21, 2023 16:35
@tomaka
Copy link
Contributor Author

tomaka commented Dec 21, 2023

Here are the benchmarks compared to the main branch:

Benchmarking proof-decode/decode/1762: Collecting 100 samples in estimated 5.0073 s (858k iteratio
proof-decode/decode/1762
                        time:   [5.8289 µs 5.8367 µs 5.8439 µs]
                        thrpt:  [287.54 MiB/s 287.90 MiB/s 288.28 MiB/s]
                 change:
                        time:   [-24.676% -24.462% -24.255%] (p = 0.00 < 0.05)
                        thrpt:  [+32.022% +32.384% +32.760%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Benchmarking proof-decode/get-storage-value/1762: Collecting 100 samples in estimated 5.0013 s (14
proof-decode/get-storage-value/1762
                        time:   [112.08 ns 112.34 ns 112.61 ns]
                        thrpt:  [14.572 GiB/s 14.607 GiB/s 14.641 GiB/s]
                 change:
                        time:   [-52.757% -52.449% -52.130%] (p = 0.00 < 0.05)
                        thrpt:  [+108.90% +110.30% +111.67%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking proof-decode/next-key/1762: Collecting 100 samples in estimated 5.0003 s (14M iterati
proof-decode/next-key/1762
                        time:   [122.81 ns 122.92 ns 123.05 ns]
                        thrpt:  [13.336 GiB/s 13.350 GiB/s 13.362 GiB/s]
                 change:
                        time:   [-40.841% -40.518% -40.201%] (p = 0.00 < 0.05)
                        thrpt:  [+67.227% +68.117% +69.037%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
Benchmarking proof-decode/closest-descendant-merkle-value/1762: Collecting 100 samples in estimate
proof-decode/closest-descendant-merkle-value/1762
                        time:   [108.74 ns 108.94 ns 109.17 ns]
                        thrpt:  [15.032 GiB/s 15.063 GiB/s 15.091 GiB/s]
                 change:
                        time:   [-27.075% -26.565% -26.048%] (p = 0.00 < 0.05)
                        thrpt:  [+35.222% +36.176% +37.127%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking proof-decode/closest-ancestor/1762: Collecting 100 samples in estimated 5.0010 s (13M
proof-decode/closest-ancestor/1762
                        time:   [127.16 ns 127.42 ns 127.72 ns]
                        thrpt:  [12.848 GiB/s 12.879 GiB/s 12.904 GiB/s]
                 change:
                        time:   [+24.550% +25.246% +25.930%] (p = 0.00 < 0.05)
                        thrpt:  [-20.591% -20.157% -19.711%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
Benchmarking proof-decode/decode/100299: Collecting 100 samples in estimated 7.1923 s (15k iterati
proof-decode/decode/100299
                        time:   [472.63 µs 473.45 µs 474.24 µs]
                        thrpt:  [201.70 MiB/s 202.03 MiB/s 202.38 MiB/s]
                 change:
                        time:   [-40.275% -40.096% -39.891%] (p = 0.00 < 0.05)
                        thrpt:  [+66.366% +66.934% +67.433%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking proof-decode/get-storage-value/100299: Collecting 100 samples in estimated 5.0013 s (
proof-decode/get-storage-value/100299
                        time:   [127.21 ns 127.63 ns 128.04 ns]
                        thrpt:  [729.55 GiB/s 731.90 GiB/s 734.32 GiB/s]
                 change:
                        time:   [-63.839% -63.697% -63.569%] (p = 0.00 < 0.05)
                        thrpt:  [+174.49% +175.46% +176.54%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking proof-decode/next-key/100299: Collecting 100 samples in estimated 5.0013 s (13M itera
proof-decode/next-key/100299
                        time:   [140.20 ns 140.80 ns 141.57 ns]
                        thrpt:  [659.80 GiB/s 663.42 GiB/s 666.28 GiB/s]
                 change:
                        time:   [-53.641% -53.349% -53.023%] (p = 0.00 < 0.05)
                        thrpt:  [+112.87% +114.36% +115.71%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe
Benchmarking proof-decode/closest-descendant-merkle-value/100299: Collecting 100 samples in estima
proof-decode/closest-descendant-merkle-value/100299
                        time:   [122.21 ns 122.64 ns 123.10 ns]
                        thrpt:  [758.83 GiB/s 761.67 GiB/s 764.33 GiB/s]
                 change:
                        time:   [-49.598% -49.338% -49.056%] (p = 0.00 < 0.05)
                        thrpt:  [+96.293% +97.387% +98.404%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking proof-decode/closest-ancestor/100299: Collecting 100 samples in estimated 5.0013 s (1
proof-decode/closest-ancestor/100299
                        time:   [140.15 ns 140.31 ns 140.48 ns]
                        thrpt:  [664.92 GiB/s 665.77 GiB/s 666.52 GiB/s]
                 change:
                        time:   [-11.439% -11.000% -10.593%] (p = 0.00 < 0.05)
                        thrpt:  [+11.848% +12.359% +12.916%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

@tomaka
Copy link
Contributor Author

tomaka commented Dec 21, 2023

I'm very happy with the direction of this change, plus it seems to fix some corner case situations that the missing tests didn't detect before.

While the code isn't perfect, I'm going to merge this so that I can start working on other changes that are blocked on this one.

@tomaka tomaka added this pull request to the merge queue Dec 21, 2023
Merged via the queue into smol-dot:main with commit 3186bb4 Dec 21, 2023
22 checks passed
@tomaka tomaka deleted the rewrite-proof-decoding branch December 21, 2023 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant