[wasmparser] Reduce the size of TypeId for better validation performance #844

Ekleog · 2022-11-28T15:15:04Z

Hey! First of all, thank you for maintaining wasm-tools :D

While upgrading our wasmer fork from wasmparser 0.78 to 0.94, I noticed a significant performance degradation on one of our benchmarks. Unfortunately, this benchmark is currently related to a security vulnerability, so I cannot share it yet.

My understanding is that it is related to TypeId becoming significantly bigger in 0.94 compared to 0.78, as all the metadata to identify different types that are basically isomorphic otherwise was added. TBH I don't really understand why this metadata is required, but I guess it's required indeed as otherwise #775 probably wouldn't have made it in.

So I'm submitting this PR, that at least reduces the amount of extra information: TypeId was just an 8-bytes integer in 0.78, and became a huge 40-bytes struct in 0.94, of which 8 are the previous TypeId, 8 are for type size monitoring, and 24 are for the type metadata used just for disambiguating types. With this PR, the last 24 bytes get reduced to 8 bytes, by bitpacking everything together and assuming there won't be more than 2⁶² types in a module, which sounds like a reasonable assumption to me. So TypeId becomes a 24-bytes struct.

(It'd be even better if we could assume a max type size of 2³² and a max number of types of 2³⁰, as this would allow us to reduce further the size of TypeId to 16-bytes, but I'm not sure this would be reasonable limits for wasmparser? Also, my benches don't show significant performance changes reducing the size further, though I'm not really sure why)

I have tried running our internal bench through wasmparser's benchmarks, and I can say that the before/after difference with this is a ~12% performance gain in validation, thanks to reduced memory pressure most certainly.

WDYT? :)

alexcrichton · 2022-11-28T16:45:36Z

Thanks! Have you been able to see the difference from the benchmarks in this repository? If not would you be up for adding a benchmark?

I think this could also be made smaller as well. The type_size and index values don't need to be larger than 32 bits so could be hardcoded as u32. Additionally type_hash could be shrunk to 4-bytes as well since the type index won't ever go so big as to need the full 32-bits. That would bring this down to 12 bytes which might improve things further?

alexcrichton · 2022-11-28T22:25:40Z

I sent Ekleog#1 which further shrinks the type to 8 bytes, but locally doesn't show a huge impact in the benchmarks for this repository. Would you be up for testing that to see whether it further improves the benchmark you're looking at? (which if possible would still be great to add here)

Ekleog-NEAR · 2022-12-07T11:31:31Z

Thank you for your answer as well as further size reduction PR!

However, it turns out that the further reduction is actually worsening performance, maybe because often-used values now need to be unpacked?

What I’m seeing on the repro I can’t share is:

main -> first commit of this PR: 10% improvement (the repro had taken a ~15% perf hit when upgrading wasmparser, so it’s likely ~back to previous performance, though I’m not measuring in the same way)
main -> tip of your further size reduction: 5% improvement
main -> second commit of this PR, that reduces to 16 bytes by reducing type_size and type_hash while keeping index: usize: still only 10% improvement

These results are roughly consistent with the bench I just pushed on top of this branch.

WDYT? I’m thinking we could either keep the size reduction to 24 bytes or do the further reduction to 16 bytes, whichever you think best, but I’d argue against reducing to 8 bytes given the numbers above. (So I’m also pushing both commits right now, but feel free to let me know if you think removing the second one would be better)

Performance benefits are less obvious, but it gets us further away from a regression

alexcrichton

Ok sounds good to me, I'll go ahead and merge this then, thanks for measuring and thanks for the PR!

Ekleog · 2022-12-10T14:25:51Z

Awesome, thank you! :)

Reduce the size of TypeId for better validation performance

ec5529d

Ekleog force-pushed the typeid-size-reduce branch from 87c3671 to ec5529d Compare November 28, 2022 15:39

Ekleog-NEAR added 2 commits December 7, 2022 12:35

Add a benchmark showcasing TypeId size reduction performance benefits

78e6867

Reduce TypeId size further to 16 bytes

cbb33b7

Performance benefits are less obvious, but it gets us further away from a regression

alexcrichton approved these changes Dec 9, 2022

View reviewed changes

alexcrichton merged commit 24d4808 into bytecodealliance:main Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wasmparser] Reduce the size of TypeId for better validation performance #844

[wasmparser] Reduce the size of TypeId for better validation performance #844

Ekleog commented Nov 28, 2022

alexcrichton commented Nov 28, 2022

alexcrichton commented Nov 28, 2022

Ekleog-NEAR commented Dec 7, 2022 •

edited

Loading

alexcrichton left a comment

Ekleog commented Dec 10, 2022

[wasmparser] Reduce the size of TypeId for better validation performance #844

[wasmparser] Reduce the size of TypeId for better validation performance #844

Conversation

Ekleog commented Nov 28, 2022

alexcrichton commented Nov 28, 2022

alexcrichton commented Nov 28, 2022

Ekleog-NEAR commented Dec 7, 2022 • edited Loading

alexcrichton left a comment

Choose a reason for hiding this comment

Ekleog commented Dec 10, 2022

Ekleog-NEAR commented Dec 7, 2022 •

edited

Loading