Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasmparser] Reduce the size of TypeId for better validation performance #844

Merged
merged 3 commits into from
Dec 9, 2022

Conversation

Ekleog
Copy link
Contributor

@Ekleog Ekleog commented Nov 28, 2022

Hey! First of all, thank you for maintaining wasm-tools :D

While upgrading our wasmer fork from wasmparser 0.78 to 0.94, I noticed a significant performance degradation on one of our benchmarks. Unfortunately, this benchmark is currently related to a security vulnerability, so I cannot share it yet.

My understanding is that it is related to TypeId becoming significantly bigger in 0.94 compared to 0.78, as all the metadata to identify different types that are basically isomorphic otherwise was added. TBH I don't really understand why this metadata is required, but I guess it's required indeed as otherwise #775 probably wouldn't have made it in.

So I'm submitting this PR, that at least reduces the amount of extra information: TypeId was just an 8-bytes integer in 0.78, and became a huge 40-bytes struct in 0.94, of which 8 are the previous TypeId, 8 are for type size monitoring, and 24 are for the type metadata used just for disambiguating types. With this PR, the last 24 bytes get reduced to 8 bytes, by bitpacking everything together and assuming there won't be more than 2⁶² types in a module, which sounds like a reasonable assumption to me. So TypeId becomes a 24-bytes struct.

(It'd be even better if we could assume a max type size of 2³² and a max number of types of 2³⁰, as this would allow us to reduce further the size of TypeId to 16-bytes, but I'm not sure this would be reasonable limits for wasmparser? Also, my benches don't show significant performance changes reducing the size further, though I'm not really sure why)

I have tried running our internal bench through wasmparser's benchmarks, and I can say that the before/after difference with this is a ~12% performance gain in validation, thanks to reduced memory pressure most certainly.

WDYT? :)

@alexcrichton
Copy link
Member

Thanks! Have you been able to see the difference from the benchmarks in this repository? If not would you be up for adding a benchmark?

I think this could also be made smaller as well. The type_size and index values don't need to be larger than 32 bits so could be hardcoded as u32. Additionally type_hash could be shrunk to 4-bytes as well since the type index won't ever go so big as to need the full 32-bits. That would bring this down to 12 bytes which might improve things further?

@alexcrichton
Copy link
Member

I sent Ekleog#1 which further shrinks the type to 8 bytes, but locally doesn't show a huge impact in the benchmarks for this repository. Would you be up for testing that to see whether it further improves the benchmark you're looking at? (which if possible would still be great to add here)

@Ekleog-NEAR
Copy link
Contributor

Ekleog-NEAR commented Dec 7, 2022

Thank you for your answer as well as further size reduction PR!

However, it turns out that the further reduction is actually worsening performance, maybe because often-used values now need to be unpacked?

What I’m seeing on the repro I can’t share is:

  • main -> first commit of this PR: 10% improvement (the repro had taken a ~15% perf hit when upgrading wasmparser, so it’s likely ~back to previous performance, though I’m not measuring in the same way)
  • main -> tip of your further size reduction: 5% improvement
  • main -> second commit of this PR, that reduces to 16 bytes by reducing type_size and type_hash while keeping index: usize: still only 10% improvement

These results are roughly consistent with the bench I just pushed on top of this branch.

WDYT? I’m thinking we could either keep the size reduction to 24 bytes or do the further reduction to 16 bytes, whichever you think best, but I’d argue against reducing to 8 bytes given the numbers above. (So I’m also pushing both commits right now, but feel free to let me know if you think removing the second one would be better)

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good to me, I'll go ahead and merge this then, thanks for measuring and thanks for the PR!

@alexcrichton alexcrichton merged commit 24d4808 into bytecodealliance:main Dec 9, 2022
@Ekleog
Copy link
Contributor Author

Ekleog commented Dec 10, 2022

Awesome, thank you! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants