Initial interning implementation #1612

Merged: 27 commits into rustwasm:master on Jul 22, 2019

Conversation

Pauan
Contributor

@Pauan Pauan commented Jun 22, 2019

This is just a rough draft so I can make sure I'm going in the right direction.

The way I designed it, you can change (at runtime) the max string length and whether interning is enabled or disabled.

In addition, there's a Cargo feature for disabling it, which completely wipes out the implementation at compile time, so there's zero overhead.

The reason I went with a "disable" feature (and not an "enable" feature) is that I expect interning to be enabled by default.

The cache size is hard-coded at 1,024 elements (I just guessed at what seemed like a reasonable number), but we can make that configurable later.

I took a look at the existing LRU crates, and I decided on uluru because it seemed to be the fastest and lightest weight (based on looking at the source code).
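Roughly, the shape I have in mind is something like this (just a simplified sketch: the names, the Vec-based LRU, and the default max length are illustrative, not the real uluru-backed module):

```rust
// Simplified sketch of the cache module (illustrative names only).
// All of this would be compiled out when the disable feature is enabled.
use std::cell::{Cell, RefCell};

const CACHE_CAPACITY: usize = 1024;

thread_local! {
    // Runtime-configurable settings.
    static ENABLED: Cell<bool> = Cell::new(true);
    static MAX_STR_LEN: Cell<usize> = Cell::new(128);
    // Most-recently-used entry is kept at index 0.
    static CACHE: RefCell<Vec<String>> = RefCell::new(Vec::with_capacity(CACHE_CAPACITY));
}

pub fn set_enabled(enabled: bool) {
    ENABLED.with(|e| e.set(enabled));
}

pub fn set_max_str_len(len: usize) {
    MAX_STR_LEN.with(|l| l.set(len));
}

/// Caches `s` if interning is enabled and the string is short enough.
pub fn cache_str(s: &str) {
    if !ENABLED.with(|e| e.get()) || s.len() > MAX_STR_LEN.with(|l| l.get()) {
        return;
    }
    CACHE.with(|c| {
        let mut cache = c.borrow_mut();
        if let Some(pos) = cache.iter().position(|entry| entry == s) {
            // Already cached: bump it to the front (most recently used).
            let entry = cache.remove(pos);
            cache.insert(0, entry);
        } else {
            if cache.len() >= CACHE_CAPACITY {
                // Evict the least-recently-used entry.
                cache.pop();
            }
            cache.insert(0, s.to_string());
        }
    });
}
```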

@alexcrichton does this seem reasonable?

@alexcrichton
Contributor

I think this is a good start, yeah, but I'm personally mostly interested in how this hooks up into the wasm-bindgen crate and the CLI, more so than the cache itself. It seems like we can easily go hog wild with configuration settings and cache implementations, but hammering out the details of how the CLI/ABI works will be pretty important too.

@Pauan
Contributor Author

Pauan commented Jun 22, 2019

Right, I'm just taking it one step at a time. The next step is to hook it into the #[wasm_bindgen] macro.

@alexcrichton
Contributor

Ok, makes sense!

I figure this may make the most sense on the wasm-bindgen crate itself, where a feature would basically just change the implementation of IntoWasmAbi for strings?

@Pauan
Contributor Author

Pauan commented Jun 22, 2019

Yeah, that's what I was thinking. Basically, have IntoWasmAbi call the intern::str function.

And then wasm-bindgen could have a disable-interning feature which disables the interning for the wasm-bindgen-cache crate.
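Something like this, very roughly (a hand-wavy sketch: `lookup_cached` and `str_to_js` are made-up stand-ins for the real IntoWasmAbi plumbing, which I haven't written yet):

```rust
// Hand-wavy sketch of the hook-up: the conversion consults the intern cache
// first and only falls back to the normal copy-and-decode path on a miss.
use wasm_bindgen::JsValue;

// Stand-in for the cache lookup (not the real wasm-bindgen internals).
fn lookup_cached(s: &str) -> Option<JsValue> {
    let _ = s; // ... look `s` up in the intern cache ...
    None
}

// Stand-in for what IntoWasmAbi for &str would effectively do.
fn str_to_js(s: &str) -> JsValue {
    // With the disable-interning feature this branch is compiled out,
    // so the non-interning path has zero overhead.
    if cfg!(not(feature = "disable-interning")) {
        if let Some(cached) = lookup_cached(s) {
            return cached;
        }
    }
    // Normal path: copy the bytes across the boundary and decode in JS.
    JsValue::from_str(s)
}
```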

@Pauan
Contributor Author

Pauan commented Jun 22, 2019

Okay, so because of circular dependencies I can't make wasm-bindgen-cache into a separate crate, so I had to move it into wasm-bindgen.

Here are the profiling results. First, without interning:

[screenshot: DevTools 007]

Now with interning:

[screenshot: DevTools 006]

The performance went from 1256.8ms to 1108.9ms, an improvement of 147.9ms! As you can see from the charts, decoding went from 387.2ms to 67.0ms.

Unfortunately, the increase wasn't as big as I was hoping for. I need to investigate why the cache_str function is taking up a whopping 154.3ms.

By the way, I had to implement OptionIntoWasmAbi for JsValue, and I'm not sure if the implementation is correct.

@Pauan
Contributor Author

Pauan commented Jun 22, 2019

Okay, so the issue is that it was just trying to cache too many useless strings, so I took a different approach.

Rather than automatically caching all strings, the user has to manually call an intern("foo") function, which has the signature fn intern(s: &str) -> &str.

This adds the string to the cache, so when the string is later sent to JS, it is picked up from the cache.

By carefully choosing which strings go into the cache and which don't, I was able to get incredible performance gains.
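For example, usage would look roughly like this (`append_class` is a hypothetical stand-in for any call that ends up sending the string to JS, and I'm assuming intern is exported from the wasm-bindgen crate root):

```rust
use wasm_bindgen::intern;

fn render(dynamic_class: &str) {
    // "button" is sent to JS on every call, so interning it means the JS
    // string is created once and then reused from the cache.
    append_class(intern("button"));

    // A dynamic, rarely-repeated string isn't worth interning.
    append_class(dynamic_class);
}

// Hypothetical helper; imagine it calls a #[wasm_bindgen]-imported JS function.
fn append_class(class: &str) {
    let _ = class;
}
```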

With interning completely disabled: (1363.8ms)

[screenshot: DevTools 008]

With interning enabled but not used: (1485.0ms)

[screenshot: DevTools 009]

With all strings interned: (1403.1ms)

[screenshot: DevTools 010]

With specific strings interned: (915.0ms)

[screenshot: DevTools 011]

So in the end, by interning only the strings which are commonly used, it gained 448.8ms in performance!

And as you can see in the last chart, decoding is very fast (69.8ms), and the cache itself is also very fast (37.5ms).

In fact, because everything is so fast, the top 7 items are all DOM built-ins, which means the app is going as fast as possible, and the browser is now the bottleneck.

In addition, the Major GC doesn't even include decoding, which means the browser is doing 0 garbage collection of strings. Amazing.

@alexcrichton
Contributor

Sorry I've been pretty busy but this all looks great to me! Some high-level thoughts I might have are:

  • I think we'll probably want to have this as an off-by-default feature of wasm-bindgen. Anything on-by-default is very difficult to turn off since wasm-bindgen is so heavily depended on, and this also has some code-size impact with the interning infrastructure. Having it as a switch is perfect but I think the better default will be off-by-default. I also think that having the switch being disable vs enable makes it sort of go against how Cargo features work and may lead to surprising results.

  • I'd ideally prefer to avoid impl OptionIntoWasmAbi for JsValue for now if we can, would it be possible to just do raw bits under the hood to avoid adding the impl just yet?

  • Could the main test suite be run on CI with interning explicitly both enabled and disabled?

@Pauan
Contributor Author

Pauan commented Jun 26, 2019

Anything on-by-default is very difficult to turn off since wasm-bindgen is so heavily depended on

Actually, any crate in the dependency graph (including the root/application crate) can very easily turn it off; they just need to use features = ["disable-interning"].

I also think that having the switch being disable vs enable makes it sort of go against how Cargo features work and may lead to surprising results.

I don't think so. It's true that the typical way of doing things (using default-features = false) is broken and almost unusable, but I'm not doing that.

And there isn't an "enable-interning" feature, so it's not possible for there to be any conflicts. And the API is exactly the same between the interning and non-interning modes, so there are no compile errors.

So could you explain what sorts of surprising results you think might happen?

would it be possible to just do raw bits under the hood to avoid adding the impl just yet?

Sure, but what bits should it use? Maybe it should just return JsValue::UNDEFINED?
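i.e. something like this (just sketching the idea of reusing UNDEFINED's ABI value for the None case, not the actual impl):

```rust
use wasm_bindgen::convert::IntoWasmAbi;
use wasm_bindgen::JsValue;

// Sketch only: the "none" value for an optional JsValue could simply be
// whatever ABI value JsValue::UNDEFINED converts to, since UNDEFINED is a
// reserved, statically-known value.
fn none_abi() -> <JsValue as IntoWasmAbi>::Abi {
    JsValue::UNDEFINED.into_abi()
}
```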

@Pauan
Contributor Author

Pauan commented Jun 26, 2019

(Originally I intended to have full automatic string interning on by default, but after seeing the benchmarks it's clear that manual interning is much faster, so I'm okay with making the feature off-by-default, but I still want to know why you think it would be problematic)

@alexcrichton
Contributor

Ah yeah so I was halfway done writing my original comment before I realized the feature was to disable, not an on-by-default feature which enabled. So to be clear if the feature were enable-intern and it was on-by-default, I don't think that's tenable since it could effectively never be turned off.

The feature is, however, off-by-default and instead worded as disabling interning. Cargo features, though, are meant to be an additive set, because of the unioning behavior that lets Cargo compile each shared crate only once. This means that if a crate relies on interning for reasonable performance, another crate may actually disable interning and the original crate has no course of action. In other words, the "disable" form is removing a feature rather than adding it, which goes against how shared dependencies work in Cargo today.

That, plus the benchmark data, is basically why I think I'd prefer if this was an opt-in feature rather than on-by-default.

For returning JsValue::UNDEFINED, I think that'd work yeah!

@Pauan
Contributor Author

Pauan commented Jun 27, 2019

So to be clear if the feature were enable-intern and it was on-by-default, I don't think that's tenable since it could effectively never be turned off.

Right, I learned that the hard way with futures-preview, where you cannot turn off their features at all, not even with default-features = false.

This means that if a crate relies on interning for reasonable performance, another crate may actually disable interning and the original crate has no course of action.

That's absolutely true, but I don't think that's a problem, in fact I think that's desirable.

I don't see any reason why a crate would want to disable interning, unless they are the root application crate.

And in that case the root should definitely have the power to disable interning (if they find that it improves the performance of their app).

So, although it's quite rare in general, I think in this very specific case a "negative feature" can be appropriate.

That, plus the benchmark data, is basically why I think I'd prefer if this was an opt-in feature rather than on-by-default.

Yeah, I've come around to that as well.

@alexcrichton
Contributor

Ah ok well in any case, want to update this to opt-in and we can go ahead and merge?

@Pauan
Contributor Author

Pauan commented Jun 29, 2019

@alexcrichton So, while working on this, I noticed something: the IntoWasmAbi impl for JsValue uses mem::forget, so doesn't that mean that the object will stay in the JS heap forever, and never be cleaned up?

@alexcrichton
Contributor

Oh right yeah I was thinking about that when reading this but forgot to say we should update it. I think that this is always returning (effectively) &JsValue, right? If str and String report ANYREF that may mean that &str is ok (since it'll show up as REF ANYREF) but for String I think we'll need to insert the REF there ourselves.

@Pauan
Contributor Author

Pauan commented Jun 30, 2019

If the string exists in the cache, then yeah it can return &JsValue (which is fine).

The problem is that if the string isn't in the cache then the into_abi method uses JsValue::from to convert the &str into a JsValue, and then calls into_abi on that.

And since JsValue leaks (at least, if I'm understanding it correctly), that's quite bad. Maybe it's best we discuss this on Discord, since it seems to have some pretty deep implications?

@alexcrichton
Contributor

Ah that's a good point! I'll be around sort of only spottily this week but feel free to ping me when you're around and we can hope it aligns! I'll be around-ish 9am-5pm CST.

Otherwise, though, it's true we don't have any precedent for returning a "maybe owned" JsValue from any APIs. That being said, it doesn't seem entirely unreasonable to invent one for strings. In the anyref world it will have no cost because the APIs will still take anyref at the boundary, and for now it just means a bit more JS glue. I would imagine that the high bit (or something like that) would indicate whether it was owned or not and whether free needs to be called.
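Concretely, I'm imagining something like this (purely a sketch of the tagging idea, not the actual ABI):

```rust
// Sketch: use the high bit of the 32-bit heap index to tell the JS glue
// whether the value is owned (and therefore needs to be freed) or borrowed.
const OWNED_BIT: u32 = 1 << 31;

fn encode(idx: u32, owned: bool) -> u32 {
    debug_assert!(idx & OWNED_BIT == 0, "index must fit in 31 bits");
    if owned { idx | OWNED_BIT } else { idx }
}

fn decode(abi: u32) -> (u32, bool) {
    // Returns (index, needs_free).
    (abi & !OWNED_BIT, abi & OWNED_BIT != 0)
}
```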

(Review threads on crates/cli-support/src/js/outgoing.rs, Cargo.toml, and src/cache/intern.rs were marked as resolved.)
@Pauan Pauan marked this pull request as ready for review July 17, 2019 21:28
@Pauan
Contributor Author

Pauan commented Jul 17, 2019

Now that I can build wasm-bindgen-cli, I was able to actually test this.

So, I significantly changed the implementation (before your review). Now the implementation is almost identical to the current string impl, except if the ptr is 0, then it assumes that the len is a pointer to the JsValue.

What that means is that there is now basically no performance difference regardless of whether interning is enabled or not:

Without interning:
   normal     0.016440500294726636 ms

With interning:
   interned   0.006101373767490259 ms
   normal     0.017310679540981640 ms  (+183.7177363763148 %)

~183% increase in performance sounds pretty great to me. And it doesn't regress the performance when not interning (which was a big problem with the old implementation). And the memory leak has been fixed.
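To illustrate the encoding (the type and function names below are made up for the sketch; the real glue is generated by the CLI):

```rust
// Sketch of the new encoding: a real string pointer is never 0, so
// (ptr = 0, len = heap index) can be used to mean "cached JsValue".
struct AbiPair {
    ptr: u32,
    len: u32,
}

enum OutgoingStr {
    Normal { ptr: u32, len: u32 },
    Cached { js_index: u32 },
}

fn encode(s: OutgoingStr) -> AbiPair {
    match s {
        OutgoingStr::Normal { ptr, len } => AbiPair { ptr, len },
        OutgoingStr::Cached { js_index } => AbiPair { ptr: 0, len: js_index },
    }
}

fn decode(abi: AbiPair) -> OutgoingStr {
    if abi.ptr == 0 {
        OutgoingStr::Cached { js_index: abi.len }
    } else {
        OutgoingStr::Normal { ptr: abi.ptr, len: abi.len }
    }
}
```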

@Pauan
Contributor Author

Pauan commented Jul 18, 2019

I rebased, and I also created a benchmark with the 1,000 most common English words. These are the results:

No interning (113,556 bytes):
   normal str            0.015406891304873488 ms

LRU Cache (139,951 bytes):
   interned str (hit)    0.001420249078550068 ms
   normal str            0.015683253470461032 ms  + 1004.2607742067371 %
   interned str (miss)   0.065211217786988060 ms  + 4491.5338916158410 %

BTreeMap (119,950 bytes):
   interned str (hit)    0.0021928430482934585 ms
   normal str            0.0148238589445778490 ms  + 576.0109418735763 %
   interned str (miss)   0.0159713026124058200 ms  + 628.3376995373748 %

HashMap (121,198 bytes):
   interned str (hit)    0.0017783651625431442 ms
   normal str            0.0157200559689065660 ms  + 783.9610840344038 %
   interned str (miss)   0.0159029148265826120 ms  + 794.2434974288807 %

The cache is full and it has a lot of similar words, so it's a reasonably accurate stress test. In practice it will be a lot faster.

The interesting things to note:

  • The LRU cache is actually the biggest in terms of file size, which is surprising. Maybe it's because it statically allocates the entire cache upfront?

  • The LRU cache is indeed slower during misses (about 4.5x slower). Not too bad for the worst-case situation, but still not good.

    I tested it on lighter workloads and it didn't have any cost during misses, so this is specific to heavy workloads.

  • Meanwhile, the BTreeMap and HashMap (from the Rust stdlib) perform exceptionally well: misses are the same performance as not interning.

  • On the other hand, the LRU cache is fastest for hits (10x faster). The BTreeMap and HashMap are very slightly slower on hits, but are much faster on misses.

Given the results, I switched it to use HashMap. This gives a 7.8x speed boost for hits, no speed penalty for misses, infinite scalability, and smaller file size than the LRU cache.
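Conceptually the HashMap version is very simple; roughly this (a sketch, not the exact code in src/cache/intern.rs):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

use wasm_bindgen::JsValue;

thread_local! {
    // str -> already-created JS string.
    static CACHE: RefCell<HashMap<String, JsValue>> = RefCell::new(HashMap::new());
}

/// Adds `s` to the cache so later sends of the same string are hits.
pub fn intern_str(s: &str) {
    CACHE.with(|cache| {
        cache
            .borrow_mut()
            .entry(s.to_string())
            .or_insert_with(|| JsValue::from_str(s));
    });
}

/// A hit returns the cached JsValue; a miss falls back to the normal path.
pub fn get_cached(s: &str) -> Option<JsValue> {
    CACHE.with(|cache| cache.borrow().get(s).cloned())
}
```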

@Pauan
Contributor Author

Pauan commented Jul 18, 2019

I just added docs for the intern function. The only thing left is adding CI testing for the enable-interning feature, but I'm not sure how to do that.

@Pauan
Contributor Author

Pauan commented Jul 18, 2019

Also, the CI failures seem really weird; I have no clue what's going on with those.

Contributor

@alexcrichton alexcrichton left a comment

Can this be sure to add tests to CI for this feature as well? I think that enabling this feature breaks Option<&str>, because the test for "is none" only checks the pointer value, and if it's zero it's considered undefined. For the cached case, though, we'd need to check both the pointer and the length to see if they're both zero.

(Review threads on crates/cli-support/src/js/outgoing.rs, src/cache/intern.rs, and src/describe.rs were marked as resolved.)
@Pauan
Contributor Author

Pauan commented Jul 19, 2019

For the cached case though we'd need to check both the pointer and the length to see if they're both zero.

I don't think there will be any issue: it's guaranteed that a non-undefined JsValue will never have 0 as its idx, so the length will only be 0 if it is in fact None.

I agree that we should have tests for it.

EDIT: Oh, wait, you're talking about the OptionFromWasmAbi. Yeah, you're right, it needs to be updated, very good catch!

@Pauan
Contributor Author

Pauan commented Jul 19, 2019

Wait, never mind: it doesn't support receiving cached strings in the first place. They use completely different systems, so there shouldn't be any problem with OptionFromWasmAbi.

@alexcrichton
Contributor

Oh, I'm mostly worried about this block, which handles outgoing optional slices (including strings). That's only checking {ptr}, but to handle cached strings it would also need to handle {len} being zero.

@Pauan
Contributor Author

Pauan commented Jul 19, 2019

Oh I'm mostly worried about this block which handles outgoing optional slices (including strings).

That code path shouldn't ever be called. When caching is enabled, it will always go through NonstandardOutgoing::CachedString for all string types (&str, String, Option<&str>, and Option<String>).

@Pauan
Contributor Author

Pauan commented Jul 19, 2019

Okay, I think this is ready for merging. The CI failures seem spurious, and I added CI tests for enable-interning.

@alexcrichton
Contributor

Works for me, thanks again for pushing on this!

@alexcrichton alexcrichton merged commit 68a1519 into rustwasm:master Jul 22, 2019
@Pauan Pauan deleted the cache branch July 22, 2019 23:17