Initial interning implementation #1612
Conversation
I think this is a good start, yeah, but I'm personally mostly interested in how this hooks up into the wasm-bindgen crate and CLI, more so than the cache itself. It seems like we can easily go hog wild with configuration settings and implementations of the cache, but hammering out the details of how the CLI/ABIs work will be pretty important too.
Right, I'm just taking it one step at a time. The next step is to hook it into the ….
Ok, makes sense! I figure this may make the most sense on the wasm-bindgen crate itself, where a feature would basically just change the implementation of ….
Yeah, that's what I was thinking. Basically, have …, and then ….
Okay, so because of circular dependencies I can't make ….

Here are the profiling results. First, without interning: …

Now with interning: …

The performance went from …. Unfortunately, the increase wasn't as big as I was hoping for. I need to investigate why the ….

By the way, I had to implement ….
Okay, so the issue is that it was just trying to cache too many useless strings, so I took a different approach. Rather than automatically caching all strings, the user instead has to manually call an …. This will add the string to the cache, so when the string is sent to JS it will be picked up from the cache. By carefully choosing which strings go into the cache and which don't, I was able to get incredible performance gains.

With interning completely disabled: (…)

With interning enabled but not used: (…)

With all strings interned: (…)

With specific strings interned: (…)

So in the end, by interning only the strings which are commonly used, it gained …. And as you can see in the last chart, decoding is very fast (…). In fact, because everything is so fast, the top 7 items are all DOM built-ins, which means the app is going as fast as possible and the browser is now the bottleneck. In addition, the major GC doesn't even include decoding, which means the browser is doing 0 garbage collection of strings. Amazing.
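The opt-in scheme described above can be sketched in plain Rust. This is only an illustration: the names `intern` and `lookup`, and the `u32` slot ids standing in for already-decoded JS strings, are assumptions, not the PR's actual API. The caller registers hot strings once, and every outgoing string is checked against the cache so it can be reused instead of re-decoded.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Hypothetical cache: maps an interned Rust string to a slot id that
    // would identify the already-decoded JS string on the other side.
    static CACHE: RefCell<HashMap<String, u32>> = RefCell::new(HashMap::new());
}

/// Opt-in interning: the caller explicitly registers strings that are
/// sent to JS frequently. (Illustrative name and signature.)
fn intern(s: &str) {
    CACHE.with(|c| {
        let mut c = c.borrow_mut();
        let next_id = c.len() as u32;
        c.entry(s.to_owned()).or_insert(next_id);
    });
}

/// When sending a string to JS, check the cache first; a hit means the
/// JS side can reuse the cached value instead of decoding UTF-8 again.
fn lookup(s: &str) -> Option<u32> {
    CACHE.with(|c| c.borrow().get(s).copied())
}

fn main() {
    intern("click");
    intern("keydown");
    assert_eq!(lookup("click"), Some(0));
    assert_eq!(lookup("keydown"), Some(1));
    // Strings the user never interned simply miss and take the slow path.
    assert_eq!(lookup("only-used-once"), None);
    println!("ok");
}
```

The point of making it manual is visible in the sketch: rarely-used strings never enter the cache, so they cost nothing beyond a failed lookup.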
Sorry I've been pretty busy, but this all looks great to me! Some high-level thoughts I might have are: …
Actually, any crate in the dependency graph (including the root/application crate) can very easily turn it off; they just need to use ….
I don't think so. It's true that the typical way of doing things (using …). And there isn't an "enable-interning" feature, so it's not possible for there to be any conflicts. And the API is exactly the same between the interning and non-interning modes, so there are no compile errors. So could you explain what sorts of surprising results you think might happen?
Sure, but what bits should it use? Maybe it should just return ….
(Originally I intended to have full automatic string interning on by default, but after seeing the benchmarks it's clear that manual interning is much faster, so I'm okay with making the feature off-by-default. But I still want to know why you think it would be problematic.)
Ah yeah, so I was halfway done writing my original comment before I realized the feature disables interning rather than being an on-by-default feature which enables it. So to be clear, if the feature were …. The feature is, however, off-by-default and instead worded as disabling interning. Cargo features are meant to be an additive set, due to the unioning behavior used to compile crates only once. This means that if a crate relies on interning for reasonable performance, another crate may actually disable interning, and the original crate has no course of action. In other words, the "disable" form removes a feature rather than adding one, which goes against how shared dependencies work in Cargo today. That, plus the benchmark data, is basically why I'd prefer this to be an opt-in feature rather than on-by-default. For returning ….
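The additive-features point can be made concrete with a hypothetical manifest. The `enable-interning` feature name here is illustrative of the opt-in direction being discussed, not a settled name:

```toml
# Opt-in (additive): only crates that ask for interning pay for it, and
# Cargo's feature unioning can only ever *add* the capability.
[dependencies]
wasm-bindgen = { version = "0.2", features = ["enable-interning"] }
```

With a `disabled`-style feature the union rule inverts: if any crate in the graph enables `disabled`, interning is silently removed for every other crate in the build, and those crates have no way to opt back in.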
Right, I learned that the hard way with ….
That's absolutely true, but I don't think that's a problem; in fact, I think it's desirable. I don't see any reason why a crate would want to disable interning unless it is the root application crate. And in that case the root should definitely have the power to disable interning (if they find that it improves the performance of their app). So, although it's quite rare in general, I think in this very specific case a "negative feature" can be appropriate.
Yeah, I've come around to that as well.

Ah ok, well in any case, want to update this to opt-in and we can go ahead and merge?
@alexcrichton So, while working on this, I noticed something: the ….
Oh right, yeah, I was thinking about that when reading this but forgot to say we should update it. I think that this is always returning (effectively) ….
If the string exists in the cache, then yeah, it can return …. The problem is that if the string isn't in the cache then the …. And since ….
Ah, that's a good point! I'll be around only sporadically this week, but feel free to ping me when you're around and we can hope it aligns! I'll be around-ish 9am–5pm CST. Otherwise, though, it's true we don't have any precedent for returning a "maybe owned" ….
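The "maybe owned" shape being discussed is exactly what Rust's `Cow` models, and it can be sketched as follows. This is an analogy, not wasm-bindgen's actual glue: the `Decoder` type and its byte-keyed cache are made up for illustration. A cache hit hands back a borrowed string owned by the cache; a miss has no choice but to allocate.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Illustrative decoder: the cache owns previously decoded strings, so a
// hit can be returned by reference, while a miss must produce an owned
// String. `Cow` lets one return type cover both cases.
struct Decoder {
    cache: HashMap<Vec<u8>, String>,
}

impl Decoder {
    fn decode(&self, bytes: &[u8]) -> Cow<'_, str> {
        match self.cache.get(bytes) {
            Some(s) => Cow::Borrowed(s.as_str()),
            None => Cow::Owned(String::from_utf8_lossy(bytes).into_owned()),
        }
    }
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(b"click".to_vec(), "click".to_string());
    let decoder = Decoder { cache };

    // Hit: borrowed from the cache, no allocation.
    assert!(matches!(decoder.decode(b"click"), Cow::Borrowed(_)));
    // Miss: a fresh owned String must be produced.
    assert!(matches!(decoder.decode(b"other"), Cow::Owned(_)));
    println!("ok");
}
```

The same tension applies to a cached JS string: callers can't assume the returned value is uniquely theirs, which is why a plain owned return type doesn't fit.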
Now that I can build …. So, I significantly changed the implementation (before your review). Now the implementation is almost identical to the current string impl, except if the …. What that means is that there is now basically no performance difference regardless of whether interning is enabled or not: …
A ~183% increase in performance sounds pretty great to me. And it doesn't regress the performance when not interning (which was a big problem with the old implementation). And the memory leak has been fixed.
I rebased, and I also created a benchmark with the 1,000 most common English words. These are the results: …
The cache is full and it has a lot of similar words, so it's a reasonably accurate stress test. In practice it will be a lot faster. The interesting things to note: …
Given the results, I switched it to use HashMap. This gives a 7.8x speed boost for hits, no speed penalty for misses, unlimited scalability, and a smaller file size than the LRU cache.
I just added in docs for the ….
Also, the CI failures seem really weird; I have no clue what's going on with those.
Can you be sure to add tests to CI for this feature as well? I think that enabling this feature breaks `Option<&str>`, because the tests for "is none" only check the pointer value, and if it's zero it's considered undefined. For the cached case, though, we'd need to check both the pointer and the length to see if they're both zero.
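The both-words check suggested here can be sketched in isolation. This is not the actual generated glue: the `(ptr, len)` pair of `u32`s and the zero-pointer sentinel for `None` are assumptions made for illustration.

```rust
/// Hypothetical ABI decoding for an outgoing optional slice.
/// `None` is signalled only when both words are zero; checking `ptr == 0`
/// alone would misread a cached-string encoding of (0, nonzero length).
fn decode_optional_slice(ptr: u32, len: u32) -> Option<(u32, u32)> {
    if ptr == 0 && len == 0 {
        None
    } else {
        Some((ptr, len))
    }
}

fn main() {
    // Both zero: genuinely absent.
    assert_eq!(decode_optional_slice(0, 0), None);
    // Zero pointer but nonzero length: still a value under this scheme.
    assert_eq!(decode_optional_slice(0, 5), Some((0, 5)));
    // Ordinary slice.
    assert_eq!(decode_optional_slice(1024, 5), Some((1024, 5)));
    println!("ok");
}
```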
I don't think there will be any issue: it's guaranteed that non-undefined …. I agree that we should have tests for it. EDIT: Oh, wait, you're talking about the ….
Wait, never mind, it doesn't support receiving cached strings in the first place; they use completely different systems, so there shouldn't be any problem with ….
Oh, I'm mostly worried about this block which handles outgoing optional slices (including strings). That's only checking ….
That code path shouldn't ever be called. When caching is enabled, it will always go through ….
Okay, I think this is ready for merging. The CI failures seem spurious, and I added CI tests for ….
Works for me, thanks again for pushing on this!
This is just a rough draft so I can make sure I'm going in the right direction.
The way I designed it, you can change (at runtime) the max str len and whether the interning is enabled or disabled.
In addition, there's a `disabled` Cargo feature to completely wipe out the implementation at compile time, so there's 0 overhead. The reason I went with a `disabled` feature (and not an `enabled` feature) is that I expect the interning to be enabled by default. The cache size is hard-coded at 1,024 elements (I just guessed at what seemed like a reasonable number), but we can make that configurable later.
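The runtime knobs described above (max string length, enabled/disabled toggle) could look roughly like this. The function names and the default of 128 bytes are illustrative assumptions, not the draft's actual API:

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical runtime switches: maximum string length eligible for
    // interning, and a global on/off toggle.
    static MAX_STR_LEN: Cell<usize> = Cell::new(128);
    static ENABLED: Cell<bool> = Cell::new(true);
}

fn set_max_str_len(len: usize) {
    MAX_STR_LEN.with(|m| m.set(len));
}

fn set_interning_enabled(enabled: bool) {
    ENABLED.with(|e| e.set(enabled));
}

/// Gate used on every outgoing string: skip the cache entirely when
/// interning is off or the string is longer than the configured limit.
fn should_intern(s: &str) -> bool {
    ENABLED.with(|e| e.get()) && s.len() <= MAX_STR_LEN.with(|m| m.get())
}

fn main() {
    assert!(should_intern("click"));
    set_max_str_len(3);
    assert!(!should_intern("click")); // now too long to be cached
    set_interning_enabled(false);
    assert!(!should_intern("ok")); // disabled globally
    println!("ok");
}
```

A length cutoff like this keeps large, likely-unique strings from evicting the short, hot ones the cache exists for.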
I took a look at the existing LRU crates, and I decided on uluru because it seemed to be the fastest and lightest weight (based on looking at the source code).
@alexcrichton does this seem reasonable?