Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient trie lookup for boolean Unicode properties #33098

Merged
merged 3 commits into from
May 23, 2016

Conversation

raphlinus
Copy link
Contributor

Replace binary search of ranges with trie lookup using leaves of
64-bit bitmap chunks. Benchmarks suggest this is approximately 10x
faster than the bsearch approach.

Replace binary search of ranges with trie lookup using leaves of
64-bit bitmap chunks. Benchmarks suggest this is approximately 10x
faster than the bsearch approach.
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nrc (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@raphlinus
Copy link
Contributor Author

@SimonSapin who I discussed this with on irc.

Somehow got in my head that >> 8 was the right shift for a chunk of 64.
Oops, sorry.
@alexcrichton
Copy link
Member

Neat! Out of curiosity, do you know what the impact this has on the compiled table sizes? (if any)

@raphlinus
Copy link
Contributor Author

@alexcrichton A very slight uptick, but almost a wash. By my calculations, the old table size is 29288 bytes and the new is 30451. If it were a concern, it'd be possible to get that back by special casing the sets with a small max value, but honestly I'd say code simplicity is more important.

@nrc
Copy link
Member

nrc commented Apr 19, 2016

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned nrc Apr 19, 2016
@samlh
Copy link

samlh commented Apr 19, 2016

What is the perf impact of switching to (c >> 6) - 0x20 and (c >> 12) - 0x10? It would reduce the space used a bit, but I don't know if it would lose too much of the perf gain.

@raphlinus
Copy link
Contributor Author

@samlh I just measured it, the performance difference is imperceptible. This makes sense, the subtraction probably happens in parallel with getting the address of r2 into a register. Probably save 500 bytes, I think there are 11 tables. Happy to make the change if people feel like it's worthwhile.

@@ -307,12 +307,114 @@ def emit_table(f, name, t_data, t_type = "&'static [(char, char)]", is_pub=True,
format_table_content(f, data, 8)
f.write("\n ];\n\n")

def emit_trie_lookup_range_table(f):
f.write("""
pub struct BoolTrie {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to have an high-level overview of what all these fields represent/how it actually represents the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, with some comments I think this is good to go though

Adds a comment which explains the trie structure, and also does a
little arithmetic on lookup (no measurable impact, looks like modern
CPUs do this arithmetic in parallel with the memory lookup to find the
node) to save a bit of space. As a result, the memory impact of the
compiled tables is within a couple hundred bytes of the old
bsearch-range structure.
@raphlinus
Copy link
Contributor Author

Some questions / comments.

The BoolTrie and tables are pub, following the pattern of the existing code, but this feels like not-great encapsulation. I'd prefer the impl details to be private, and just the query functions to be pub, but not sure if that would break anything. Thoughts?

Should I aggressively dead-code-eliminate the emit of the bsearch? It was useful to have both around during development, now I'm thinking only one is needed.

What's the feeling on tests? There are some in doc-tests, but coverage is low. For my own work, I've been doing some randomly-generated tests, but not sure how well that fits in here.

Should I do similar magic for case mapping? It's trickier because it's not just a bool, but possible. I'm thinking a different PR though.

@alexcrichton
Copy link
Member

The BoolTrie and tables are pub, following the pattern of the existing code, but this feels like not-great encapsulation.

Ah no need to worry much about this. If you can get it private that'd be great, but don't sweat it too much. This crate is unstable so we're fine to change the API at any time basically.

I thought the tables module was private regardless, but maybe it's not?

What's the feeling on tests? There are some in doc-tests, but coverage is low. For my own work, I've been doing some randomly-generated tests, but not sure how well that fits in here.

I wonder how hard it might be to just exhaustively test everything? In theory we could generate tests as part of the unicode.py script as well.

Should I do similar magic for case mapping? It's trickier because it's not just a bool, but possible. I'm thinking a different PR though.

Separate PR sounds good to me, but if it's also 10x faster we'd certainly love to have the change :)

@samlh
Copy link

samlh commented Apr 23, 2016

Another silly idea (not sure if worth doing):

We could combine all the u64 tables into one, where the first 32 values are as before, but can be reused as values for the other sets of leaves. This risks running out of indices for leaf nodes, but I think the current maximum total number of values is 249. The bigger question is if this would save enough bytes to be at all worth the effort :P.

Overall, I like the code as-is, though I agree that marking everything possible as private and removing old code would be nice.

@alexcrichton
Copy link
Member

@raphlinus did you wanna update this with some more tests? Or do you think this is good to go?

@raphlinus
Copy link
Contributor Author

@alexcrichton I'd lean toward "good to go." There isn't an obvious direction the tests should go that doesn't lead to bloat of the code-gen scripts.

@alexcrichton
Copy link
Member

@bors: r+ cfaf66c

Ok, sounds good to me, thanks again @raphlinus!

@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label May 21, 2016
@bors
Copy link
Contributor

bors commented May 21, 2016

⌛ Testing commit cfaf66c with merge c80d23d...

@bors
Copy link
Contributor

bors commented May 22, 2016

💔 Test failed - auto-mac-64-opt-rustbuild

@alexcrichton
Copy link
Member

@bors: retry

Umm... what? If that happens again I'll file a spurious issue

@bors
Copy link
Contributor

bors commented May 22, 2016

⌛ Testing commit cfaf66c with merge 54bfed7...

bors added a commit that referenced this pull request May 22, 2016
Efficient trie lookup for boolean Unicode properties

Replace binary search of ranges with trie lookup using leaves of
64-bit bitmap chunks. Benchmarks suggest this is approximately 10x
faster than the bsearch approach.
@bors
Copy link
Contributor

bors commented May 23, 2016

💥 Test timed out

@alexcrichton
Copy link
Member

@bors: retry

@bors
Copy link
Contributor

bors commented May 23, 2016

⌛ Testing commit cfaf66c with merge 4c6b6c2...

bors added a commit that referenced this pull request May 23, 2016
Efficient trie lookup for boolean Unicode properties

Replace binary search of ranges with trie lookup using leaves of
64-bit bitmap chunks. Benchmarks suggest this is approximately 10x
faster than the bsearch approach.
@bors bors merged commit cfaf66c into rust-lang:master May 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants