Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use a custom hash set for interning #50959

Closed
wants to merge 3 commits into from
Closed

Conversation

Zoxc
Copy link
Contributor

@Zoxc Zoxc commented May 22, 2018

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 22, 2018
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

[00:04:40] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:40] tidy error: /checkout/src/librustc_data_structures/interner.rs: incorrect license
[00:04:41] some tidy checks failed
[00:04:41] 
[00:04:41] 
[00:04:41] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:41] 
[00:04:41] 
[00:04:41] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:41] Build completed unsuccessfully in 0:01:48
[00:04:41] Build completed unsuccessfully in 0:01:48
[00:04:41] Makefile:79: recipe for target 'tidy' failed
[00:04:41] make: *** [tidy] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0b67176e
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors
Copy link
Contributor

bors commented May 22, 2018

☔ The latest upstream changes (presumably #50520) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member

@Zoxc, how much faster do you expect this to be?

@pietroalbini pietroalbini added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 28, 2018
@shepmaster
Copy link
Member

Ping from triage, @Zoxc ! You have a review question and merge conflicts.

@Zoxc
Copy link
Contributor Author

Zoxc commented Jun 4, 2018

@bors try

@bors
Copy link
Contributor

bors commented Jun 4, 2018

⌛ Trying commit c124143 with merge 8a5761a...

bors added a commit that referenced this pull request Jun 4, 2018
[WIP] Use a custom hash set for interning

r? @michaelwoerister
@bors
Copy link
Contributor

bors commented Jun 4, 2018

☀️ Test successful - status-travis
State: approved= try=True

@Zoxc
Copy link
Contributor Author

Zoxc commented Jun 4, 2018

@Mark-Simulacrum Can I get a perf run here?

@Mark-Simulacrum
Copy link
Member

Perf run queued.

@michaelwoerister
Copy link
Member

Looks very promising!

r? @eddyb

@eddyb
Copy link
Member

eddyb commented Jun 9, 2018

I am not qualified to review this. r? @gankro @bluss @nikomatsakis

@rust-highfive rust-highfive assigned Gankra and unassigned eddyb Jun 9, 2018
@Zoxc
Copy link
Contributor Author

Zoxc commented Jun 10, 2018

std's HashSet doesn't have APIs which allow interning using a single hash lookup, so the hash set introduced here has an unfair advantage.

@eddyb
Copy link
Member

eddyb commented Jun 11, 2018

Didn't @gankro propose an API where you could do this, although misusing it would put the hashmap in an unusual state (not unsafe, just potentially misbehaving)?

@pietroalbini pietroalbini added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 18, 2018
@pietroalbini pietroalbini removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jun 18, 2018
@pietroalbini
Copy link
Member

Ping from triage @gankro! This PR needs your review.

@pietroalbini pietroalbini added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 25, 2018
@pietroalbini
Copy link
Member

Ping from triage! This PR needs a review, can @gankro or someone else from @rust-lang/compiler review this?

@stokhos stokhos added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). I-nominated and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 29, 2018
@stokhos
Copy link

stokhos commented Jun 29, 2018

Ping from triage @gankro, @rust-lang/compiler, will someone have time to review this PR?

@nikomatsakis
Copy link
Contributor

Discussed in the @rust-lang/compiler meeting. Everybody I think generally agreed that all things being equal using libstd is better, but it's also ok to land this in the meantime if it's a solid win (as it appears to be).

@nikomatsakis nikomatsakis added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed I-nominated S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jul 5, 2018
@nikomatsakis
Copy link
Contributor

r? @michaelwoerister

Copy link
Member

@michaelwoerister michaelwoerister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a first pass over this and left some notes. I'll do another pass tomorrow. There are some things that I haven't looked at at all yet.

}
}

pub fn make_hash<T: ?Sized, S>(hash_state: &S, t: &T) -> u64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could directly return a u32

const ENTRIES_PER_GROUP: usize = 5;

#[repr(align(64), C)]
pub struct Group {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "Bucket" is the more canonical term.


#[inline(always)]
fn set(&mut self, pos: usize, hash: u32, value: u64) {
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a debug_assert!(pos < ENTRIES_PER_GROUP) here, please?

#[inline(always)]
fn search_with<K, F: FnMut(&K) -> bool>(&self, eq: &mut F, hash: u32) -> Option<(usize, bool)> {
for i in 0..ENTRIES_PER_GROUP {
let h = unsafe { *self.hashes.get_unchecked(i) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the compiler is smart enough to omit the bounds check by itself in cases like this.

}

Table {
group_mask: group_count.wrapping_sub(1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't wrapping around here always signify a bug?

//println!("capacity1 {} capacity {}", capacity1, capacity);
assert!(capacity < capacity2);

for i in 0..group_count {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use Global.alloc_zeroed() above, you don't have to clear things manually here. It might also be faster.


#[inline(always)]
fn iter<F: FnMut(u32, u64)>(&self, f: &mut F) {
for i in 0..ENTRIES_PER_GROUP {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do for i in 0 .. self.size here you can avoid always setting the most significant bit it the hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM won't unroll the loop in that case

Copy link
Member

@michaelwoerister michaelwoerister Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but you'll have fewer iterations and you can omit the h != 0 check. It might be a win overall.

pub struct Group {
hashes: [u32; ENTRIES_PER_GROUP],
size: u32,
values: [u64; ENTRIES_PER_GROUP],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried this on a 32-bit system? Is there a reason not to make this generic? As in:

struct Group<T: Sized> {
    hashes: [u32; ENTRIES_PER_GROUP],
    size: u32,
    values: [*const T; ENTRIES_PER_GROUP],
}

group_mask: usize,
size: usize,
capacity: usize,
groups: Unique<Group>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use a Vec or Box<[]> for this?

//let capacity = (capacity1 * 10 + 10 - 1) / 11;
let capacity = (capacity1 * 10 + 10 - 1) / 13;
//println!("capacity1 {} capacity {}", capacity1, capacity);
assert!(capacity < capacity2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this capacity calculation. It seems to be adapted from what the std hashmap does but it could need better variable names and comments explaining the reasoning behind it.


const ENTRIES_PER_GROUP: usize = 5;

#[repr(align(64), C)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the #[repr(..)]?

@michaelwoerister
Copy link
Member

OK, I think I have listed all my concerns. Thanks for the PR, @Zoxc! I like the general approach and that hash map implementation is simple. I think it could do with less unsafe code though.

@TimNN TimNN added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 10, 2018
@TimNN
Copy link
Contributor

TimNN commented Jul 17, 2018

Ping from triage, @Zoxc: Some changes have been requested to this PR.

@pietroalbini
Copy link
Member

Thank you for this PR @Zoxc! Unfortunately we haven't heard from you on this in a while, so I'm closing the PR to keep things tidy. Don't worry though, if you'll have time again in the future please reopen this PR, we'll be happy to review it again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.