-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SSE2 SIMD in IndexTable::find_entry #176
Conversation
5427684
to
54d8e85
Compare
src/index.rs
Outdated
let current = _mm_unpacklo_epi32(first_two, last_two); | ||
let cmp = _mm_movemask_epi8(_mm_cmpeq_epi32(current, target)); | ||
if cmp != 0 { | ||
// We found a match, let's validate it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we can simply return entry at i + cmp.leading_zeroes() / 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tricky thing is that the shift might be of 30 bits i.e. are left with a 34 bit number that we cast to 32 bits. So, this means we need to check again, sadly. But we can definitely be smarter and skip not useful comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the idea to just skip two bits of check in this case? (but then need to shift 32 bits instead of 30, so we check the 32 bit of the start, the 2 last unchecked bits will be checked with the rest of the key on value access).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly for the skipping. Currently we check the last 32 bits, and check again the remaining 2 bits before returning. We can indeed simplfy the operations by just checking the first 32 bits (i.e. the high weight part of the entry if I am not mistaken).
Now, there is the question of which two bits are the most discriminative, the higher 2 or the lower 2?) Because the partial_key
is build with u64::from_be_bytes
I would tend to think the higher two might be more relevant but I am not sure. What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spend time remembering/reunderstanding this today. might still be wrong, but my conclusion was the higher 2 so we in case of index 16 we would need to move >> 32 instead of 30. (the other way we would miss checking key content as the part of the key stored and check in index is 26 bytes containing the last two bits, then we indeed got 16 bit check from the access to the index, 32 from the index key content and 26 * 8 from the value table, should be 256).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'm going to update the PR to use the high 32 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous version was actually more correct. We can't assume that partial key is always 32 bits, so shifts are still needed. We can ignore the last two bits for the cases when it is 33 or 34 bytes however. A bit of explanation:
Keys are always 256 bits. Key bits [48..256] are stored in the value table. So if find_entry
returns a match, these bits are compared to make sure the rest of the key matches . Therefore find_entry
must only match the first 48 bits of the key. Current version actually matches 50, so that's a bit of an overkill.
The first index_bits
of the key are simply the offset of the chunk in the file. All entries in the chunk share the same first index_bits
of the key. They are not stored in the entries. Bits [index_bits..50] is what's actually stored in the entry and that's called partial_key
. The rest of the entry bits is address
. As index_bits
grows, so does address_bits
, and partial_key
shrinks. But the chunk index bits grows as well, so we can always match 50 bits of the key in the index file.
Indexes start with index_bits==16
. With index_bits==16
partial_key
is 34, but the last two bits can be ignored and we only need to match 48 bits.
With index_bits==21
however partial_key
is only 29 bits, so the last 3 of the first 32 entry bits should be excluded from comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your detailed explanation. I indeed forgot that the index_bit can grow further. I have applied @cheme comments to fix the implementation.
I also added a test to cover this case because it was not already done by the existing tests.
Great stuff. Curious to see if it has any affect on the query performance. |
@arkpar @cheme Thank you for your reviews. I have written a version 2 using the assumption that the high 32bits are part of the partial key. I ran a quick benchmark on my laptop. The SSE2 version seems to be slightly faster but the numbers are not stable enough to tell more than that. We might also try an AVX version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, could do :
diff --git a/src/index.rs b/src/index.rs
index 5bd78b9..f5e94bf 100644
--- a/src/index.rs
+++ b/src/index.rs
@@ -235,12 +235,14 @@ impl IndexTable {
Ok(try_io!(Ok(&map[offset..offset + CHUNK_LEN])))
}
+ #[cfg(target_feature = "sse2")]
fn find_entry(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
- if cfg!(target_feature = "sse2") {
- self.find_entry_sse2(key_prefix, sub_index, chunk)
- } else {
- self.find_entry_base(key_prefix, sub_index, chunk)
- }
+ self.find_entry_sse2(key_prefix, sub_index, chunk)
+ }
+
+ #[cfg(not(target_feature = "sse2"))]
+ fn find_entry(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
+ self.find_entry_base(key_prefix, sub_index, chunk)
}
#[cfg(target_feature = "sse2")]
@@ -284,6 +286,7 @@ impl IndexTable {
(Entry::empty(), 0)
}
+ #[cfg(any(not(target_feature = "sse2"), test))]
fn find_entry_base(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
assert!(chunk.len() >= CHUNK_ENTRIES * 8);
let partial_key = Entry::extract_key(key_prefix, self.id.index_bits());
@@ -608,19 +611,19 @@ mod test {
#[test]
fn test_find_entries() {
or expose a dummy find_entry_sse2
function when no feature sse2 (method may be missing if no traget).
No description provided.