Exclude sign bit from bitpacked encoding if all values are negative #2714
Comments
Hi, I want to tackle this issue. |
Great. I don't think anyone is actively working on this so feel free to create a PR. |
I think there is something wrong with the I tried to set breakpoint inside the The same happens for |
I'm pretty sure, at one point, it did not. We switched from guarding bitpacking with an environment variable ( Also, in related news, @broccoliSpicy has been working on bitpacking as well, to try and utilize the pack/unpack routines in the |
Hi @westonpace, how can I work around the test issue mentioned above? Have you found a fix for it? |
yeah, in a recent PR cleaning to enable
DataType::UInt8 | DataType::UInt16 | DataType::UInt32 | DataType::UInt64 => {
    if version >= LanceFileVersion::V2_1 {
        let compressed_bit_width = compute_compressed_bit_width_for_non_neg(arrays);
        Ok(Box::new(BitpackedForNonNegArrayEncoder::new(
            compressed_bit_width as usize,
            data_type.clone(),
        )))
    } else {
        Ok(Box::new(BasicEncoder::new(Box::new(
            ValueEncoder::default(),
        ))))
    }
}
// for signed integers, I intend to make it a cascaded encoding, a sparse array for the negative values and very wide(bit-width) values,
// then a bitpacked array for the narrow(bit-width) values, I need `BitpackedForNeg` to be merged first
DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 => {
    if version >= LanceFileVersion::V2_1 {
        let compressed_bit_width = compute_compressed_bit_width_for_non_neg(arrays);
        Ok(Box::new(BitpackedForNonNegArrayEncoder::new(
            compressed_bit_width as usize,
            data_type.clone(),
        )))
    } else {
        Ok(Box::new(BasicEncoder::new(Box::new(
            ValueEncoder::default(),
        ))))
    }
}
for your testing purpose, you can omit the |
Hi @broccoliSpicy, thanks for your help. After integrating the code above, my test can use the bitpack scheme. However, test_bitpack_primitive keeps failing. I tried to use the code from the main branch (adding the code to Do I need to update the code to
DataType::Int32 => {
    let params = bitpack_params(arrays[0].as_ref()).unwrap();
    Ok(Box::new(BitpackedArrayEncoder::new(
        params.num_bits,
        params.signed,
        //params.all_negative,
    )))
}
Currently, the decoder can only decode the first element correctly; the other elements are 0. |
yeah, there is a mistake in
which caused the behavior you described:
I think you can try changing this line to: src_idx += partial_bytes_written + to_next_byte; and the issue you described should be fixed. However, I think there are also other issues in the current bitpack implementation, for example, in the encode: lance/rust/lance-encoding/src/encodings/physical/bitpack.rs Lines 128 to 179 in f763d42
the encoder might also need to handle the case when lance/rust/lance-encoding/src/encodings/physical/basic.rs Lines 190 to 234 in f763d42
|
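The symptom described above (only the first element decodes, the rest come back as 0) is what one would expect when a read cursor fails to advance between values. The following is a minimal, bit-granular sketch of packing and unpacking that keeps one explicit cursor on each side. It is not Lance's implementation, which works at byte granularity with `src_idx`, `partial_bytes_written` and `to_next_byte`; the names `pack` and `unpack` are hypothetical and exist only for illustration.

```rust
/// Pack the low `num_bits` of every value, least significant bit first,
/// into a contiguous byte buffer. Hypothetical helper, not a Lance API.
fn pack(values: &[u64], num_bits: usize) -> Vec<u8> {
    assert!(num_bits >= 1 && num_bits <= 64);
    let mut out = vec![0u8; (values.len() * num_bits + 7) / 8];
    let mut bit_pos = 0usize; // write cursor, measured in bits
    for &v in values {
        for i in 0..num_bits {
            if (v >> i) & 1 == 1 {
                out[(bit_pos + i) / 8] |= 1u8 << ((bit_pos + i) % 8);
            }
        }
        bit_pos += num_bits; // advance by exactly one packed value
    }
    out
}

/// Inverse of `pack`: read `count` values of `num_bits` bits each.
fn unpack(bytes: &[u8], num_bits: usize, count: usize) -> Vec<u64> {
    assert!(num_bits >= 1 && num_bits <= 64);
    let mut out = Vec::with_capacity(count);
    let mut bit_pos = 0usize; // read cursor, measured in bits
    for _ in 0..count {
        let mut v = 0u64;
        for i in 0..num_bits {
            let bit = (bytes[(bit_pos + i) / 8] >> ((bit_pos + i) % 8)) & 1;
            v |= (bit as u64) << i;
        }
        out.push(v);
        // If this cursor did not move, every value after the first
        // would be read from the wrong position.
        bit_pos += num_bits;
    }
    out
}
```

A round trip such as `unpack(&pack(&values, 3), 3, values.len())` over several values is a quick way to check that both cursors stay in step, since a single-value test cannot expose a cursor that never advances.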
actually, there might be some other issues that come up after doing this kind of encoding selection. I filed an issue, #2927, and I am currently trying to find a way to fix it. |
@broccoliSpicy I tried to integrate the fix you mentioned above, but there is an error |
for the error lance/rust/lance-encoding/src/encodings/physical/basic.rs Lines 190 to 234 in f763d42
|
In #2662 we added support for bitpacking signed integers in LanceV2. In #2696, an optimization was made to exclude the sign bit if all the values for a signed type are positive.
We can make a further optimization to exclude the sign bit if all the values are negative.
The way to do this could be to:
lance/protos/encodings.proto
Line 176 in 35e3862
bitpack_params_for_signed_type
add logic to determine if all values are negative, and if so, don't add the sign bit to the number of bits. We can also modify the return type BitpackParams
as suggested here: lance/rust/lance-encoding/src/encodings/physical/bitpack.rs
Line 79 in b9990d9
lance/rust/lance-encoding/src/encodings/physical/bitpack.rs
Lines 440 to 445 in b9990d9
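To make the proposal concrete, here is a minimal sketch of how the bit width could be chosen for `i32` values, assuming two's complement and a flag that records whether the block is all negative. The names `SketchParams`, `SignMode`, `sketch_params`, `truncate` and `restore` are hypothetical and are not the existing `BitpackParams` or `bitpack_params_for_signed_type`. For an all-negative block the leading run of 1 bits carries no information, so only the bits below it are stored and the decoder fills the high bits back in with ones.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SignMode {
    NonNegative, // every value >= 0, no sign bit stored
    AllNegative, // every value < 0, no sign bit stored
    Mixed,       // both signs present, one extra bit for the sign
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct SketchParams {
    num_bits: u32,
    mode: SignMode,
}

/// Choose a bit width for a block of i32 values (hypothetical helper).
fn sketch_params(values: &[i32]) -> SketchParams {
    let (mut pos_bits, mut neg_bits) = (0u32, 0u32);
    let (mut any_neg, mut any_non_neg) = (false, false);
    for &v in values {
        if v < 0 {
            any_neg = true;
            // Bits remaining below the leading run of ones.
            neg_bits = neg_bits.max(32 - v.leading_ones());
        } else {
            any_non_neg = true;
            // Bits remaining below the leading run of zeros.
            pos_bits = pos_bits.max(32 - v.leading_zeros());
        }
    }
    match (any_neg, any_non_neg) {
        (true, false) => SketchParams { num_bits: neg_bits.max(1), mode: SignMode::AllNegative },
        (true, true) => SketchParams { num_bits: pos_bits.max(neg_bits) + 1, mode: SignMode::Mixed },
        _ => SketchParams { num_bits: pos_bits.max(1), mode: SignMode::NonNegative },
    }
}

/// Keep only the low `num_bits` bits of a value.
fn truncate(v: i32, p: SketchParams) -> u32 {
    if p.num_bits >= 32 { v as u32 } else { (v as u32) & ((1u32 << p.num_bits) - 1) }
}

/// Rebuild the original value from its stored low bits.
fn restore(stored: u32, p: SketchParams) -> i32 {
    match p.mode {
        SignMode::NonNegative => stored as i32,
        // num_bits is at most 31 here, so the shift cannot overflow.
        SignMode::AllNegative => ((u32::MAX << p.num_bits) | stored) as i32,
        SignMode::Mixed => {
            let shift = 32 - p.num_bits;
            ((stored << shift) as i32) >> shift // arithmetic shift sign-extends
        }
    }
}
```

Under these assumptions the block `[-5, -8, -1]` needs 3 bits per value in the all-negative mode, whereas keeping the sign bit would take 4.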