data type discrepancies in choose_array_encoder function #2927

Open
broccoliSpicy opened this issue Sep 24, 2024 · 0 comments
Currently, there are many data type discrepancies in choose_array_encoder: the data_type argument does not always match the actual type of the arrays passed alongside it. This makes the encoding selection logic harder to implement and more bug-prone. For example:

In this code, the arrays' type is still one of BINARY_DATATYPES, but the data_type argument passed to choose_array_encoder has changed to UInt8:

else if BINARY_DATATYPES.contains(arrays[0].data_type()) {
    if let Some(byte_width) = check_fixed_size_encoding(arrays, version) {
        // use FixedSizeBinaryEncoder
        let bytes_encoder = Self::choose_array_encoder(
            arrays,
            &DataType::UInt8,
            data_size,
            false,
            version,
            None,
        )?;

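The same thing happens here, where the indices encoder is chosen with DataType::UInt64 even though arrays is passed through unchanged: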

fn default_binary_encoder(
    arrays: &[ArrayRef],
    data_type: &DataType,
    field_meta: Option<&HashMap<String, String>>,
    data_size: u64,
    version: LanceFileVersion,
) -> Result<Box<dyn ArrayEncoder>> {
    let bin_indices_encoder =
        Self::choose_array_encoder(arrays, &DataType::UInt64, data_size, false, version, None)?;
    let compression = field_meta.and_then(Self::get_field_compression);
    let bin_encoder = Box::new(BinaryEncoder::new(bin_indices_encoder, compression));
    if compression.is_none() && Self::can_use_fsst(data_type, data_size, version) {
        Ok(Box::new(FsstArrayEncoder::new(bin_encoder)))
    } else {
        Ok(bin_encoder)
    }
}

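We need the encoding selection predicates to be coherent: whatever a predicate inspects (the data_type argument or the arrays themselves) should describe the same data.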
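One possible way to surface these mismatches is a small coherence check that compares the declared type against the type of every array. The sketch below is only an illustration under stated assumptions: `DataType` and `Array` are toy stand-ins for the Arrow types, and `check_coherent` is a hypothetical helper, not an existing Lance function.

```rust
// Toy stand-ins for arrow's DataType and ArrayRef (hypothetical, illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    UInt8,
    UInt64,
    Utf8,
    Binary,
}

struct Array {
    data_type: DataType,
}

impl Array {
    fn data_type(&self) -> &DataType {
        &self.data_type
    }
}

/// Hypothetical helper: returns an error describing the first array whose
/// actual type disagrees with the type the encoder was asked to handle.
fn check_coherent(arrays: &[Array], declared: &DataType) -> Result<(), String> {
    for (i, array) in arrays.iter().enumerate() {
        if array.data_type() != declared {
            return Err(format!(
                "array {} has type {:?} but the encoder was asked for {:?}",
                i,
                array.data_type(),
                declared
            ));
        }
    }
    Ok(())
}
```

With binary arrays and a declared type of `DataType::UInt8`, as in the first snippet above, `check_coherent` returns an error instead of letting the mismatch pass silently. Whether the real fix is a check like this or passing the converted arrays instead is a design decision for the maintainers.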
