Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11349: [Rust] Add from_iter_values to create arrays from (non null) values #9293

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 38 additions & 0 deletions rust/arrow/src/array/array_primitive.rs
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,32 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
let offset = i + self.offset();
unsafe { *self.raw_values.as_ptr().add(offset) }
}

/// Creates a PrimitiveArray based on an iterator of values without nulls
pub fn from_iter_values<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
let iter = iter.into_iter();
let (_, data_len) = iter.size_hint();
let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.

let mut val_buf = MutableBuffer::new(
data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
);

iter.for_each(|item| {
val_buf.push(item);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we wait for #9235? It introduces a method to extend a MutableBuffer out of an iterator of Native types. It will also allow to drop the .expect("Iterator must be sized"), since it will be possible to extend it out of unsized iterators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure 👍 sounds like a good idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9235 is merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgecarleitao is the new usage with extend what you meant?

We also have to know the final count for creating the array, are you thinking of calculating that with val_buf.len() / mem::size_of::<<T as ArrowPrimitiveType>::Native>() or is there an other way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is what I meant: it should now be possible to do

let values = vec![0i32, 1];

let values = values.iter().map(|x| x + 1);

let buffer: Buffer: values.collect();

wrt to the len: that is a good point that I have had not thought about. 👍

One option is to do what you wrote. Another could be (I haven't tried to compile this, as Fn could become FnMut):

let mut count = 0;
let iter = iter.map(|x| {
   count += 1;
x
});

I would probably have taken your idea, though :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah with just Buffer and collect this is even more clean. I adapted the code with the new API.


let data = ArrayData::new(
T::DATA_TYPE,
data_len,
None,
None,
0,
vec![val_buf.into()],
vec![],
);
PrimitiveArray::from(Arc::new(data))
}
}

impl<T: ArrowPrimitiveType> Array for PrimitiveArray<T> {
Expand Down Expand Up @@ -820,6 +846,18 @@ mod tests {
}
}

#[test]
fn test_primitive_from_iter_values() {
// Test building a primitive array with from_iter_values

let arr: PrimitiveArray<Int32Type> = PrimitiveArray::from_iter_values(0..10);
assert_eq!(10, arr.len());
assert_eq!(0, arr.null_count());
for i in 0..10i32 {
assert_eq!(i, arr.value(i as usize));
}
}

#[test]
#[should_panic(expected = "PrimitiveArray data should contain a single buffer only \
(values buffer)")]
Expand Down
40 changes: 40 additions & 0 deletions rust/arrow/src/array/array_string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,36 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
pub(crate) fn from_opt_vec(v: Vec<Option<&str>>) -> Self {
v.into_iter().collect()
}

/// Creates a `GenericStringArray` based on an iterator of values without nulls
pub fn from_iter_values<Ptr, I: IntoIterator<Item = Ptr>>(iter: I) -> Self
where
Ptr: AsRef<str>,
{
let iter = iter.into_iter();
let (_, data_len) = iter.size_hint();
let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.

let mut offsets =
MutableBuffer::new((data_len + 1) * std::mem::size_of::<OffsetSize>());
let mut values = MutableBuffer::new(0);

let mut length_so_far = OffsetSize::zero();
offsets.push(length_so_far);

for i in iter {
let s = i.as_ref();
length_so_far += OffsetSize::from_usize(s.len()).unwrap();
offsets.push(length_so_far);
values.extend_from_slice(s.as_bytes());
}
let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
.len(data_len)
.add_buffer(offsets.into())
.add_buffer(values.into())
.build();
Self::from(array_data)
}
}

impl<'a, Ptr, OffsetSize: StringOffsetSizeTrait> FromIterator<Option<Ptr>>
Expand Down Expand Up @@ -411,6 +441,7 @@ mod tests {
);
}

#[test]
fn test_string_array_from_iter() {
let data = vec![Some("hello"), None, Some("arrow")];
// from Vec<Option<&str>>
Expand All @@ -424,4 +455,13 @@ mod tests {
assert_eq!(array1, array2);
assert_eq!(array2, array3);
}

#[test]
fn test_string_array_from_iter_values() {
let data = vec!["hello", "hello2"];
let array1 = StringArray::from_iter_values(data.iter());

assert_eq!(array1.value(0), "hello");
assert_eq!(array1.value(1), "hello2");
}
}