Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dictionary encoded array #228

Conversation

Alex-PLACET
Copy link
Collaborator

No description provided.

{

template <typename KeysArray, typename ValuesArrayBitmapRange>
class dictionary_bitmap
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class is required because of the implementation of bitmap_begin() in the CRTP base class which assumes that each array class has a bitmap attribute:

    template <class D>
    auto array_crtp_base<D>::bitmap_begin() const -> const_bitmap_iterator
    {
        return derived_cast().get_bitmap().begin();
    }

With an alternative implementation not assuming the existence of a bitmap member, but only a way to iterate over a bitmap (either "pysical" or "logical"), such as

    template <class D>
    auto array_crtp_base<D>::bitmap_begin() const -> const_bitmap_iterator
    {
        return derived_cast().bitmap_begin_impl();
    }

where bitmap_begin_impl is defined as return m_bitmap.begin() in the classes with a bitmap attribute, and as return validity_iterator(...) in the dictionary_encoded_layout , we should be able to remove this class and the validity_iterator class, and get something really close to the previous implementation.

This requires defining boilerplate code for bitmap_begin_impl (and bitmap_end_impl) in all array classes but dicitonary_encoded_layout, but that can be done with a macro, and refactored inside an intermediate CRTP later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

constexpr size_t bitmap_buffer_index = 0;
SPARROW_ASSERT_TRUE(arrow_proxy.buffers().size() > bitmap_buffer_index);
const auto bitmap_size = arrow_proxy.length() + arrow_proxy.offset();
return bitmap_type(arrow_proxy.buffers()[bitmap_buffer_index].data(), bitmap_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implemetation is common to all the array classes but the dictionary encoded layout. I think it could be factorize into a free function somewhere (probably in the array_base.hpp file, inside the detail namespace).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are more layouts with a special bitmap btw.
The run-end-encoded and the sparse and dense union also need a special bitmap

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@Alex-PLACET Alex-PLACET force-pushed the add_dictionary_encoded_array_after_big_merge branch from ca5d541 to 29e41c5 Compare October 8, 2024 13:15
@Alex-PLACET Alex-PLACET marked this pull request as ready for review October 8, 2024 13:17
@JohanMabille JohanMabille merged commit ffc06fb into man-group:main Oct 8, 2024
55 checks passed
@Alex-PLACET Alex-PLACET deleted the add_dictionary_encoded_array_after_big_merge branch October 9, 2024 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants