Add CursorValues Decoupling Cursor Data from Cursor Position #7855

Merged

tustvold merged 5 commits into apache:main from tustvold:decouple-cursor-storage-from-cursor

Oct 19, 2023

Contributor

tustvold commented Oct 18, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

#7798 proposed making the cursors themselves sliceable. This resulted in a potentially surprising interface where slicing a cursor would reset its position. The reset is necessary because a cascading merge sort needs to re-visit rows from previous passes.

Rather than introducing a notion of cursor slicing, this PR separates the cursor from the cursor's values. This in turn allows constructing cursors multiple times on the same set of values, and potentially adding the ability to slice said values. I think this will be easier to follow.
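As a rough illustration of the separation described above, here is a minimal sketch. It is not the code in this PR: the `Vec<i32>` values type, the borrowed `values` field, and the `compare_current` helper are assumptions made for the example, and the trait is cut down to two methods. It only shows that, once position lives in the cursor and data lives in the values, a second cursor can start from the beginning of the same values without any slicing.

```rust
use std::cmp::Ordering;

/// Comparable rows with no notion of a current position.
/// Simplified for this sketch; the method set is an assumption.
pub trait CursorValues {
    fn len(&self) -> usize;
    fn compare(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> Ordering;
}

/// Hypothetical values type used only for illustration.
impl CursorValues for Vec<i32> {
    fn len(&self) -> usize {
        self.as_slice().len()
    }

    fn compare(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> Ordering {
        l[l_idx].cmp(&r[r_idx])
    }
}

/// A position within some values. Borrowing the values is an assumption
/// of this sketch; it makes building several cursors over one set cheap.
pub struct Cursor<'a, T: CursorValues> {
    offset: usize,
    values: &'a T,
}

impl<'a, T: CursorValues> Cursor<'a, T> {
    /// Start a new pass at the first row.
    pub fn new(values: &'a T) -> Self {
        Self { offset: 0, values }
    }

    /// True once every row has been consumed.
    pub fn is_finished(&self) -> bool {
        self.offset == self.values.len()
    }

    /// Move to the next row.
    pub fn advance(&mut self) {
        self.offset += 1;
    }
}

/// Compare the rows two cursors currently point at (hypothetical helper).
pub fn compare_current<T: CursorValues>(a: &Cursor<'_, T>, b: &Cursor<'_, T>) -> Ordering {
    T::compare(a.values, a.offset, b.values, b.offset)
}
```

A later merge pass can call `Cursor::new` again on the same values; the first cursor's position is unaffected.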

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

No, all of these details are crate-private.


          Decouple cursor storage

e784be0

tustvold changed the title ~~Decouple cursor storage~~ Add CursorValues Decoupling Cursor Data from Cursor Position

tustvold added 2 commits

October 18, 2023 12:53


          Fix doc

22961b2


          Format

26266d9

alamb approved these changes


Contributor

alamb left a comment

Thank you @tustvold (and by extension @wiedld). I think this improves readability significantly.

I left some comment suggestions to make it even better.

It is great to see motion here.

datafusion/physical-plan/src/sorts/cursor.rs


		rows: Rows,
		fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool;

Contributor

alamb Oct 19, 2023

Could we please document what these functions mean? Even though you could argue they are obvious, it might help to explain what is expected if l_idx > len, for example.

datafusion/physical-plan/src/sorts/cursor.rs

               pub trait FieldArray: Array + 'static {
-                  type Values: FieldValues;
+                  type Values: CursorValues;
                   fn values(&self) -> Self::Values;

Contributor

alamb Oct 19, 2023

Given how overloaded the term values already is (e.g. for Dictionary arrays), perhaps we could rename this to cursor_values?

Contributor Author

tustvold Oct 19, 2023

Member functions naturally take precedence over trait methods, so should ambiguity arise, it will necessitate the explicit form CursorArray::values(...). I don't think this is an issue.
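For readers unfamiliar with the resolution rule mentioned here, a small sketch follows. `DictArray` and the method bodies are hypothetical, and this `CursorArray` is a stand-in with a simplified signature, not the trait in this PR. It shows that a plain method call picks the inherent method, while the trait method stays reachable through the fully qualified form.

```rust
/// Hypothetical array type that already has its own `values` method.
pub struct DictArray {
    keys: Vec<usize>,
    dictionary: Vec<&'static str>,
}

impl DictArray {
    pub fn new(keys: Vec<usize>, dictionary: Vec<&'static str>) -> Self {
        Self { keys, dictionary }
    }

    /// Inherent method: the distinct dictionary values.
    pub fn values(&self) -> &[&'static str] {
        &self.dictionary
    }
}

/// Stand-in trait with a method of the same name (simplified signature).
pub trait CursorArray {
    fn values(&self) -> Vec<&'static str>;
}

impl CursorArray for DictArray {
    /// Trait method: one decoded value per row.
    fn values(&self) -> Vec<&'static str> {
        self.keys.iter().map(|&k| self.dictionary[k]).collect()
    }
}

/// `a.values()` resolves to the inherent method.
pub fn inherent_len(a: &DictArray) -> usize {
    a.values().len()
}

/// The trait method needs the explicit form.
pub fn trait_len(a: &DictArray) -> usize {
    CursorArray::values(a).len()
}
```

With keys `[0, 1, 0]` over a two-entry dictionary, `inherent_len` sees two values and `trait_len` sees three, so the compiler never confuses the two methods even though a reader might.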

Contributor

alamb Oct 20, 2023

I wasn't thinking about compiler ambiguities, more about cognitive overload for future readers. I agree it is not critical, however.

datafusion/physical-plan/src/sorts/cursor.rs Outdated

                   }
               }
-              /// An [`Array`] that can be converted into [`FieldValues`]
+              /// An [`Array`] that can be converted into [`CursorValues`]
               pub trait FieldArray: Array + 'static {

Contributor

alamb Oct 19, 2023

I found the use of Field confusing here -- what about renaming this CursorArray or IntoCursorValues?

datafusion/physical-plan/src/sorts/cursor.rs

    
               ///
               /// Note: comparing cursors with different `SortOptions` will yield an arbitrary ordering
               #[derive(Debug)]
-              pub struct FieldCursor<T: FieldValues> {
+              pub struct ArrayValues<T: CursorValues> {

Contributor

alamb Oct 19, 2023

This is a nice naming change

datafusion/physical-plan/src/sorts/cursor.rs Outdated

+                  }
+              }
+              /// A collection of sorted, nullable [`FieldArray`]

Contributor

alamb Oct 19, 2023

Suggested change

      
-              /// A collection of sorted, nullable [`FieldArray`]
+              /// A collection of sorted, nullable [`CursorValues`]

datafusion/physical-plan/src/sorts/merge.rs Outdated

		@@ -89,7 +89,7 @@ pub(crate) struct SortPreservingMergeStream<C> {
		batch_size: usize,

		/// Vector that holds cursors for each non-exhausted input partition

Contributor

alamb Oct 19, 2023

Suggested change

      
-		/// Vector that holds cursors for each non-exhausted input partition
+		/// Cursors for each input partition. `None` means the input is exhausted

wiedld approved these changes


wiedld mentioned this pull request

feat(7181): add cursor slicing #7798

Closed

tustvold added 2 commits

October 19, 2023 19:34


          Review feedback

eb19ff0


          Tweak ByteArrayValues::value

611c25c

tustvold merged commit 37d6bf0 into apache:main

22 checks passed

matthewgapp mentioned this pull request

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
