-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CursorValues Decoupling Cursor Data from Cursor Position #7855
Add CursorValues Decoupling Cursor Data from Cursor Position #7855
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
rows: Rows, | ||
fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please document what these functions mean (even though you could argue they are obvious -- it might help to explain what is expected if l_idx > len
, for example.
pub trait FieldArray: Array + 'static { | ||
type Values: FieldValues; | ||
type Values: CursorValues; | ||
|
||
fn values(&self) -> Self::Values; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the overload of the use of the term values
(e.g. for Dictionary arrays) perhaps we could rename this to cursor_values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Member functions naturally take precedence over trait methods, so should ambiguity arise, it will necessitate the explicit form CursorArray::values(...)
and so I don't think this is an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't thinking about compiler ambiguities, more like cognitive overload for future readers. I agree it is not critical however
} | ||
} | ||
|
||
/// An [`Array`] that can be converted into [`FieldValues`] | ||
/// An [`Array`] that can be converted into [`CursorValues`] | ||
pub trait FieldArray: Array + 'static { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the use of Field
confusing here -- what about renaming this CursorArray
or IntoCursorValues
?
/// | ||
/// Note: comparing cursors with different `SortOptions` will yield an arbitrary ordering | ||
#[derive(Debug)] | ||
pub struct FieldCursor<T: FieldValues> { | ||
pub struct ArrayValues<T: CursorValues> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice naming change
} | ||
} | ||
|
||
/// A collection of sorted, nullable [`FieldArray`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// A collection of sorted, nullable [`FieldArray`] | |
/// A collection of sorted, nullable [`CursorValues`] |
@@ -89,7 +89,7 @@ pub(crate) struct SortPreservingMergeStream<C> { | |||
batch_size: usize, | |||
|
|||
/// Vector that holds cursors for each non-exhausted input partition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Vector that holds cursors for each non-exhausted input partition | |
/// Cursors for each input partition. `None` means the input is exhausted |
Which issue does this PR close?
Closes #.
Rationale for this change
#7798 proposed making the cursors themselves sliceable, this resulted in a potentially surprising interface where slicing a cursor would reset its position. This is necessary because a cascading merge sort needs to re-visit rows from previous passes.
Instead of introducing a notion of cursor slicing, this PR instead separates the cursor out from the cursor's values. This in turn will allow constructing cursors multiple times on the same set of values, and potentially adding the ability to slice said values. I think this will be easier to follow.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No, all of these details are crate private