Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scalar: introduce Datum struct to have null-last Ord behavior #4272

Closed
skyzh opened this issue Jul 28, 2022 · 10 comments · Fixed by #9816
Closed

scalar: introduce Datum struct to have null-last Ord behavior #4272

skyzh opened this issue Jul 28, 2022 · 10 comments · Fixed by #9816
Assignees

Comments

@skyzh
Copy link
Contributor

skyzh commented Jul 28, 2022

No description provided.

@skyzh
Copy link
Contributor Author

skyzh commented Jul 28, 2022

while #4263 only changes the key encoding, we should revisit all places that we store Row and Datum, so that it is consistent with the behavior of the storage. Normally this would happen in BTreeMap<Row, Row>.

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Jul 29, 2022

I guess this means Datum should not have Ord, or we have to wrap them as NullsLargestDatum and NullsSmallestDatum to support nulls first and nulls last in the future. This does not take collation into account as well... (Collation is part of data type rather than order)

@skyzh
Copy link
Contributor Author

skyzh commented Jul 29, 2022

Also sounds reasonable to me... We should rethink how we directly use Row / Datum to do comparisons...

@BugenZhao
Copy link
Member

Datum contains the data type tag and it actually seems kind of meaningless to implement Ord on it, which allows comparing datums with different types, not to mention that we may not be able to customize that 🤣 #477.

I'm not sure whether there are many cases for comparing Row. With the new schemaless row format introduced by @yuhao-su, we'll always need the row schema to make comparisons. We may solve this issue at that time by the way.

@yuhao-su
Copy link
Contributor

yuhao-su commented Jul 31, 2022

I'm not sure whether there are many cases for comparing Row

Ideally no. But we may be using BtreeMap as a kv index in our system somewhere (we may replace them with other data structures like HashMap).

The comparison between Rows should be avoided. If we want a order, we should encode it as memcomparable

@github-actions
Copy link
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@stdrc
Copy link
Member

stdrc commented Mar 16, 2023

Working on NULLS FIRST/LAST in #8485, I feel that we'd better introduce a dedicated type for Datum (either struct or enum is OK). The key point is to disallow comparing on Datums without an OrderType. If we don't, NULLS FIRST/LAST support can be easily broken by any unintentional comparison on Datums which are actually Options, and that cause a chaos in our system in regards to row ordering (which in fact is our current status🥵).

@stdrc stdrc reopened this Mar 16, 2023
@stdrc stdrc self-assigned this Mar 16, 2023
@stdrc
Copy link
Member

stdrc commented May 15, 2023

I feel that we'd better introduce a dedicated type for Datum (either struct or enum is OK).

Had a try on this, the result is that there're too many changes need to be made. Then I came up with another idea.

The key point is to disallow comparing on Datums without an OrderType.

Since the key point is to avoid automatical derivation of Ord caused by Option, we can simply don't implement Ord for ScalarImpl. The idea is actually quite consistent, we don't want Datum to be ordered without an OrderType, so we may also don't want ScalarImpl to be ordered without an OrderType. The difference is that the default impl of Ord for Datum is incorrect (ascending nulls first) but the one for ScalarImpl is correct (ascending).

@stdrc
Copy link
Member

stdrc commented May 15, 2023

Any thoughts? cc @BugenZhao

@BugenZhao
Copy link
Member

so we may also don't want ScalarImpl to be ordered without an OrderType

Makes sense to me. 👍 We may introduce several thin wrappers instead for ScalarImpl, Option<ScalarImpl>, and Vec<Option<ScalarImpl>> to provide a default ASC NULLS LARGEST Ord implementation for convenience if the caller really don't care about the order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants