-
-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
group_by: collation-aware grouping #1476
Conversation
Thanks again! Please keep em coming. Are you all using this at Planetscale or are these hobby fixes? We love what you all have done with Vitess and are doing with Planetscale core. I've mentioned you all a number of times in some of our more consistently popular blogs. For instance: https://www.dolthub.com/blog/2021-09-17-database-version-control :-) |
Oh wow! This is another great find. Thank you @vmg!! 🙏 |
We're evaluating using Before we get there, we need My assumption is that you are also interested on behaving as close as MySQL as possible, is that correct? For things like this specific bug, I'd like to discuss those with you, and whether you'd like those fixes too. Should I open an issue for MySQL compatibility quirks? What works best for you?
I know! I read those and found them very interesting! :) |
(build should be green now; I introduced a typo in my refactoring) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Just a few more test additions and this is good to merge!
sql/plan/group_by_test.go
Outdated
{float64(8888)}, | ||
} | ||
|
||
require.Equal(expected, rows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a few additional tests for ENUM
and SET
as well. I'd expect them to work as, although they're collation-aware, they handle their values as integers, so I'd expect it to just work. But we should verify the behavior anyway.
Great use case. We have a number of GMS customers using it this way.
We're about 99% certain that we want to mimic MySQL exactly. These should at least be issues that folks can refer to so if you find them, we can start there. If we should fix them, we'll fix them. Otherwise, we'll just leave the issue open with an explanation on why we're leaving the divergent behavior. |
Tests are updated! |
@vmg It looks like the tests are failing. |
@Hydrocharged: I'm afraid this is not related to my PR. Somebody broke |
@jcor11599 – can you take a look at the CI test failure above and see if it's related to your bindings change for prepared statements? I didn't immediately recognize the error, but happy to dig in and help figure it out if needed. |
Thanks for the quick fix @jcor11599! @vmg – once you pull James' latest change from main, it should get your branch fixed up. Thanks again for working with us on this great contribution! 🙏 |
Signed-off-by: Vicent Marti <vmg@strn.cat>
@Hydrocharged – heads up that the CI checks are all passing now the latest fixes from main have been pulled in. Anything else you wanna see on this PR? |
Nope, we’ll merge it! |
Awesome! 🙌 Thanks again for the great contribution @vmg! 🙏 |
Hi doltfriends! Here's another compatibility fix. This one I'm more confident about:
GROUP BY
is not collation aware when grouping!The included test reproduces the issue trivially, but for completeness: when using a
GROUP BY
statement that includes a textual column, the column's collation needs to be taken into account when creating the aggregation groups.E.g., for the following schema and query
We should expect the result of the
SELECT
to contain two rows, as there are only two groups when grouping by title. Even though we've inserted 4 unique values, theutf8mb4_0900_ai_ci
with which the column is collated is case insensitive, so in practice,'val1' = 'VAL1'
and'val2' = 'VAL2'
when grouping. This is not the current behavior.The fix is straightforward: use a weight string when hashing to generate the grouping key instead of using the expression's literal value. I did a small refactoring in
collations.go
so that we can write the weight string directly into the grouping key's hasher and continue using the same code for the existingHashUint
API.cc @dbussink