Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impossible to set correct collation #563

Closed
vsalvino opened this issue Nov 2, 2022 · 0 comments · Fixed by #564
Closed

Impossible to set correct collation #563

vsalvino opened this issue Nov 2, 2022 · 0 comments · Fixed by #564

Comments

@vsalvino
Copy link
Contributor

vsalvino commented Nov 2, 2022

Describe the bug

Use case: the database is configured with a character set, and a non-default collation for that character set.

[mysqld]
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci

For reference, note that on this particular installation (MariaDB 10.3), the default collation for utf8mb4 is utf8mb4_general_ci, NOT utf8mb4_unicode_ci.

The bug in this case is that when using mysqlclient, it is IMPOSSIBLE to set the correct collation. There is an option charset, which whether it is provided, blank, or empty, issues a command as such:

SET NAME <charset>

SET NAME will ALWAYS use the default collation for the charset if the collation is not specified, which is documented here: https://mariadb.com/kb/en/set-names/, and much more deeply documented here: https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html.

In this case we are specifying charset="utf8mb4". The command is being generated is:

SET NAMES utf8mb4

When COLLATION is omitted, this uses the default collation for utf8mb4 which utf8mb4_general_ci. This is hard-coded into MariaDB and cannot be changed: https://mariadb.com/kb/en/change-is_default-value-for-information_schemacollations/

The problem is that certain SQL queries, such as a CAST as generated by the Django Cast function, will infer the connection's collation, and can fail if the connection's collation does not match the table:

(1267, "Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '='")

The simple documented solution is to also specify the collation:

SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci

I have tried several workarounds to set the collation, which currently fail, because set_character_set is always run AFTER the workaround, therefore negating it.

Workaround 1

Use the init_command option to send SET NAME utf8mb4 COLLATE utf8mb4_unicode_ci after connecting. Unfortunately, from examining the query log, charset is set AFTER init_command, therefore negating it. The query log looks as such:

31 Query    SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci  -- run by init_command
31 Query    SET NAMES utf8mb4  -- from mysqlclient

Workaround 2

Use the init_connect option on the MySQL server. Unfortunately, the story is the same, as the charset is issued after the init_connect script is run, therefore negating it.

32 Query    SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci  -- run by the server
32 Query    SET NAMES utf8mb4  -- from mysqlclient

Possible fixes

Fix 1: Do not set charset if empty.

If charset is empty, or possibly None, do not set it. This way, it does not override the server's default behavior or the init_command

if not charset:
charset = self.character_set_name()
self.set_character_set(charset)

It seems this behavior (which is the source of my bug) was introduce in 2.1: #509

Fix 2: Add a collation option.

This deviates from the MySQL C connector behavior, but it would be really nice to be able to specify both "charset" and "collation". Special implementation in Python might be required for this, e.g. to issue a SQL command after connecting.

Environment

MySQL Server

  • MariaDB (Debian 11, Windows 10)
  • MariaDB 10.3.35

MySQL Client

  • Debian, Windows

  • Python tested on multiple versions: 3.7+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant