Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing ordercollation #287

Open
Flachdachs opened this issue Sep 29, 2024 · 0 comments
Open

Fixing ordercollation #287

Flachdachs opened this issue Sep 29, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@Flachdachs
Copy link

Flachdachs commented Sep 29, 2024

DynamicPageList3 version: 3.5.2
MediaWiki version: 1.41.2
PHP version: 8.1.30
MariaDB version: 10.11.6

List of steps to reproduce (step by step, including full links if applicable):

  • Have a standard Mediawiki installation (with MySQL or MariaDB) that creates its text columns as varbinary.
  • Have some pages with and without characters with diacritics in its title.
  • Create a DPL query that queries these pages and uses the ordercollation parameter.

What happens?:

Sorting simply doesn't work. And using the "Default collation" value for a charset creates an error. Other collation values are ignored.

What should have happened instead?:

Sorting should work according to the collation rules.


When using utf8mb4_general_ci the database error message is: Error 1253: COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'binary'

Because the type of the column page.title is varbinary it has a pseudo charset and collation of "binary". And you can't use collations that doesn't fit with the charset. A query like SELECT field COLLATE latin1_german1_ci FROM ... where the field has a charset other than latin1 throws a similar error message. So whatever collation value you use it wont work with the varbinary columns because binary is the only allowed value for collation.

You have to cast/convert the column to the charset that was used when inserting the binary representation of string values. That would be utf8mb4. Then you can add a COLLATE ..., but only collation for the charset utf8mb4. Or you cast/convert a second time to the charset that fits to the collation. But this can lead to loss of characters that are not present in the destination charset. I'd suggest to cast/convert to utf8mb4 and then only allow collations values for the utf8mb4 charset. I guess this has to be applied to all places in Query.php where $this->getCollateSQL() is used.


But why only utf8mb4_general_ci throws an error and other collation values (e.g. utf8mb4_general_ci) doesn't?

In method _ordercollation() of Query.php a list of collations is queried with SHOW CHARACTER SET. But this statement only returns the "default collation value" for each charset, not all available values. The given value of the ordercollation parameter is compared against the collation values of this list, and all other values are silently ignored. As a user you don't get any feedback when using an invalid value. It just doesn't do its intended job.

The right statement would be SHOW COLLATION, the collation names can be found in column Collation. The statement can be restricted like SHOW COLLATION WHERE Charset='utf8mb4' because other charsets don't make sense (and throw an error) without the double cast/convert mentioned above.

@Flachdachs Flachdachs added the bug Something isn't working label Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant