You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
List of steps to reproduce (step by step, including full links if applicable):
Have a standard Mediawiki installation (with MySQL or MariaDB) that creates its text columns as varbinary.
Have some pages with and without characters with diacritics in its title.
Create a DPL query that queries these pages and uses the ordercollation parameter.
What happens?:
Sorting simply doesn't work. And using the "Default collation" value for a charset creates an error. Other collation values are ignored.
What should have happened instead?:
Sorting should work according to the collation rules.
When using utf8mb4_general_ci the database error message is: Error 1253: COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'binary'
Because the type of the column page.title is varbinary it has a pseudo charset and collation of "binary". And you can't use collations that doesn't fit with the charset. A query like SELECT field COLLATE latin1_german1_ci FROM ... where the field has a charset other than latin1 throws a similar error message. So whatever collation value you use it wont work with the varbinary columns because binary is the only allowed value for collation.
You have to cast/convert the column to the charset that was used when inserting the binary representation of string values. That would be utf8mb4. Then you can add a COLLATE ..., but only collation for the charset utf8mb4. Or you cast/convert a second time to the charset that fits to the collation. But this can lead to loss of characters that are not present in the destination charset. I'd suggest to cast/convert to utf8mb4 and then only allow collations values for the utf8mb4 charset. I guess this has to be applied to all places in Query.php where $this->getCollateSQL() is used.
But why only utf8mb4_general_ci throws an error and other collation values (e.g. utf8mb4_general_ci) doesn't?
In method _ordercollation() of Query.php a list of collations is queried with SHOW CHARACTER SET. But this statement only returns the "default collation value" for each charset, not all available values. The given value of the ordercollation parameter is compared against the collation values of this list, and all other values are silently ignored. As a user you don't get any feedback when using an invalid value. It just doesn't do its intended job.
The right statement would be SHOW COLLATION, the collation names can be found in column Collation. The statement can be restricted like SHOW COLLATION WHERE Charset='utf8mb4' because other charsets don't make sense (and throw an error) without the double cast/convert mentioned above.
The text was updated successfully, but these errors were encountered:
DynamicPageList3 version: 3.5.2
MediaWiki version: 1.41.2
PHP version: 8.1.30
MariaDB version: 10.11.6
List of steps to reproduce (step by step, including full links if applicable):
What happens?:
Sorting simply doesn't work. And using the "Default collation" value for a charset creates an error. Other collation values are ignored.
What should have happened instead?:
Sorting should work according to the collation rules.
When using utf8mb4_general_ci the database error message is: Error 1253: COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'binary'
Because the type of the column page.title is varbinary it has a pseudo charset and collation of "binary". And you can't use collations that doesn't fit with the charset. A query like
SELECT field COLLATE latin1_german1_ci FROM ...
where the field has a charset other than latin1 throws a similar error message. So whatever collation value you use it wont work with the varbinary columns becausebinary
is the only allowed value for collation.You have to cast/convert the column to the charset that was used when inserting the binary representation of string values. That would be
utf8mb4
. Then you can add aCOLLATE ...
, but only collation for the charsetutf8mb4
. Or you cast/convert a second time to the charset that fits to the collation. But this can lead to loss of characters that are not present in the destination charset. I'd suggest to cast/convert toutf8mb4
and then only allow collations values for theutf8mb4
charset. I guess this has to be applied to all places in Query.php where$this->getCollateSQL()
is used.But why only
utf8mb4_general_ci
throws an error and other collation values (e.g.utf8mb4_general_ci
) doesn't?In method
_ordercollation()
of Query.php a list of collations is queried withSHOW CHARACTER SET
. But this statement only returns the "default collation value" for each charset, not all available values. The given value of the ordercollation parameter is compared against the collation values of this list, and all other values are silently ignored. As a user you don't get any feedback when using an invalid value. It just doesn't do its intended job.The right statement would be
SHOW COLLATION
, the collation names can be found in columnCollation
. The statement can be restricted likeSHOW COLLATION WHERE Charset='utf8mb4'
because other charsets don't make sense (and throw an error) without the double cast/convert mentioned above.The text was updated successfully, but these errors were encountered: