[mono] Use unsigned char when computing UTF8 string hashes
The C standard does not specify whether plain `char` is signed or
unsigned; the choice is implementation-defined.

Apparently Android aarch64 makes a different choice than other
platforms (at least macOS arm64 and Windows x64 give different
results).
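One way to check which choice a given toolchain makes is to look at `CHAR_MIN` from `<limits.h>`. The following is a minimal sketch, not part of this commit; the helper names `plain_char_is_signed` and `widen_plain_char` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if plain `char` is signed for this
 * compiler and target, 0 if it is unsigned. */
static int plain_char_is_signed (void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: stores a byte in a plain `char` and widens it to
 * int, the same promotion that happens when `*p` is used in arithmetic.
 * Bytes above 0x7F come back negative when plain `char` is signed. */
static int widen_plain_char (unsigned char byte)
{
    char c = (char) byte;
    int value = (int) c;

    assert (value >= CHAR_MIN && value <= CHAR_MAX);
    return value;
}
```

On a target where plain `char` is signed, `widen_plain_char (0xC3)` typically yields a negative number; where it is unsigned, it yields 195. Any arithmetic that consumes the widened value can therefore differ between the two kinds of target.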

Mono uses `mono_metadata_str_hash` in the AOT compiler and AOT runtime
to optimize class name lookup.  As a result, classes whose names
include UTF-8 continuation bytes (with the high bit = 1) will hash
differently in the AOT compiler and on the device.
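The divergence can be reproduced on a single machine by forcing both interpretations explicitly. This is a sketch under stated assumptions: the multiply-by-31 loop below is a simplified stand-in and not Mono's exact hash, and the names `hash_as_signed`, `hash_as_unsigned`, and `utf8_e_acute` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in hash (NOT Mono's exact function), reading the
 * input as signed bytes. Bytes >= 0x80 are sign-extended to negative
 * ints before being added to the hash. */
static unsigned int hash_as_signed (const void *v)
{
    const signed char *p = (const signed char *) v;
    unsigned int hash = 0;

    assert (v != NULL);
    for (; *p; p++)
        hash = (hash << 5) - hash + (unsigned int) (int) *p;
    return hash;
}

/* Same loop, reading the input as unsigned bytes, so every byte
 * contributes a value in the range 0..255. */
static unsigned int hash_as_unsigned (const void *v)
{
    const unsigned char *p = (const unsigned char *) v;
    unsigned int hash = 0;

    assert (v != NULL);
    for (; *p; p++)
        hash = (hash << 5) - hash + *p;
    return hash;
}

/* "é" encoded as UTF-8: lead byte 0xC3, continuation byte 0xA9. */
static const unsigned char utf8_e_acute[] = { 0xC3, 0xA9, 0x00 };
```

For a pure ASCII name such as `"Foo"` the two functions agree, because every byte is below 0x80. For `utf8_e_acute` they disagree, which mirrors the situation where the AOT compiler and the device treat plain `char` differently.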

Fixes dotnet#82187
Fixes dotnet#78638
lambdageek committed Mar 10, 2023
1 parent a923c64 commit 9a5350d
Showing 2 changed files with 3 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/mono/mono/eglib/ghashtable.c
@@ -673,7 +673,7 @@ guint
 g_str_hash (gconstpointer v1)
 {
 	guint hash = 0;
-	char *p = (char *) v1;
+	unsigned char *p = (unsigned char *) v1;

 	while (*p++)
 		hash = (hash << 5) - (hash + *p);
3 changes: 2 additions & 1 deletion src/mono/mono/metadata/metadata.c
@@ -5524,7 +5524,8 @@ guint
 mono_metadata_str_hash (gconstpointer v1)
 {
 	/* Same as g_str_hash () in glib */
-	char *p = (char *) v1;
+	/* note: signed/unsigned char matters - we feed UTF-8 to this function, so the high bit will give different results if we don't match. */
+	unsigned char *p = (unsigned char *) v1;
 	guint hash = *p;

 	while (*p++) {