[mono] Use unsigned char when computing UTF8 string hashes
The C standard does not specify whether `char` is signed or unsigned;
it is implementation-defined.

Apparently Android aarch64 makes a different choice from other
platforms (at least macOS arm64 and Windows x64 produce hashes that
differ from the ones computed on Android aarch64).
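
A minimal sketch of how the signedness of plain `char` can be checked on a given target, and of what it does to a byte with the high bit set. The helper names `plain_char_is_signed`, `widen_plain` and `widen_unsigned` are hypothetical and are not part of Mono or eglib.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when plain char is signed on this
 * target and 0 when it is unsigned. CHAR_MIN is 0 for an unsigned
 * plain char and negative for a signed one. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: widens a byte through plain char. For a byte
 * such as 0xA9 the result is negative when plain char is signed and
 * 169 when it is unsigned. */
static int widen_plain(char c)
{
    return c;
}

/* Hypothetical helper: widens the same byte through unsigned char.
 * The result is in the range 0..255 on every target. */
static int widen_unsigned(char c)
{
    return (unsigned char) c;
}
```

On a target where `plain_char_is_signed()` returns 1, `widen_plain('\xA9')` is negative while `widen_unsigned('\xA9')` is 169. GCC and Clang also accept `-fsigned-char` and `-funsigned-char` to override the target default, which can help when trying to reproduce the other behavior locally.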

Mono uses `mono_metadata_str_hash` in the AOT compiler and AOT runtime
to optimize class name lookup.  When the host running the AOT compiler
and the target device disagree on the signedness of `char`, classes
whose names include multi-byte UTF-8 sequences (every byte of which has
the high bit = 1) will hash differently in the AOT compiler and on the
device.
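
The mismatch can be reproduced on a single machine by forcing the pointer type. The sketch below reuses the loop from the `g_str_hash` hunk in this commit. The function names `hash_signed` and `hash_unsigned` are hypothetical, and `uint32_t` stands in for `guint` on the assumption that `guint` is a 32-bit unsigned integer.

```c
#include <stdint.h>

/* Hypothetical variant: behaves like a target where plain char is signed.
 * A byte >= 0x80 is widened to a negative int before it is added. */
static uint32_t hash_signed(const char *s)
{
    const signed char *p = (const signed char *) s;
    uint32_t hash = 0;

    while (*p++)
        hash = (hash << 5) - (hash + *p);
    return hash;
}

/* Hypothetical variant: behaves like a target where plain char is
 * unsigned, which is also what the patched code does everywhere.
 * A byte >= 0x80 is widened to a value in 128..255. */
static uint32_t hash_unsigned(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    uint32_t hash = 0;

    while (*p++)
        hash = (hash << 5) - (hash + *p);
    return hash;
}
```

For a pure ASCII name such as `"Foo"` the two functions agree. For `"\xC3\xA9"` (the UTF-8 encoding of `é`) they return 2697 and 4294962057 respectively, so a lookup table built with one variant cannot be probed with the other.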

Fixes #82187
Fixes #78638
lambdageek authored and github-actions committed Mar 11, 2023
1 parent cbfaba5 commit 205ddba
Showing 2 changed files with 3 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/mono/mono/eglib/ghashtable.c
@@ -673,7 +673,7 @@ guint
 g_str_hash (gconstpointer v1)
 {
     guint hash = 0;
-    char *p = (char *) v1;
+    unsigned char *p = (unsigned char *) v1;
 
     while (*p++)
         hash = (hash << 5) - (hash + *p);
3 changes: 2 additions & 1 deletion src/mono/mono/metadata/metadata.c
@@ -5386,7 +5386,8 @@ guint
 mono_metadata_str_hash (gconstpointer v1)
 {
     /* Same as g_str_hash () in glib */
-    char *p = (char *) v1;
+    /* note: signed/unsigned char matters - we feed UTF-8 to this function, so the high bit will give different results if we don't match. */
+    unsigned char *p = (unsigned char *) v1;
     guint hash = *p;
 
     while (*p++) {