Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tests (valgrind warning) on aarch64/glibc 2.31 #3058

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

colinleroy
Copy link
Contributor

tests/mantest were failing on Raspberry Pi OS 11:

==10205== Conditional jump or move depends on uninitialised value(s)
==10205== at 0x48806F8: checkfail (jq_test.c:58)
==10205== by 0x48806F8: run_jq_tests (jq_test.c:99)
==10205== by 0x4880FCB: jq_testsuite (jq_test.c:39)
==10205== by 0x10B37F: main (main.c:576)
==10205==
==10205== Conditional jump or move depends on uninitialised value(s)
==10205== at 0x4880714: checkfail (jq_test.c:58)
==10205== by 0x4880714: run_jq_tests (jq_test.c:99)
==10205== by 0x4880FCB: jq_testsuite (jq_test.c:39)
==10205== by 0x10B37F: main (main.c:576)

Copy link
Member

@wader wader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but i wonder why it happens. Thestrcmp shoudn't be poking around after the nul-terminator set by fgets or?

Let's wait for one more approval then merge

@colinleroy
Copy link
Contributor Author

Looks good but i wonder why it happens. Thestrcmp shoudn't be poking around after the nul-terminator set by fgets or?

I'm unsure - I don't know ARM assembler. Given the glibc source, I suspect that it's optimized by fetching more than one byte at once, which makes it susceptible to read past the terminator, even if it doesn't compare past the terminator.

https://elixir.bootlin.com/glibc/glibc-2.31/source/sysdeps/aarch64/strcmp.S

@wader
Copy link
Member

wader commented Mar 5, 2024

Looks good but i wonder why it happens. Thestrcmp shoudn't be poking around after the nul-terminator set by fgets or?

I'm unsure - I don't know ARM assembler. Given the glibc source, I suspect that it's optimized by fetching more than one byte at once, which makes it susceptible to read past the terminator, even if it doesn't compare past the terminator.

https://elixir.bootlin.com/glibc/glibc-2.31/source/sysdeps/aarch64/strcmp.S

My aarch64 asm is not good enough :) wonder how it handles that it don't know the size of the buffers or can it makes assumptions for some reason? i might see some kind of quick look for nul loop there but not sure

@emanuele6
Copy link
Member

This probably requires some more investigation; I have a raspeberry PI 4B, I could try reproducing this later.
By the way, you can zero-initialise a buffer with = {0} (e.g. char buf[4096] = {0}), you don't need to call memset.

tests/mantest were failing on Raspberry Pi OS 11:

==10205== Conditional jump or move depends on uninitialised value(s)
==10205==    at 0x48806F8: checkfail (jq_test.c:58)
==10205==    by 0x48806F8: run_jq_tests (jq_test.c:99)
==10205==    by 0x4880FCB: jq_testsuite (jq_test.c:39)
==10205==    by 0x10B37F: main (main.c:576)
==10205==
==10205== Conditional jump or move depends on uninitialised value(s)
==10205==    at 0x4880714: checkfail (jq_test.c:58)
==10205==    by 0x4880714: run_jq_tests (jq_test.c:99)
==10205==    by 0x4880FCB: jq_testsuite (jq_test.c:39)
==10205==    by 0x10B37F: main (main.c:576)
@colinleroy
Copy link
Contributor Author

This probably requires some more investigation; I have a raspeberry PI 4B, I could try reproducing this later.
By the way, you can zero-initialise a buffer with = {0} (e.g. char buf[4096] = {0}), you don't need to call memset.

You're absolutely right, updated the commit.

@emanuele6 emanuele6 merged commit d697331 into jqlang:master Mar 18, 2024
28 checks passed
@emanuele6
Copy link
Member

Thank you

@colinleroy
Copy link
Contributor Author

Hi,
In light of what happened at ZX, maybe it would be good to understand why aarch64's strcmp gives this valgrind warning. @emanuele6, have you managed to reproduce on your Pi 4?

@colinleroy
Copy link
Contributor Author

Tested a bit more and it seems to be an optimizer issue. The valgrind warning goes away with either:

  • this patch we merged
  • changing static int checkfail() to int checkfail()
  • changing -O2 to -O1 or -O0

I've tried to see if this was fixed by gcc-11, glibc-2.36 or valgrind-3.19 on a Debian 12 Bookworm Raspberry. This one isn't visible anymore, there are numerous new ones though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants