Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"./zstreamtest --newapi" test case fails on Windows #3119

Closed
eli-schwartz opened this issue Apr 27, 2022 · 10 comments · Fixed by #3288
Closed

"./zstreamtest --newapi" test case fails on Windows #3119

eli-schwartz opened this issue Apr 27, 2022 · 10 comments · Fixed by #3288
Assignees
Labels
bug release-blocking Must be done by the release

Comments

@eli-schwartz
Copy link
Contributor

In the Meson WrapDB integration CI, we try to run the build && test on all 3 major OSes. On Windows, one of the tests fails. Logs: https://github.com/mesonbuild/wrapdb/runs/6086663524?check_suite_focus=true

Relevant bits:

3/8 test-zstream-3           FAIL            21.14s   (exit status 3221225477 or signal 3221225349 SIGinvalid)
>>> MALLOC_PERTURB_=124 D:\a\wrapdb\wrapdb\_build\subprojects\zstd-1.5.2\build/meson/tests\zstreamtest.exe --newapi -t1 -T90s --no-big-tests
------------------------------------- 8< -------------------------------------
stderr:
Starting zstream tester (64-bits, 1.5.2)

Seed = 1643


     1/     1    
    34          
    37          
    41          
    45          
    47          
    60          
    78          
    82          
    95          
   102          
   132          
   133          
   134          
   143          
   145          
   150          
   159          
   171          
   196          
   201          
   202          
   209          
   216          
   225          
   249          
   257          
   263          
   267          
   272          
   275          
   287          
   297          
   310          
   313          
   318          
   339          
   348          
   356          
   362          
   374          
   381          
   413          
   414          
   433          
   458          
   484          
   491          
------------------------------------------------------------------------------

It is dying of a strange signal apparently. I've grepped around the source tree, and --newapi is only exercised by the Makefile and by the meson.build, and neither one is getting run on Windows-based CI jobs.

@terrelln
Copy link
Contributor

@eli-schwartz are you able to reproduce the error consistently?

@eli-schwartz
Copy link
Contributor Author

My only windows test environment is GitHub actions. In that environment I could consistently produce these results while doing debug iteration. (This is the same debug iteration in which I discovered the valgrind issue I was facing etc. in my other PR.)

@eli-schwartz
Copy link
Contributor Author

eli-schwartz commented Apr 27, 2022

It was also reproducible at the end of January (again in the WrapDB github actions CI), as I mentioned this exact problem here: #3039 (comment)

@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 28, 2022

I've tried to reproduce it.
On macos or Linux, I was unable to reproduce the same issue.

On Windows + Mingw64, I could finally reproduce it.
Problem is, it's not reproducible : a problem will happen, at some point, but it's never the same place. Traces are not helpful to debug the issue. There is no way to use something like a sanitizer in this environment.

Suspecting an issue with the multi-threading code currently, as the MT code on windows uses a shim translation layer, not pthread directly.
To check this hypothesis, I compile and run the same test, but under Windows + Msys2.
The difference is fairly small, almost the same system as mingw64, but msys2 uses a posix compatibility layer, so programs compiled in this mode can invoke pthread directly, without employing zstd's windows translation layer.
And sure enough, this test worked flawlessly, several times.

So that's a potential investigation direction.

Also worth noting :
under Windows + mingw64, the test fails only when MALLOC_PERTURB_ is set. Otherwise, it completes successfully.

This seems to point at MT + Windows + malloc combination problem.

Cyan4973 added a commit that referenced this issue Apr 28, 2022
fix segfault error when running zstreamtest with MALLOC_PERTURB_
@eli-schwartz
Copy link
Contributor Author

eli-schwartz commented Apr 28, 2022

Note that in the preamble to my CI logs I linked above, it prints messages pointing out that the detected / used compiler is cl.exe (msvc 19.31.31105) and the linker is link.exe -- and of course there too it is not using the msys2 posix compatibility layer.

Thanks for confirming the issue. :) The fact that it happens due to MALLOC_PERTURB_ is interesting... Meson sets this as described at https://mesonbuild.com/Unit-tests.html#malloc_perturb_ and it can be disabled if needed, though catching malloc problems does seem like that feature was useful in exposing an issue?

@Cyan4973
Copy link
Contributor

Yes, I think MALLOC_PERTURB_ is useful for tests, and it's worthwhile fixing the issues it finds: #3121.

@Cyan4973
Copy link
Contributor

Also worth noting : under Windows + mingw64, the test fails only when MALLOC_PERTURB_ is set. Otherwise, it completes successfully.

I retract that statement.
Even without MALLOC_PERTURB_, there are still failures. They just seem to take longer to generate, but they still happen.
This seems to point at a more subtle issue in the Windows pthread translation layer.

@Cyan4973 Cyan4973 added bug release-blocking Must be done by the release labels Oct 13, 2022
Cyan4973 added a commit that referenced this issue Oct 13, 2022
fix segfault error when running zstreamtest with MALLOC_PERTURB_
@Cyan4973 Cyan4973 reopened this Oct 15, 2022
@Cyan4973
Copy link
Contributor

This was improperly closed.
#3288 doesn't fix #3119.

@eli-schwartz
Copy link
Contributor Author

This is properly fixed by #3364 (not yet merged, but passes this test case in CI now).

@yoniko yoniko self-assigned this Dec 18, 2022
@eli-schwartz
Copy link
Contributor Author

eli-schwartz commented Dec 20, 2022

Fix has been merged. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug release-blocking Must be done by the release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants