New unexpected build failures on 20240603.1.0 #10004
Comments
We are also seeing new issues on this image. Specifically, in our case, a built executable yields an access violation when we attempt to run it in later stages of our build process.
We're also seeing issues. In our case, the build process itself succeeds, but trying to run any of the resulting executables later in the workflow fails.
We're also seeing similar issues. The errors are all segfaults when trying to run the built executables.
The issue may be due to a combination of the ordering of the PATH environment variable, older versions of vcruntime on the PATH (e.g. bundled with Python), and changes in the VS 2022 17.10 STL that require the latest vcruntime. On this image, the vcruntime picked up at runtime is older than what the 17.10 STL requires. As noted in the VS 2022 17.10 STL release notes, the updated STL needs a matching, newer redistributable.

Reference: https://github.com/microsoft/STL/releases/tag/vs-2022-17.10

And, in fact, I can confirm that built executables run correctly on a VM where I've ensured a sufficiently new vcruntime is found first. Possible recommendations for the Windows images: ship an up-to-date Visual C++ redistributable and make sure older copies do not appear earlier on the PATH.
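For context, here is a minimal sketch of the failure mode described above. This is an illustration, not the exact code anyone in this thread was building: it assumes the binary is compiled with the VS 2022 17.10 toolset and that the process loads a pre-14.40 msvcp140.dll found earlier on the PATH; the crash comes from that STL/runtime mismatch, not from anything unusual in user code.

```cpp
// Hypothetical minimal repro: any code path that locks a std::mutex is enough.
// Built with the VS 2022 17.10 STL but run against an older msvcp140.dll,
// the first lock() can fail with an access violation and no error message.
#include <iostream>
#include <mutex>

int main() {
    std::mutex m;                              // constexpr-constructed by the 17.10 STL
    std::lock_guard<std::mutex> guard(m);      // crashes here if an old msvcp140.dll was loaded
    std::cout << "mutex locked and released fine\n";
    return 0;
}
```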
Also seeing segfaults running executables from a windows-latest (20240603.1.0) runner. Was fine on 20240514.3.0.
DLL confusion due to PATH issues makes a lot of sense - I had a build succeed just now simply by changing the workflow to build in Debug rather than Release. The executables would then be looking for the debug versions of the runtime DLLs instead.
I also see a regression on the GDAL (https://github.com/OSGeo/gdal) CI related to that change, causing crashes during test execution.
We hit this too and traced it back to std::mutex usage in https://github.com/abseil/abseil-cpp. This looks like https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660#T-N10668856, which suggests an older, incompatible version of msvcp140.dll is being used in this image. We also only see this in optimized builds; our debug-built executables still work. The stack trace we got:
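Separately from the stack trace, a quick way to confirm which msvcp140.dll a process actually picked up is to query the loaded module's path at runtime. This is a hypothetical diagnostic sketch (it assumes the binary links the dynamic CRT, e.g. /MD; with a static CRT the DLL won't be loaded at all):

```cpp
// Print where the msvcp140.dll loaded by this process actually came from,
// e.g. System32 versus an older copy bundled with Python or a JVM on the PATH.
#include <windows.h>
#include <iostream>

int main() {
    HMODULE h = GetModuleHandleW(L"msvcp140.dll");
    if (!h) {
        std::cout << "msvcp140.dll is not loaded in this process\n";
        return 0;
    }
    wchar_t path[MAX_PATH];
    if (GetModuleFileNameW(h, path, MAX_PATH)) {
        std::wcout << L"msvcp140.dll loaded from: " << path << L"\n";
    }
    return 0;
}
```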
My Python code is hitting this as well. 🟢 20240514.3.0: https://github.com/facefusion/facefusion/actions/runs/9401500581/job/25893418056
20240603.1.0 broke us too; we're consistently seeing nondescript "failed to execute command" errors in some of our jobs.
I've attempted that in https://github.com/rouault/gdal/actions/runs/9407632196/job/25913839874, copying c:\Windows\system32\vcruntime140.dll into a specific directory and putting it at the front of the PATH, but it appears that the version in c:\Windows\system32\ is only 14.32.31326, and not >= 14.40.33810.00. I would argue that the runner image should be fixed to have a recent enough vcruntime140.dll at the front of the PATH. A workaround I found when reading https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660#T-N10668856 is that you can define "/D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR" when building your software to revert to a std::mutex constructor compatible with older vcruntimes: rouault/gdal@c4ab31f. My builds work fine with that workaround.
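For anyone who wants to try the flag without touching the build system first, here is a hypothetical source-level sketch of the same workaround. The documented route is to pass it project-wide (e.g. /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR on the cl command line, as in the linked commit); if defined in source instead, it must be seen before <mutex> is included, and every translation unit that uses std::mutex should define it the same way.

```cpp
// Revert std::mutex to the non-constexpr constructor that is compatible with
// older vcruntimes. Prefer defining this project-wide via the compiler flag;
// the in-source define below is only a quick way to test the effect.
#define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR
#include <mutex>

std::mutex g_lock;  // now constructed in a way older msvcp140.dll versions understand

int main() {
    std::lock_guard<std::mutex> guard(g_lock);
    return 0;
}
```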
…atest 20240603.1.0 Cf actions/runner-images#10004 Other approach attempted in rouault@9ff56b3 didn't result in finding a recent enough vcruntime140.dll
Can we expect a rollback or some other fix to the runner images, or will affected projects need to use one of the listed workarounds? Is there a way to use older/stable runner images instead of this new version?
Seeing this also. Runner image 20240514.3.0 works fine: https://github.com/randombit/botan/actions/runs/9408188903/job/25918967600. Runner image 20240603.1.0: built binaries fail with error code 3221225477 (0xC0000005, access violation): https://github.com/randombit/botan/actions/runs/9408188903/job/25918967856. Same code in both cases; the first (working) run is the PR, the second (failing) is the merge of that PR into master.
Visual C++ devs (Microsoft) did something idiotic, and made it so that if code compiled with the latest compiler runs against an older runtime it ... crashes without any message. https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660#T-N10668856 Then Github (Microsoft) shipped an image with a new compiler and an old runtime, so that compiling anything and trying to run it fails actions/runner-images#10004 Truly extraordinary
Getting similar errors today with the new images, simply executing curl to download a binary. Affecting windows-2019 and windows-2022 images (I was using windows-latest, but tried windows-2019 with the same result).
Here is a re-run today, on a newer runner, of a job that passed yesterday 🟢:

curl --output foo --write-out "%{http_code}" --location https://github.com/pact-foundation/pact-ruby-standalone/releases/download/v2.4.4/pact-2.4.4-windows-x86_64.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: libcurl function was given a bad argument
000

I believe my error is curl/curl#13845, probably due to the image update updating curl's dependencies.
Revert "Add `/D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR` to fix segfault #2151 This reverts commit a121c75. actions/runner-images#10004
According to actions/runner-images#10004, the problems with the Windows runners, which prevented successful runs for release builds, have been fixed. So this commit reverts 20c1fd8.
@ijunaidm Do you have more information on this change? In our case, the JVM itself is loading an older version of the VC runtime from its deployed folder, so even if we remove or update the VC DLLs in System32 and other paths on the agent image, the JVM still loads an older version of the runtime before we are even called, and our native code will crash on load if we build on the latest 20240610. The 20240610 image is unusable for us unless we modify our code and work with all our partner teams to have them also start building with the _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR flag defined. This does not seem like a good path forward. How long will _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR be supported? What other std incompatibilities are in there that we do not know about yet and are not covered by this one compiler flag? Do you know if there is a customer impact issue for this in the DevDiv team?
Initially I don't think this is on the hosted image maintainers. My point of view is that this is a failure on the Visual Studio team that allowed such a change to ship in the first place. The only area the hosted image folks could have improved would be to have rolled back the compiler upgrade as soon as they realized what happened, and to then message it to the community and push back on the compiler team to unbreak the STL classes in the VC runtime... or, at a bare minimum, to have published a high-visibility breaking-change message as soon as they root-caused the issue, rather than just resolving this GH issue because one person's builds are passing while dozens of others are still broken...
@BrianMouncer #10020 and #10055 also cover the JVM issues.
It is. They deployed an image in which the installed VC runtime is older than what the new compiler toolchain requires. As for MS, they need to fix the compiler toolchain so that the dynamic linker ensures code is linking to a compatible version of the runtime and fails with an understandable error message when the versions mismatch.
@BrianMouncer It's a maintainer's issue. I have no gripes with the VC team, as a matter of fact. The images were improperly built. As supporting evidence, we re-ran pipelines that had previously succeeded with May's image and found those pipelines failed with the new June image. Then that same code started working again once the image was updated. We lucked out that our 3rd-party libraries started working with the fix. We never had to touch our code because it never explicitly dealt with the mutex issue. So, yes, magically on the 13th of the month, everything started working for us with no code change. It's an image issue. The sticky part is that the issue still exists for some - #10020 is still active and very much a concern for a lot of folks, since the new configuration is still a blocking issue for them. It's worth noting that the reason I also called out the gcc issue was to illustrate that two different platforms suffered from the lack of clear communication, and both practically happened at the same time. That's a lot of expensive deviation due to debugging, workarounds, etc. The Windows issue illustrates a need for better testing, monitoring, and response. The Ubuntu issues took an image set backwards. Both issues highlight a lack of communication. We spent a lot of time (man-hours and pipeline time) on issues that really shouldn't have happened in the first place (the Windows C++ issue) or that, at the very least, should have come with sufficient notice to allow for planned changes (the Ubuntu issue).
Is the workaround also needed when building with ClangCL?
I found my own answer after looking back at CI build logs. Usually an MSVC build failed first, cancelling the other builds, but I found an occurrence where the runner image was updated after the MSVC jobs had completed but before a ClangCL job. It suffered crashes and SEH exceptions in the way familiar to everyone here. The workaround is also needed with ClangCL.
Instead of removing the older version of the vcruntime from the Temurin JVM installation in the GitHub Actions runner image, define `_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR` when compiling libktx and ${ASTCENC_LIB_TARGET}. This makes the code compatible with older VC runtimes removing the burden from users to ensure their JVM installation uses the latest VC runtime. See actions/runner-images#10055. For further background see actions/runner-images#10004 and https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660#T-N10669129-N10678728. Includes 2 other minor changes: 1. Move the compiler info dump in `CMakeLists.txt` to before first use of the compiler info and recode it to use `cmake_print_variables`. 2. Disable dump of system and platform info in `tests/loadtests/CMakeLists.txt`.
This reverts commit 205f163.
This PR contains two changes: 1) Moves a pragma to disable a warning, which seems to be required by the new compiler. 2) Adds a preprocessor define to work around the crashes caused by the runner image mismatching C++ runtime versions. The second change we will want to revert once the runner images are fixed. The issue tracking the runner images is: actions/runner-images#10004 Related microsoft#6668 (cherry picked from commit 0b9acdb)
This removes the hack introduced in microsoft#6683 to workaround issues in the GitHub and ADO runner image: actions/runner-images#10004 Rumor has it the runner images are now fixed... let's see. Fixes microsoft#6674 (cherry picked from commit 98bb80a)
…ex::lock (_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR)

if (WIN32 AND IMGUI_BUNDLE_BUILD_PYTHON)
    # Windows: workaround against msvc Runtime incompatibilities when using std::mutex::lock
    # Early 2024, msvcp140.dll was updated, and Python 3.11/3.12 are shipped with their own older version of msvcp140.dll.
    # As a consequence the python library will happily crash at customer site, not bothering to mention
    # the fact that the loaded version of msvcp140.dll is incompatible...
    # See:
    #   https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660
    #   actions/runner-images#10004
    #   #239 (comment)
    add_compile_definitions(_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR)
endif()
Description
Our PR builds had been working as expected until last night, when the runners updated to the 20240603.1.0 image. An impacted PR is here:
microsoft/DirectXShaderCompiler#6668
Earlier iterations of the PR built successfully, but the builds began failing once the image updated.
See a failing build here:
https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6383&view=results
And a previously successful one here:
https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6371&view=results
Platforms affected
Runner images affected
Image version and build link
Image: 20240603.1.0
https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6383&view=results
Is it regression?
Yes; the last working image was 20240514.3.0.
Expected behavior
Our builds should work, as in this previously successful run:
https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6371&view=results
Actual behavior
The build fails with errors we don't encounter locally or on older VM images.
Repro steps
We cannot reproduce outside the VM image.