Segmentation fault in System.Text.RegularExpressions.Tests #93206

Closed
akoeplinger opened this issue Oct 9, 2023 · 28 comments
Labels: area-CodeGen-coreclr, Known Build Error, os-mac-os-x

Comments

@akoeplinger
Member

akoeplinger commented Oct 9, 2023

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=431489
Build error leg or test failing: Libraries Test Run release coreclr osx x64 Release
Pull request: N/A

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "Segmentation fault.*System.Text.RegularExpressions.Tests",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=431489
Error message validated: Segmentation fault.*System.Text.RegularExpressions.Tests
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 10/9/2023 11:24:45 AM UTC
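
For readers unfamiliar with known-issue matching, here is a minimal sketch of how the `ErrorPattern` above lines up with the console log. It uses Python's `re` module purely as an illustration; how the Build Analysis tooling actually evaluates the pattern is not described in this thread, so the per-line search and the `find_matches` helper are assumptions. The log lines are abbreviated from the Helix console log quoted later in this issue.

```python
import re

# Pattern copied from the known-issue JSON above.
ERROR_PATTERN = r"Segmentation fault.*System.Text.RegularExpressions.Tests"

# Abbreviated lines from the Helix console log quoted in this issue
# (the long command line is truncated here for readability).
LOG_LINES = [
    "  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on, max threads = 4)",
    './RunTests.sh: line 204: 48718 Segmentation fault: 11  "$RUNTIME_PATH/dotnet" exec '
    "--runtimeconfig System.Text.RegularExpressions.Tests.runtimeconfig.json",
    "+ exit 139",
]


def find_matches(pattern, lines):
    """Return the lines that contain a match for the pattern (hypothetical helper)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


matches = find_matches(ERROR_PATTERN, LOG_LINES)
for line in matches:
    print(line)
```

Note that the dots in the pattern are unescaped, so they match any character; that makes the pattern slightly more permissive than a literal match on the test assembly name.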

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 643217 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #100446 |
| 631279 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #100658 |
| 2422814 | dotnet-runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #38744 |
| 626423 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #100503 |
| 624173 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #100446 |
| 617751 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #100169 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 1 | 1 | 6 |
@akoeplinger akoeplinger added area-System.Text.RegularExpressions blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Oct 9, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 9, 2023
@ghost

ghost commented Oct 9, 2023

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@akoeplinger
Member Author

Looks similar to #85046

@steveharter
Member

Some detail from the logs; no dump:

  Discovering: System.Text.RegularExpressions.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Text.RegularExpressions.Tests (found 329 of 357 test cases)
  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on, max threads = 4)
./RunTests.sh: line 204: 48718 Segmentation fault: 11  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Text.RegularExpressions.Tests.runtimeconfig.json --depsfile System.Text.RegularExpressions.Tests.deps.json xunit.console.dll System.Text.RegularExpressions.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/private/tmp/helix/working/B0D30970/w/B6A909D4/e
----- end Mon Oct 9 04:57:38 EDT 2023 ----- exit code 139 ----------------------------------------------------------
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.
ulimit -c value: 0
+ export _commandExitCode=139
+ _commandExitCode=139
+ /usr/local/bin/python3 /tmp/helix/working/B0D30970/p/reporter/run.py https://dev.azure.com/dnceng-public/ public 9545048 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Im9PdmN6NU1fN3AtSGpJS2xGWHo5M3VfVjBabyJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJhcHBfdG9rZW4iLCJhdWkiOiJmYjRjZWM4Ny00MzBlLTQ5MjctYWY5Zi0wODEyMDY0ZjliNTEiLCJzaWQiOiJkNDQ0NDU2ZS0wMGFkLTQxODktOTZiZi03NTc5YzhkNDFhMzIiLCJCdWlsZElkIjoiY2JiMTgyNjEtYzQ4Zi00YWJiLTg2NTEtOGNkY2I1NDc0NjQ5OzQzMTQ4OSIsImpvYnJlZiI6IjRlNzhkMDdhLTU0MDQtNDA0Yy04NWQ0LWYwMGYwMjU0OTc1Zjo3ZTk4NjRhNi03ZmVjLTVjZDQtZmEwMS01OGQ3NmEwMzU4ZjkiLCJwcGlkIjoidnN0ZnM6Ly8vQnVpbGQvQnVpbGQvNDMxNDg5Iiwib3JjaGlkIjoiNGU3OGQwN2EtNTQwNC00MDRjLTg1ZDQtZjAwZjAyNTQ5NzVmLmJ1aWxkLmxpYnJhcmllc190ZXN0X3J1bl9yZWxlYXNlX2NvcmVjbHJfb3N4X3g2NF9yZWxlYXNlLl9fZGVmYXVsdCIsInJlcG9JZHMiOiIiLCJpc3MiOiJhcHAudnN0b2tlbi52aXN1YWxzdHVkaW8uY29tIiwiYXVkIjoiYXBwLnZzdG9rZW4udmlzdWFsc3R1ZGlvLmNvbXx2c286NmZjYzkyZTUtNzNhNy00Zjg4LThkMTMtZDkwNDViNDVmYjI3IiwibmJmIjoxNjk2ODQwNDc2LCJleHAiOjE2OTY4NTA2NzZ9.IVBKJCxDEmXAR-ltcWk6a08SCMG-owMAPvaBhq_6BAtitZNAJPwChwcwtCqIv5sOeUCkQYfSmU6UD1OWMSxeUeY_nFlukS-q4eD9X_HJaz5rICRMFcmO3u884rtGgHef6YAaojCw894W-rBndNvV2mZ-cso9BmfPOYZMoYpO3pQfeCZoaRq3Im1VsJY26_W0rExBk8asLIjuEKDlk7LjYA7kOF61uh3Qy5fQRJtQkDDaJY9PEExvCBneeRf5cnoARNncRi9z3mpHiqsIZ6W2gyMP6uff5v0OG5GFu-AJkVHXgREL3pw935WTLMKFMORsc8lD5MqYryve3RCOTw0CZA
2023-10-09T08:57:46.052Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2023-10-09T08:57:46.054Z	INFO   	run.py	__init__(42)	read_results	Searching '/private/tmp/helix/working/B0D30970/w/B6A909D4/e' for test results files
2023-10-09T08:57:46.056Z	INFO   	run.py	__init__(42)	read_results	Searching '/tmp/helix/working/B0D30970/w/B6A909D4/uploads' for test results files
2023-10-09T08:57:46.057Z	WARNING	run.py	__init__(55)	read_results	No results file found in any of the following formats: xunit, junit, trx
2023-10-09T08:57:46.058Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 0 test reports to '/tmp/helix/working/B0D30970/w/B6A909D4/e/__test_report.json'
2023-10-09T08:57:46.058Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 1553 bytes
+ /usr/local/bin/python3 /tmp/helix/working/B0D30970/p/gen-debug-dump-docs.py -buildid 431489 -workitem System.Text.RegularExpressions.Tests -jobid 5472e3d3-beb1-49db-af99-5d3100d2a736 -outdir /tmp/helix/working/B0D30970/w/B6A909D4/uploads -templatedir /tmp/helix/working/B0D30970/p -dumpdir /cores -productver 9.0.0
Did not find dumps, skipping dump docs generation.
+ exit 139
['System.Text.RegularExpressions.Tests' END OF WORK ITEM LOG: Command exited with 139]
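
As a side note on the `exit code 139` in the log above: POSIX shells report a process killed by signal N as exit status 128 + N, and signal 11 is SIGSEGV, which is consistent with the `Segmentation fault: 11` line. A minimal sketch of that decoding follows; `describe_exit_code` is a hypothetical helper written for illustration, not part of the Helix scripts.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not every number maps to a named signal on every platform.
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```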

@steveharter steveharter added the os-mac-os-x macOS aka OSX label Oct 9, 2023
@steveharter
Member

Both this and #85046 occurred on OSX.

@steveharter steveharter removed the untriaged New issue has not been triaged by the area owner label Oct 9, 2023
@stephentoub stephentoub removed their assignment Oct 9, 2023
@stephentoub
Member

Without a dump it'll be impossible to make progress on this. It's also very unlikely to be in regex itself, and much more likely to be an issue either in the span-related functionality regex sits on top of, or in codegen / the runtime.

@akoeplinger
Member Author

I was able to capture a core dump on my local Mac using the Helix artifacts, but it is 8GB so uploading will take a while :)

@steveharter steveharter added this to the 9.0.0 milestone Oct 9, 2023
@akoeplinger
Member Author

Here's the coredump compressed with 7z: https://microsofteur-my.sharepoint.com/:u:/g/personal/alkpli_microsoft_com/Ed36-eUF0PZOm-1hEL6QVwMBNRoTbiyBIgX5sh9dY6WR6Q?e=fGpWUP

This was from the artifacts from Helix job 5472e3d3-beb1-49db-af99-5d3100d2a736.

bt all from lldb: https://gist.github.com/akoeplinger/73cda3c6fa725e18d0f3fbc25c929ca6

Let me know if you need anything else.

@steveharter
Member

> I was able to capture a core dump on my local Mac using the Helix artifacts

Cool. Is there a crashlog (.crash) file that you can run lldb crashlog against?

This looks suspect, however:

 thread #15
    frame #0: 0x000000010eceff91 libcoreclr.dylib`WKS::gc_heap::mark_object_simple(unsigned char**) [inlined] WKS::mark_queue_t::queue_mark(this=<unavailable>, o="p\x9f\x8c\U0000001d\U00000001") at gc.cpp:26791:9 [opt]
    frame #1: 0x000000010eceff8e libcoreclr.dylib`WKS::gc_heap::mark_object_simple(unsigned char**) [inlined] WKS::mark_queue_t::queue_mark(this=<unavailable>, o="p\x9f\x8c\U0000001d\U00000001", condemned_gen=-1) at gc.cpp:26829:16 [opt]
    frame #2: 0x000000010eceff7c libcoreclr.dylib`WKS::gc_heap::mark_object_simple(po=<unavailable>) at gc.cpp:27476:17 [opt]
    frame #3: 0x000000010ecf2b0b libcoreclr.dylib`WKS::GCHeap::Promote(ppObject=0x0000700006b773a8, sc=<unavailable>, flags=0) at gc.cpp:48915:5 [opt]
    frame #4: 0x000000010ec6a7ae libcoreclr.dylib`GcInfoDecoder::ReportUntrackedSlots(GcSlotDecoder&, REGDISPLAY*, unsigned int, void (*)(void*, Object**, unsigned int), void*) [inlined] GcInfoDecoder::ReportSlotToGC(this=0x0000700006860708, slotDecoder=0x0000700006860310, slotIndex=173, pRD=0x0000700006860d90, reportScratchSlots=true, inputFlags=<unavailable>, pCallBack=(libcoreclr.dylib`GcEnumObject(void*, Object**, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007000068634b0) at gcinfodecoder.h:0 [opt]
    frame #5: 0x000000010ec6a6fc libcoreclr.dylib`GcInfoDecoder::ReportUntrackedSlots(this=0x0000700006860708, slotDecoder=0x0000700006860310, pRD=0x0000700006860d90, inputFlags=<unavailable>, pCallBack=(libcoreclr.dylib`GcEnumObject(void*, Object**, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007000068634b0) at gcinfodecoder.cpp:1040:9 [opt]
    frame #6: 0x000000010ec695e5 libcoreclr.dylib`GcInfoDecoder::EnumerateLiveSlots(this=<unavailable>, pRD=<unavailable>, reportScratchSlots=<unavailable>, inputFlags=<unavailable>, pCallBack=<unavailable>, hCallBack=<unavailable>) at gcinfodecoder.cpp:989:9 [opt]
    frame #7: 0x000000010ea972cf libcoreclr.dylib`EECodeManager::EnumGcRefs(this=<unavailable>, pRD=0x0000700006860d90, pCodeInfo=0x0000700006860c10, flags=0, pCallBack=(libcoreclr.dylib`GcEnumObject(void*, Object**, unsigned int) at gcenv.ee.common.cpp:147), hCallBack=0x00007000068634b0, relOffsetOverride=4294967295) at eetwain.cpp:5336:24 [opt]
    frame #8: 0x000000010eba8353 libcoreclr.dylib`GcStackCrawlCallBack(pCF=0x00007000068609e0, pData=0x00007000068634b0) at gcenv.ee.common.cpp:282:18 [opt]
    frame #9: 0x000000010eb269f5 libcoreclr.dylib`Thread::MakeStackwalkerCallback(this=0x00007f92ae04b000, pCF=0x00007000068609e0, pCallback=(libcoreclr.dylib`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007000068634b0) at stackwalk.cpp:847:27 [opt]
    frame #10: 0x000000010eb26c4a libcoreclr.dylib`Thread::StackWalkFramesEx(this=0x00007f92ae04b000, pRD=0x0000700006860d90, pCallback=(libcoreclr.dylib`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007000068634b0, flags=34048, pStartFrame=0x0000000000000000) at stackwalk.cpp:927:26 [opt]
    frame #11: 0x000000010eb27084 libcoreclr.dylib`Thread::StackWalkFrames(this=0x00007f92ae04b000, pCallback=(libcoreclr.dylib`GcStackCrawlCallBack(CrawlFrame*, void*) at gcenv.ee.common.cpp:200), pData=0x00007000068634b0, flags=34048, pStartFrame=0x0000000000000000) at stackwalk.cpp:1010:12 [opt]
    frame #12: 0x000000010eba5285 libcoreclr.dylib`ScanStackRoots(pThread=0x00007f92ae04b000, fn=(libcoreclr.dylib`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) at gc.cpp:48849), sc=0x0000700006863588) at gcenv.ee.cpp:204:18 [opt]
    frame #13: 0x000000010eba5099 libcoreclr.dylib`GCToEEInterface::GcScanRoots(fn=(libcoreclr.dylib`WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) at gc.cpp:48849), condemned=1, max_gen=2, sc=0x0000700006863588) at gcenv.ee.cpp:303:13 [opt]
    frame #14: 0x000000010ece4c1a libcoreclr.dylib`WKS::gc_heap::mark_phase(condemned_gen_number=1) at gc.cpp:29358:9 [opt]
    frame #15: 0x000000010ece1306 libcoreclr.dylib`WKS::gc_heap::gc1() at gc.cpp:22324:13 [opt]
    frame #16: 0x000000010ececcad libcoreclr.dylib`WKS::gc_heap::garbage_collect(n=0) at gc.cpp:0:21 [opt]
    frame #17: 0x000000010ecdbc75 libcoreclr.dylib`WKS::GCHeap::GarbageCollectGeneration(this=<unavailable>, gen=0, reason=reason_alloc_soh) at gc.cpp:50393:9 [opt]
    frame #18: 0x000000010ecdddf9 libcoreclr.dylib`WKS::gc_heap::try_allocate_more_space(alloc_context*, unsigned long, unsigned int, int) [inlined] WKS::gc_heap::trigger_gc_for_alloc(gen_number=0, gr=<unavailable>, msl=0x000000010ef2e548, loh_p=<unavailable>, take_state=<unavailable>) at gc.cpp:18920:14 [opt]
    frame #19: 0x000000010ecdddf2 libcoreclr.dylib`WKS::gc_heap::try_allocate_more_space(acontext=0x00007f92ae824658, size=64, flags=2, gen_number=0) at gc.cpp:19058:34 [opt]
    frame #20: 0x000000010ed08f50 libcoreclr.dylib`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_more_space(acontext=0x00007f92ae824658, size=64, flags=2, alloc_generation_number=0) at gc.cpp:19558:18 [opt]
    frame #21: 0x000000010ed08f35 libcoreclr.dylib`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:19589:19 [opt]
    frame #22: 0x000000010ed08f1a libcoreclr.dylib`WKS::GCHeap::Alloc(this=<unavailable>, context=0x00007f92ae824658, size=64, flags=2) at gc.cpp:49327:34 [opt]
    frame #23: 0x000000010eba8aa3 libcoreclr.dylib`Alloc(size=64, flags=GC_ALLOC_CONTAINS_REF) at gchelpers.cpp:227:48 [opt]
    frame #24: 0x000000010eba9bf1 libcoreclr.dylib`AllocateObject(pMT=0x0000000110c63688, flags=GC_ALLOC_CONTAINS_REF) at gchelpers.cpp:1101:37 [opt]
    frame #25: 0x000000010eaa7cc9 libcoreclr.dylib`FieldDesc::GetStubFieldInfo() [inlined] AllocateObject(pMT=<unavailable>) at gchelpers.h:68:12 [opt]
    frame #26: 0x000000010eaa7cc2 libcoreclr.dylib`FieldDesc::GetStubFieldInfo(this=0x0000000110c08250) at field.cpp:803:49 [opt]
    frame #27: 0x000000010ebc9689 libcoreclr.dylib`JIT_GetRuntimeFieldStub(field=0x0000000110c08250) at jithelpers.cpp:3635:43 [opt]

@akoeplinger
Member Author

There's no .crash file, but there is an .ips file, which is basically the same thing but JSON-encoded: dotnet-2023-10-09-182901.ips.zip (I also added the rendered report).

By the way, the binary was built from commit d3a782e.

Interestingly, the .ips points to thread 18 (thread 19 in lldb, since lldb uses 1-based indexing) as the thread that hit the SIGSEGV. Its top frame is libclrjit.dylib`Compiler::fgCompactBlocks(BasicBlock*, BasicBlock*) [inlined] BasicBlock::isLoopAlign(this=0x00007f92afd7ffd0) const at block.h:614:44 [opt]

Here's the disassembly of the function: https://gist.github.com/akoeplinger/621f3de8abf8dfd01f62c941d5d552fe

I wasn't able to get lldb crashlog to do anything useful, since I didn't find a way to load the .dwarf symbols (loading them via add-dsym doesn't work for the crash log, although it does work for the core dump).
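
To make the off-by-one between the crash report and lldb concrete, here is a small sketch of the conversion. The sample payload is a hypothetical, heavily trimmed stand-in for the JSON body of an .ips file, and the `faultingThread` field name is an assumption about that layout rather than something confirmed in this thread; only the "0-based in the report, 1-based in lldb" relationship comes from the comment above.

```python
import json

# Hypothetical, trimmed stand-in for the JSON body of an .ips report.
# The "faultingThread" key is an assumed field name used for illustration.
SAMPLE_IPS_BODY = json.dumps({
    "faultingThread": 18,
    "threads": [{"index": i} for i in range(20)],
})


def lldb_thread_number(ips_body: str) -> int:
    """Map the report's 0-based faulting thread index to lldb's 1-based thread number."""
    report = json.loads(ips_body)
    zero_based = report["faultingThread"]
    return zero_based + 1


print(lldb_thread_number(SAMPLE_IPS_BODY))
```

With the resulting number, `thread select <n>` in lldb switches to the matching thread.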

@akoeplinger
Member Author

Poking at the function a bit:

(lldb) t 19
* thread #19
    frame #0: 0x00000001af72a8a1 libclrjit.dylib`Compiler::fgCompactBlocks(BasicBlock*, BasicBlock*) [inlined] BasicBlock::isLoopAlign(this=0x00007f92afd7ffd0) const at block.h:614:44 [opt]
   611
   612 	    bool isLoopAlign() const
   613 	    {
-> 614 	        return ((bbFlags & BBF_LOOP_ALIGN) != 0);
   615 	    }
   616
   617 	    void unmarkLoopAlign(Compiler* comp DEBUG_ARG(const char* reason));
(lldb) p bbNext
(BasicBlock *) 0x00007f92afd7ff68
(lldb) p bbPrev
(BasicBlock *) 0x00007f92af840001
(lldb) p bbJumpSwt
(BBswtDesc *) 0x00007f92afd7ff80
(lldb) p bbFlags
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb) p bbNum
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
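
One quick sanity check on the values printed above is pointer alignment. The sketch below assumes that `BasicBlock*` values on x64 should be 8-byte aligned; that is an assumption about the JIT's allocator, not something stated in this thread. Keep in mind that in an optimized build lldb may also show unreliable field values, so a misaligned pointer is only a hint of corruption, not proof.

```python
# Assumed alignment for object pointers on a 64-bit target (an assumption).
POINTER_ALIGNMENT = 8

# Values copied from the lldb session above.
PRINTED_FIELDS = {
    "this": 0x00007F92AFD7FFD0,
    "bbNext": 0x00007F92AFD7FF68,
    "bbPrev": 0x00007F92AF840001,
    "bbJumpSwt": 0x00007F92AFD7FF80,
}


def misaligned(fields, alignment=POINTER_ALIGNMENT):
    """Return the names of fields whose value is not a multiple of `alignment`."""
    return [name for name, value in fields.items() if value % alignment != 0]


suspicious = misaligned(PRINTED_FIELDS)
print(suspicious)
```

Under that assumption, only `bbPrev` stands out, which would fit the picture of the block's memory being invalid rather than the block pointer simply being null.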

@build-analysis build-analysis bot removed this from the 9.0.0 milestone Nov 15, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 15, 2023
@jeffhandley jeffhandley added this to the 9.0.0 milestone Nov 17, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Nov 17, 2023
@danmoseley
Member

@akoeplinger should this be in the codegen area?

@ericstj
Member

ericstj commented Feb 22, 2024

Happened to have a look at this and it does appear to only be failing on Mac. It's a bummer we aren't getting dumps there yet @hoyosjs @carlossanlop - this type of issue would really benefit from crash symbolization. I thought with the latest changes that should be working on Macs?

Agree with @danmoseley that this looks more like a codegen issue.

@ericstj ericstj added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Text.RegularExpressions labels Feb 22, 2024
@ghost

ghost commented Feb 22, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Report

| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 574893 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #92197 |
| 574577 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98753 |
| 574484 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96440 |
| 574236 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98700 |
| 573797 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98641 |
| 573121 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98361 |
| 572337 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98562 |
| 571700 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98559 |
| 570767 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #95565 |
| 570618 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98593 |
| 570181 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98573 |
| 568683 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #89204 |
| 566192 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98361 |
| 566082 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98421 |
| 564818 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96254 |
| 564459 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98129 |
| 564084 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #91317 |
| 562975 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97640 |
| 562684 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97644 |
| 562424 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98294 |
| 561889 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98277 |
| 561350 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97738 |
| 560363 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98207 |
| 560222 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97216 |
| 560004 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96332 |
| 559858 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97096 |
| 559748 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97898 |
| 557728 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #98126 |
| 557214 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97814 |
| 554264 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97929 |
| 554182 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97999 |
| 552162 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | |
| 550953 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97878 |
| 550874 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97388 |
| 550091 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97726 |
| 549714 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97537 |
| 548835 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97482 |
| 548727 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97777 |
| 548614 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97797 |
| 548628 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96650 |
| 547800 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #95001 |
| 547683 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97388 |
| 546819 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97738 |
| 546265 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97623 |
| 545957 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97388 |
| 544478 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97644 |
| 543975 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #95565 |
| 543882 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96961 |
| 543848 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97619 |
| 543770 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97604 |
| 543656 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97592 |
| 543318 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97388 |
| 543128 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #95565 |
| 542703 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97574 |
| 542284 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97560 |
| 541829 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97545 |
| 541105 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96707 |
| 540311 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97505 |
| 540142 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96888 |
| 539873 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96650 |
| 538905 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #97388 |
| 538303 | dotnet/runtime | System.Text.RegularExpressions.Tests.WorkItemExecution | #96650 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 12 | 62 |
Author: akoeplinger
Assignees: -
Labels:

os-mac-os-x, area-CodeGen-coreclr, blocking-clean-ci, Known Build Error

Milestone: 9.0.0
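
When triaging a report like the one above, it can help to see whether the hits cluster on a few pull requests. The following is a minimal sketch that counts hits per PR; it only embeds a handful of rows copied from the report (not the full table), and `hits_per_pull_request` is a hypothetical helper, not part of any Build Analysis tooling.

```python
from collections import Counter

# A small subset of rows copied from the report above, not the full table.
REPORT_ROWS = """
550874 dotnet/runtime System.Text.RegularExpressions.Tests.WorkItemExecution #97388
547683 dotnet/runtime System.Text.RegularExpressions.Tests.WorkItemExecution #97388
552162 dotnet/runtime System.Text.RegularExpressions.Tests.WorkItemExecution
543975 dotnet/runtime System.Text.RegularExpressions.Tests.WorkItemExecution #95565
543128 dotnet/runtime System.Text.RegularExpressions.Tests.WorkItemExecution #95565
539873 dotnet/runtime System.Text.RegularExpressions.Tests.WorkItemExecution #96650
"""


def hits_per_pull_request(rows: str) -> Counter:
    """Count report rows per pull request; rows without a PR go under '(no PR)'."""
    counts = Counter()
    for line in rows.splitlines():
        # Tolerate both space-separated and pipe-separated table rows.
        fields = line.replace("|", " ").split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip blank lines, header rows, and separator rows
        has_pr = len(fields) > 3 and fields[3].startswith("#")
        counts[fields[3] if has_pr else "(no PR)"] += 1
    return counts


counts = hits_per_pull_request(REPORT_ROWS)
for pr, hits in counts.most_common():
    print(pr, hits)
```

Repeated hits on the same PR number may simply reflect reruns of one PR's pipeline rather than independent failures, so the counts are a starting point for triage, not a conclusion.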

@hoyosjs
Member

hoyosjs commented Feb 22, 2024

I'm actively investigating a product issue where dumps are not getting collected.

@JulieLeeMSFT
Member

@kunalspathak, this seems to be related to loop alignment. PTAL; it is blocking clean CI.

@kunalspathak
Member

Are we still seeing this issue? I don't think so.

@hoyosjs
Member

hoyosjs commented Mar 13, 2024

https://dev.azure.com/dnceng-public/public/_build/results?buildId=598100&view=ms.vss-test-web.build-test-results-tab&runId=14495740&resultId=215086&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

@kunalspathak that's from yesterday; sadly, the dump didn't get egressed. The method that failed was Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.Lexer.AddTrivia(Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.CSharpSyntaxNode, Microsoft.CodeAnalysis.Syntax.InternalSyntax.SyntaxListBuilder ByRef) at IL offset 0x1d.

@kunalspathak
Member

@jakobbotsch @amanasifkhalid - can one of you please take a look, since you recently touched the loop/block layout code? This seems to be accessing a null BasicBlock, which results in the seg fault.

@JulieLeeMSFT
Member

@amanasifkhalid, PTAL.

@amanasifkhalid
Member

amanasifkhalid commented Mar 29, 2024

fgCompactBlocks (and the rest of the JIT's flowgraph code, for that matter) has undergone a lot of churn lately. I'm no longer seeing that method come up in the backtraces for recent failures. However, recent crash reports suggest the failure is due to a System.Reflection.TargetInvocationException (example); I suppose we seg fault while trying to handle it? In that particular run, thread 14 (which is thread 15 in lldb) crashed with the exception; I included a backtrace in the above gist. You'll see a couple of threads are in the JIT during the crash, but the disassemblies don't seem to have any obvious null dereferences. BasicBlockVisit BasicBlock::VisitEHSuccs looks a bit suspect in that we don't have any guards against dereferencing a null BasicBlock*, but if we were to pass a null block pointer to it, then we should've attempted to dereference that null pointer earlier in the call stack.

Unless I'm missing something, the seg fault doesn't seem to be happening in the JIT.

@AndyAyersMS
Member

The two recent appearances are from preliminary runs in PRs that had issues. So I would probably hold off looking at the crash dumps (if any).

@amanasifkhalid
Member

Since this hasn't hit recently, I'm going to unmark blocking-clean-ci for now, and keep an eye on this. If it hits again, I'll revisit the crash dumps, and (if necessary) re-triage this.

@amanasifkhalid amanasifkhalid removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Apr 1, 2024
@AndyAyersMS
Member

Hard to say; I think codegen is as good a guess as any. Are all the crashes on osx-x64?

@amanasifkhalid
Member

amanasifkhalid commented Apr 8, 2024

It's worth noting that the most recent failure (#100658) was from an intermediate commit that hit other issues in CI, so maybe that was a false positive?

> Are all the crashes on osx-x64?

Not all of them. Some of them hit on Linux arm64.

@dotnet dotnet deleted a comment from amanasifkhalid Apr 8, 2024
@riarenas
Member

riarenas commented Apr 8, 2024

@amanasifkhalid I apologize for deleting one of your comments. Reminder that internal helix logs should not be shared in GitHub comments.

@amanasifkhalid
Member

@riarenas no worries, sorry about that.

@amanasifkhalid
Member

This hasn't hit on a "functional" CI run in quite a while. Are we ok with closing this?

@amanasifkhalid
Member

The most recent hit was on a draft PR with other failures. Since we haven't had a failure block CI recently, I think we ought to close this to avoid instilling a false sense of confidence on affected PRs.

@github-actions github-actions bot locked and limited conversation to collaborators May 16, 2024