Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a fix for an incorrect assumption of an assertion failure re: failure to decommit. #105849

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mrsharm
Copy link
Member

@mrsharm mrsharm commented Aug 1, 2024

The original condition didn't take into account if the decommit had succeeded and only checked if we are using large pages. The assertion failure was discovered by some our Linux based Reliability Framework Stress Runs with the following call stack:

0:007> k
 # Child-SP          RetAddr               Call Site
00 00007fa7`f8c77a10 00007fa7`fcb3ae30     libc_so!wait4+0x57
01 00007fa7`f8c77a40 00007fa7`fcb3c0d8     libcoreclr!PROCCreateCrashDump+0x410 [/home/vwazho/runtime/src/coreclr/pal/src/thread/process.cpp @ 2308] 
02 00007fa7`f8c77ab0 00007fa7`fcb3948d     libcoreclr!PROCCreateCrashDumpIfEnabled+0xad8 [/home/vwazho/runtime/src/coreclr/pal/src/thread/process.cpp @ 15732480] 
03 00007fa7`f8c77b50 00007fa7`fcb3938b (T) libcoreclr!PROCAbort+0x2d [/home/vwazho/runtime/src/coreclr/pal/src/thread/process.cpp @ 2559] 
04 00007fa7`f8c77b70 00007fa7`fc97eea0 (T) libcoreclr!RaiseFailFastException+0x66 [/home/vwazho/runtime/src/coreclr/pal/src/thread/process.cpp @ 1276] 
05 00007fa7`f8c77b90 00007fa7`fc97eca5     libcoreclr!FailFastOnAssert+0x1b [/home/vwazho/runtime/src/coreclr/utilcode/debug.cpp @ 63] 
06 00007fa7`f8c77ba0 00007fa7`fc97ef24     libcoreclr!_DbgBreakCheck+0x2d5 [/home/vwazho/runtime/src/coreclr/utilcode/debug.cpp @ 15732480] 
07 00007fa7`f8c78c50 00007fa7`fc97f1b3     libcoreclr!_DbgBreakCheckNoThrow+0x84 [/home/vwazho/runtime/src/coreclr/utilcode/debug.cpp @ 15732480] 
08 00007fa7`f8c78cd0 00007fa7`fc84af70     libcoreclr!DbgAssertDialog+0x73 [/home/vwazho/runtime/src/coreclr/utilcode/debug.cpp @ 15732480] 
09 00007fa7`f8c78d10 00007fa7`fc82f39a     libcoreclr!SVR::gc_heap::decommit_region+0x120 [/home/vwazho/runtime/src/coreclr/gc/gc.cpp @ 44143] 
0a 00007fa7`f8c78d50 00007fa7`fc82efc7     libcoreclr!SVR::gc_heap::decommit_step+0xea [/home/vwazho/runtime/src/coreclr/gc/gc.cpp @ 44072] 
0b 00007fa7`f8c78d90 00007fa7`fc82e796 (T) libcoreclr!SVR::gc_heap::gc_thread_function+0x827 [/home/vwazho/runtime/src/coreclr/gc/gc.cpp @ 15732480] 
0c 00007fa7`f8c78dd0 00007fa7`fc6f0b53     libcoreclr!SVR::gc_heap::gc_thread_stub+0x31 [/home/vwazho/runtime/src/coreclr/gc/gc.cpp @ 37266] 
0d (Inline Function) --------`--------     libcoreclr!<unnamed-namespace>::CreateNonSuspendableThread::$_1::operator()+0x4f [/home/vwazho/runtime/src/coreclr/vm/gcenv.ee.cpp @ 1525] 
0e 00007fa7`f8c78df0 00007fa7`fcb3f142     libcoreclr!<unnamed-namespace>::CreateNonSuspendableThread::$_1::__invoke+0x53 [/home/vwazho/runtime/src/coreclr/vm/gcenv.ee.cpp @ 1510] 
0f 00007fa7`f8c78e20 00007fa7`fcce4134     libcoreclr!CorUnix::CPalThread::ThreadEntry+0x3c2 [/home/vwazho/runtime/src/coreclr/pal/src/thread/thread.cpp @ 1744] 
10 00007fa7`f8c78ee0 00007fa7`fcd647dc     libc_so!pthread_condattr_setpshared+0x4d4
11 00007fa7`f8c78f80 ffffffff`ffffffff     libc_so!_xmknodat+0x23c
12 00007fa7`f8c78f88 00000000`00000000     0xffffffff`ffffffff

TODO: Stress testing for this is still needed to ensure this assertion isn't fired anymore. CC: @VincentBu

…ut the equality of heap segment used and mem
Copy link
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

src/coreclr/gc/gc.cpp Show resolved Hide resolved
@Maoni0
Copy link
Member

Maoni0 commented Aug 8, 2024

what kind of testing has been done on this change? it seems like if you want to actually delete the region when decommit succeeded, the check for require_clearing_memory_p is backwards, because if decommit_succeeded_p is true and we are not using large pages, require_clearing_memory_p is false. so you want to do

    if (!require_clearing_memory_p)
    {
        global_region_allocator.delete_region (get_region_start (region));
    }

but this is also incorrect because if use_large_pages_p is true which would make require_clearing_memory_p true, you do want to delete the region. use_large_pages_p being true just means we treat decommit always as success.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants