Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable computation of topStack on Windows platforms #54986

Closed
wants to merge 1 commit into from

Conversation

davidwrighton
Copy link
Member

  • And disable features which depend on it when it is no longer calculated (stack_limit and conservative GC)
  • Move logic which has been uncoditionally enabled if FEATURE_CONSERVATIVE_GC is enabled to be enabled unconditionally, as that is the current tested scenario.
  • When disabling stack_limit, keep the size/layout of the ScanContext the same, to preserve abi compat
    On Windows platforms when we hijack, we use SuspendThread from outside the thread, and if we suspend
    within an interruptible region, we can stop with the top Frame that is an InlinedCallFrame
    that is not active. In that situation, it is possible for the actual top of the stack to be a managed
    function which has a lower SP pointer than is reported from the InlinedCallFrame, or in the case of
    Windows x86 platforms, the InlinedCallFrame may report a GetCallSiteSP of 0, or of an value associated
    with some other callsite within the managed function that set up the InlinedCallFrame. This issue could
    be fixed by capturing the SP after suspension, and storing it on the thread object for use when setting
    stack_limit, but as conservative GC is not needed on any current Windows platforms, and the Unix scenario
    is fully functional, this PR change simply disables the feature.

Fixes #45326

- And disable features which depend on it when it is no longer calculated (stack_limit and conservative GC)
- Move logic which has been uncoditionally enabled if FEATURE_CONSERVATIVE_GC is enabled to be enabled unconditionally, as that is the current tested scenario.
- When disabling stack_limit, keep the size/layout of the ScanContext the same, to preserve abi compat
On Windows platforms when we hijack, we use SuspendThread from outside the thread, and if we suspend
within an interruptible region, we can stop with the top Frame that is an InlinedCallFrame
that is not active. In that situation, it is possible for the actual top of the stack to be a managed
function which has a lower SP pointer than is reported from the InlinedCallFrame, or in the case of
Windows x86 platforms, the InlinedCallFrame may report a GetCallSiteSP of 0, or of an value associated
with some other callsite within the managed function that set up the InlinedCallFrame. This issue could
be fixed by capturing the SP after suspension, and storing it on the thread object for use when setting
stack_limit, but as conservative GC is not needed on any current Windows platforms, and the Unix scenario
is fully functional, this PR change simply disables the feature.
@davidwrighton
Copy link
Member Author

@Maoni0 I made a small tweak to the gc handling of FEATURE_CONSERVATIVE_GC so that disabling that ifdef didn't send us down paths we haven't tested in years. Could you take a quick look when you get a chance?

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Comment on lines -21177 to -21180
#ifdef FEATURE_CONSERVATIVE_GC
if (interior >= heap_segment_allocated(seg))
return 0;
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functionally I don't see a problem. I'm curious - could interior be actually >= heap_segment_allocated if GCConfig::GetConservativeGC() is false? this would mean it's reporting an interior pointer that's indeed on a segment but not pointing at a valid object on that seg.

Comment on lines +976 to +980
#ifdef FEATURE_SUPPORTS_STACK_LIMIT
uintptr_t stack_limit; // Lowest point on the thread stack that the scanning logic is permitted to read
#else
uintptr_t stack_limitUNUSED; // Keep the shape of ScanContext the same whether or not STACK_LIMIT is supported for use
#endif // FEATURE_SUPPORTS_STACK_LIMIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this distinction in this file? GC doesn't care at all about whether this is a stack limit you maintain or not... it would be good to not have this knowledge in the GC interface unless it's mandatory...

#else
#if defined(FEATURE_CONSERVATIVE_GC)
// On Windows platforms when we hijack, we use SuspendThread from outside the thread, and if we suspend
// within an interruptible region, we can stop with the top Frame that is an InlinedCallFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect RedirectedThreadFrame to be the top Frame when we stop within an interruptible region and start GC walk. Why are the problematic cases missing RedirectedThreadFrame ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a point. I thought I had managed to reproduce the scenario (or something very like it) in the debugger, but I agree, I can't quite see how we didn't end up with a proper RedirectedThreadFrame. In any case, the best crash dump I could get from the CI system had a disabled InlinedCallFrame as the top of the framestack. (Unfortunately, for some reason, the thread that frame is on had been terminated, so it wasn't completely workable.) I'll try a again with some debugger fun times.

@davidwrighton
Copy link
Member Author

@jkotas Good catch, this really ends up just silencing the assert. The actual problem is that if we have a function that contains both a reverse p/invoke and an InlinedCallFrame, which is called from native, we see a weird state where the top of stack is an InlinedCallFrame which is inactive, and a actual interesting logic needs to begin, somewhere else. I'm experimenting with various approaches to address the problem, but I'm going to close this PR as not fixing the problem.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 1, 2021
@davidwrighton davidwrighton deleted the fix_45326 branch April 13, 2023 18:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure: readytorun/coreroot_determinism/coreroot_determinism/coreroot_determinism.sh
3 participants