Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow roots to be pinned for StickyImmix nursery collections #1108

Merged
merged 2 commits into from
Apr 17, 2024

Conversation

k-sareen
Copy link
Collaborator

@k-sareen k-sareen commented Apr 10, 2024

This PR allows StickyImmix to properly deal with transitive pinning trace. Before this PR, StickyImmix may move objects in the transitive pinning trace, as it simply uses GenNurseryProcessEdges which allows moving. With this PR, for transitive pinning trace, StickyImmix uses GenNurseryProcessEdges<..., TRACE_KIND_TRANSITIVE_PIN>. This PR fixes #1097.

@k-sareen k-sareen requested review from qinsoon and wks April 10, 2024 07:05
@k-sareen
Copy link
Collaborator Author

k-sareen commented Apr 10, 2024

Again reinforcing that transitively pinning is still unsupported for the StickyImmix nursery GC (though it should be possible to support it). The main question is how to treat objects in the modbuffer.

@wks Can you try this PR out for Ruby to see if it allows objects to move during the nursery GC?

type ProcessEdgesWorkType = GenNurseryProcessEdges<VM, Self::PlanType>;
type TPProcessEdges = GenNurseryProcessEdges<VM, Self::PlanType>;
type ProcessEdgesWorkType = GenNurseryProcessEdges<VM, Self::PlanType, KIND>;
type TPProcessEdges = UnsupportedProcessEdges<VM>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will changing this break the Julia binding? @qinsoon @udesou

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this won't work if a binding uses transitive pinning. Can we use TRACE_KIND_FAST here? Will that work?

Copy link
Collaborator Author

@k-sareen k-sareen Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I had I used TRACE_KIND_TRANSITIVE_PIN in my fork before, but then Eduardo mentioned that transitively pinning has unclear semantics for the modbuffer which I did not consider since ART does not do transitively pinning roots. I decided to make it explicit that transitively pinning is still unsupported instead of having a broken implementation in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was that if this will completely break the Julia binding since it's changed to "unsupported" instead of "just working" tm since Julia disables copying for nursery GCs (i.e. sets PREFER_COPY_ON_NURSERY_GC to false).

Comment on lines 16 to 17
type ProcessEdgesWorkType = GenNurseryProcessEdges<VM, Self::PlanType, KIND>;
type TPProcessEdges = UnsupportedProcessEdges<VM>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That breaks Ruby, too. Basically if you set TPProcessEdges to UnsupportedProcessEdges, any unmovable roots (delivered via ProcessEdgesWorkRootsWorkFactory::create_process_pinning_roots_work) will create an work packet that uses UnsupportedProcessEdges to trace the roots, which crashes.

Suggested changes:

Suggested change
type ProcessEdgesWorkType = GenNurseryProcessEdges<VM, Self::PlanType, KIND>;
type TPProcessEdges = UnsupportedProcessEdges<VM>;
type ProcessEdgesWorkType = GenNurseryProcessEdges<VM, Self::PlanType, DEFAULT_TRACE>;
type TPProcessEdges = GenNurseryProcessEdges<VM, Self::PlanType, TRACE_KIND_TRANSITIVE_PIN>;

And you can remove the generic parameter <KIND> from StickyImmixNurseryGCWorkContext, too.

I think the type member type TPProcessEdges is misleading. It's not really related to "transitive pinning" per se. The semantics is:

  • If an edge is updateable (i.e. its target is allowed to move), the GC should create a ProcessEdgesWorkType to process that edge.
  • If an edge cannot be updated (i.e. its target must not move), the GC should create a TPProcessEdges to process that edge.

What really happens is that when creating root packets, there are three options:

method process edges from root to object using process edges between objects using
create_process_edge_roots_work ProcessEdgesWorkType ProcessEdgesWorkType
create_process_pinning_roots_work TPProcessEdges ProcessEdgesWorkType
create_process_tpinning_roots_work TPProcessEdges TPProcessEdges

The kind of root selects the strategy of choosing which packets to use. Non-transitive pinning roots process roots using TPProcessEdges but after that everything else can be processed with ProcessEdgesWorkType. Transitive pinning is implemented with forcing all edges reachable from those roots to be traced using TPProcessEdges.

So the difference is just whether the edge can be updated. I think there is much room for refactoring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohkay -- I should probably have tested that one-line change with ART before making this PR. It probably worked in my fork because I implemented TPProcessEdges instead of using the "unsupported" one.

This is very confusing. I re-read the create_process_pinning_roots_work function and it's difficult to reason where the type parameters I and E are actually coming from and what they will be when using pinning roots vs transitively pinning roots. Your explanation makes sense but this is nowhere in the doc comment for the ProcessEdgesWorkRootsWorkFactory type.

@wks
Copy link
Collaborator

wks commented Apr 10, 2024

Again reinforcing that transitively pinning is still unsupported for the StickyImmix nursery GC (though it should be possible to support it). The main question is how to treat objects in the modbuffer.

@wks Can you try this PR out for Ruby to see if it allows objects to move during the nursery GC?

I confirmed that with this PR (and the suggested changes I mentioned in the last comment), Ruby is able to pin object during nursery GC, and the GC is also able to move objects during GC. I am not sure whether it works perfectly because the Ruby binding itself has not been fully tested against moving GC, yet.

@k-sareen
Copy link
Collaborator Author

Done @wks

@k-sareen
Copy link
Collaborator Author

But now with the fix, transitively pinning is actually unsupported (given the issue regarding modbuffer) but MMTk will allow a VM developer to proceed regardless. This can blowup in glorious ways.

@wks
Copy link
Collaborator

wks commented Apr 10, 2024

But now with the fix, transitively pinning is actually unsupported (given the issue regarding modbuffer) but MMTk will allow a VM developer to proceed regardless. This can blowup in glorious ways.

I don't think it conflicts with transitive pinning. They are orthogonal.

The fields of the objects in the modbuf should be considered non-pinning roots (unless the VM has something like "potential pinning parents" like Ruby). If an object is added to the modbuf, it simply means the object has been modified before the current GC, and some of its children may point to young objects. Being in the modbuf doesn't mean the field of the object cannot be updated. But if the same young object is also reachable from pinning roots, the object will be pinned; if it's reachable from transitive pinning roots, it and its children will all be pinned.

I think the current implementation of ProcessModBuf is fine:

        if is_nursery_gc() {
            // Scan objects in the modbuf and forward pointers
            let modbuf = std::mem::take(&mut self.modbuf);
            GCWork::do_work(
                &mut ScanObjects::<E>::new(modbuf, false, WorkBucketStage::Closure), // Put into the Closure bucket
                worker,
                mmtk,
            )
        }

As long as the ScanObjects work packet is put into the Closure bucket, it will be executed after all the transitive pinning roots and their children are traced, and also after all non-transitive pinning roots are traced. It will still keep the children of objects in the modbuf alive, but whether they are pinned is determined by other (transitively pinning or non-transitively pinning) roots.

@k-sareen
Copy link
Collaborator Author

Hm. I see. It'd be better if Julia could actually test this PR out to ensure that nothing breaks since neither ART nor Ruby use the transitively pinning roots.

@qinsoon qinsoon added the PR-extended-testing Run extended tests for the pull request label Apr 11, 2024
@k-sareen
Copy link
Collaborator Author

@qinsoon @wks The JikesRVM test failure was due to a timeout. Have we seen something like this before? I am not sure why this PR would cause a timeout.

@qinsoon
Copy link
Member

qinsoon commented Apr 12, 2024

@qinsoon @wks The JikesRVM test failure was due to a timeout. Have we seen something like this before? I am not sure why this PR would cause a timeout.

I think we have seen timeouts, but not frequently.

@k-sareen k-sareen requested a review from wks April 12, 2024 06:11
@k-sareen
Copy link
Collaborator Author

JikesRVM just seems like it's a flaky CI. This code doesn't even touch the GCs that JikesRVM supports. I don't know why we get the failing instruction not in RVM address error. Should I investigate further?

@k-sareen
Copy link
Collaborator Author

Idk why the ready-to-merge check timed out. I just re-ran it.

@wks
Copy link
Collaborator

wks commented Apr 12, 2024

https://github.com/mmtk/mmtk-core/actions/runs/8657791364/job/23740512154
It looks like it timed out while running GenImmix. This is different from #1103 which is caused by a race condition in MarkSweep.

It may be an unknown bug.

@k-sareen
Copy link
Collaborator Author

k-sareen commented Apr 12, 2024

Idk why the ready-to-merge check timed out. I just re-ran it.

@qinsoon I think the ready-to-merge check is broken when we have the binding tests. They're called "extended-tests-<binding name>" instead of "<binding name>-binding-test".

@k-sareen
Copy link
Collaborator Author

k-sareen commented Apr 12, 2024

https://github.com/mmtk/mmtk-core/actions/runs/8657791364/job/23740512154 It looks like it timed out while running GenImmix. This is different from #1103 which is caused by a race condition in MarkSweep.

It may be an unknown bug.

Yikes. OpenJDK shouldn't time out since this should only change how the pinning/transitively pinning roots work. I'll investigate locally when I have the time.

OR This could be the result of the new change where we removed the coordinator. It would be ideal to be able to ssh to the machine and then inspect the deadlocked process.

@wks
Copy link
Collaborator

wks commented Apr 12, 2024

OR This could be the result of the new change where we removed the coordinator. ...

Yes. This is possible. By removing the coordinator, the mechanism for triggering GC has changed. There could be some corner cases that are not handled properly, but has not been discovered, yet.

@k-sareen k-sareen force-pushed the fix/sticky-immix-nursery-pinning-roots branch from fbf651b to f612af4 Compare April 16, 2024 23:43
@k-sareen
Copy link
Collaborator Author

@qinsoon I'm not sure if the ready to merge check was fixed properly. It seems like it's still waiting on something

@qinsoon
Copy link
Member

qinsoon commented Apr 17, 2024

@qinsoon I'm not sure if the ready to merge check was fixed properly. It seems like it's still waiting on something

Not sure what is happening there. The problem of wrong ignored action names was fixed.

@qinsoon
Copy link
Member

qinsoon commented Apr 17, 2024

@qinsoon I'm not sure if the ready to merge check was fixed properly. It seems like it's still waiting on something

Yeah. It was actually a mistake. #1119 should fix it. The action name for OpenJDK was changed due to the reused workflow. So the merge check was actually waiting for the OpenJDk tests to finish (as it was not ignored).

Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Merged via the queue into mmtk:master with commit 5f5c1d0 Apr 17, 2024
24 checks passed
@k-sareen k-sareen deleted the fix/sticky-immix-nursery-pinning-roots branch April 17, 2024 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR-extended-testing Run extended tests for the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StickyImmix nursery GC moves objects pointed by pinning roots
3 participants