-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ReferenceProcessor does not trace transitively discovered SoftReference instances #1125
Comments
I think we had some discussion about soft reference in #564 (comment) and #564 (review). At that time, we concluded it was fine to not transitively trace soft references. So the actual issue is that we add more soft references to the soft reference list. They are not scanned, and not retained, but we attempt to forward those references. Is this understanding correct? |
Yes. The Java API allows different policies for retaining
Exactly. They should either be retained or cleared, but it's currently neither. And we attempted to forward the referent (The The easiest fix should be clearing the newly added fn retain<E: ProcessEdgesWork>(&self, trace: &mut E, _nursery: bool) {
// ...
for reference in sync.references.iter() {
// ...
if !reference.is_live::<E::VM>() {
// Reference is currently unreachable but may get reachable by the
// following trace. We postpone the decision.
continue;
}
// ...
}
// ...
} We postponed the decision, but we didn't make decision at the right time. See: pub fn scan_soft_refs<E: ProcessEdgesWork>(&self, trace: &mut E, mmtk: &'static MMTK<E::VM>) {
if !mmtk.state.is_emergency_collection() {
self.soft.retain::<E>(trace, is_nursery_gc(mmtk.get_plan()));
}
self.soft.scan::<E>(trace, is_nursery_gc(mmtk.get_plan())); // ERROR: It's too early!!!!!!!!!!!
} The call to One way to implement it is making use of the "bucket sentinel" feature I introduced for implementing VM-side reference processing. If we choose to retain soft references, we should postpone BTW, I think the bucket names |
A related bug is that finalizable objects can also expand the transitive closure when resurrected. If a finalizable object points to a import java.lang.ref.*;
class Bar {
}
class Foo {
public static Foo resurrectedFoo = null;
public WeakReference<Bar> weak;
public Foo() {
this.weak = new WeakReference<>(new Bar());
}
@Override
public void finalize() {
System.out.println("I am live again!");
resurrectedFoo = this;
}
}
public class FinalAndWeak {
public static void mkObject() {
Foo foo = new Foo();
}
public static void main(String[] args) throws InterruptedException {
System.out.println("Making object...");
mkObject();
System.out.println("Triggering GC...");
System.gc();
while (Foo.resurrectedFoo == null) {
System.out.println("Waiting for Foo.finalize()...");
Thread.sleep(100);
}
Foo theFoo = Foo.resurrectedFoo;
System.out.println(theFoo.weak.get());
}
} To fix this problem, we need to re-scan soft and weak references after resurrecting finalizable objects. |
By re-scanning soft and weak references after resurrecting finalizable objects, the crash disappeared. However, something is still inconsistent. For example: import java.lang.ref.*;
class Bar {
int num;
public Bar(int num) {
this.num = num;
}
@Override
public String toString() {
return String.format("Bar(%d)", this.num);
}
}
class Foo {
public static Foo resurrectedFoo = null;
public WeakReference<Bar> weak1;
public WeakReference<Bar> weak2;
public Bar bar2;
public Foo(Bar bar1, Bar bar2) {
this.weak1 = new WeakReference<>(bar1);
this.weak2 = new WeakReference<>(bar2);
this.bar2 = bar2;
}
@Override
public void finalize() {
System.out.println("I am live again!");
resurrectedFoo = this;
}
}
public class FinalAndWeak {
public static WeakReference<Bar> mkObject() {
Bar bar1 = new Bar(1);
Bar bar2 = new Bar(2);
Foo foo = new Foo(bar1, bar2);
WeakReference<Bar> ref2 = new WeakReference<>(bar2);
return ref2;
}
public static void main(String[] args) throws InterruptedException {
System.out.println("Making object...");
WeakReference<Bar> ref2 = mkObject();
System.out.println("Triggering GC...");
System.gc();
while (Foo.resurrectedFoo == null) {
System.out.println("Waiting for Foo.finalize()...");
Thread.sleep(100);
}
Foo theFoo = Foo.resurrectedFoo;
System.out.format("theFoo.weak1.get() == %s%n", theFoo.weak1.get());
System.out.format("theFoo.weak2.get() == %s%n", theFoo.weak2.get());
System.out.format("theFoo.bar2 == %s%n", theFoo.bar2);
System.out.format("ref2.get() == %s%n", ref2.get());
}
} There is a weak reference from the root to
Strictly speaking, this is not compliant with the Java API. The Java API demands that (link)
The result clearly shows that it didn't clear all weak references. However, the official OpenJDK behaves this way, too. Here is the result from vanilla OpenJDK 22:
I won't worry too much about this detail because the vanilla OpenJDK is not compliant, either. |
After retaining soft references, we postpone the scanning of soft references until the transitive closure from retained soft references is fully expanded. While tracing and scanning new objects reached from the referents of soft references, new soft references may be discovered, and they will be added to the reference processor. By postponing the scanning, we ensure the newly discovered soft references are scanned, too. To ensure we do not accidentally enqueue more objects than we should, we no longer call `trace_object` when *scanning* soft, weak and phantom references. Instead, we call `ObjectReference::get_forwarded_object` to get the forwarded references and referents. We still call `trace_object` when *retaining* soft references, and when forwarding references in MarkCompact during the second transitive closure. Fixes: #1125
After retaining and scanning
SoftReference
, it generatesProcessEdgesWork
work packets to trace the children of retainedSoftReference
instances. During this time, moreSoftReference
instances will be discovered, but they will not be traced. That will cause those transitively discoveredSoftReference
to contain un-updated reference to the children in the from-space, or crash MarkCompact due to those object not having forwarding pointers.This bug was originally discovered when running CI tests for the PR removing
ObjectReference::NULL
when runningfop
in DaCapo 2006 using MarkCompact. See: https://github.com/mmtk/mmtk-openjdk/actions/runs/8750247972/job/24016142656?pr=265The program
How to crash SemiSpace?
Run it with
MMTK_NO_REFERENCE_TYPE=true
, preferrably with stack trace and logs enabled.It will print:
How to crash MarkCompact
We first patch mmtk-core so that
MarkCompactSpace::trace_forward_object
asserts the object has forwarding pointer. If an object is reachable during theSecondRoot
stage or theRefForwarding
stage, it must be live, and its forwarding pointer must have been computed during theCalculateForwarding
stage.Then run the same program with the same env vars except using MarkCompact. It will crash with the following log:
What went wrong?
This portion of the log reveals the cause:
A new soft candidate is discovered after processing all soft references. The OpenJDK binding discovers soft/weak/phantom references gradually during tracing. The new soft candidate 0x4066bf28 was added. Itself was traced. But when it was scanned, it added itself to the reference processor using
add_soft_candidate
. However, it never gets processed, and its referent is not traced, either.In SemiSpace, the referent pointer still pointed to the from-space. During the
Release
stage, it tried to find references to enqueue, and the debug assertion discovered that some SoftReference still pointed to the from space.In MarkCompact, things were a bit complicated. During
CalculateForwarding
, the GC computeed the forwarded addresses of all live objects. However, since the referent of the newly discovered SoftReference was not marked, it was considered dead, and was not assigned a forwarding pointer. However, duringSecondRoot
, the GC computed another transitive closure and found the SoftReference to be live. It then attempted to forward the referent (which is dead), and found it did not have a forwarding pointer. The error would have silently slipped away without checking, but after I removedObjectReference::NULL
, it started to require an explicit NULL check against the forwarding pointer, and caught the bug.The text was updated successfully, but these errors were encountered: