Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinned thread when selecting an instance from Arc #41314

Closed
vsevel opened this issue Jun 19, 2024 · 15 comments
Closed

Pinned thread when selecting an instance from Arc #41314

vsevel opened this issue Jun 19, 2024 · 15 comments
Labels
area/arc Issue related to ARC (dependency injection) area/fault-tolerance kind/bug Something isn't working

Comments

@vsevel
Copy link
Contributor

vsevel commented Jun 19, 2024

Describe the bug

I detected the following thread pinning:

Thread[#54,ForkJoinPool-1-worker-1,5,CarrierThreads]
java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:393)
java.base/java.lang.VirtualThread.park(VirtualThread.java:582)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
io.quarkus.arc.impl.LazyValue.get(LazyValue.java:29)
io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:69)
io.quarkus.arc.impl.ComputingCache.computeIfAbsent(ComputingCache.java:49)
io.quarkus.arc.impl.ComputingCache.getValue(ComputingCache.java:40)
io.quarkus.arc.impl.ArcContainerImpl.getResolvedBeans(ArcContainerImpl.java:970)
io.quarkus.arc.impl.Instances.resolveBeans(Instances.java:41)
io.quarkus.arc.impl.Instances.resolveBeans(Instances.java:36)
io.quarkus.arc.impl.InstanceImpl.resolve(InstanceImpl.java:339)
io.quarkus.arc.impl.InstanceImpl.<init>(InstanceImpl.java:169)
io.quarkus.arc.impl.InstanceImpl.child(InstanceImpl.java:135)
io.quarkus.arc.impl.InstanceImpl.select(InstanceImpl.java:204)
io.quarkus.arc.impl.InstanceImpl.select(InstanceImpl.java:41)
io.quarkus.arc.impl.ArcCDIProvider$ArcCDI.select(ArcCDIProvider.java:51)
io.smallrye.faulttolerance.CdiFaultToleranceSpi.lazyDependencies(CdiFaultToleranceSpi.java:76)
io.smallrye.faulttolerance.core.apiimpl.FaultToleranceImpl$BuilderImpl.lambda$build$0(FaultToleranceImpl.java:192)
io.smallrye.faulttolerance.core.apiimpl.LazyFaultTolerance.call(LazyFaultTolerance.java:28) <== monitors:1
io.smallrye.faulttolerance.FaultToleranceInterceptor.preconfiguredFlow(FaultToleranceInterceptor.java:206)
io.smallrye.faulttolerance.FaultToleranceInterceptor.intercept(FaultToleranceInterceptor.java:176)
io.smallrye.faulttolerance.FaultToleranceInterceptor_Bean.intercept(Unknown Source)
io.quarkus.arc.impl.InterceptorInvocation.invoke(InterceptorInvocation.java:42)
io.quarkus.arc.impl.AroundInvokeInvocationContext.perform(AroundInvokeInvocationContext.java:30)
io.quarkus.arc.impl.InvocationContexts.performAroundInvoke(InvocationContexts.java:27)
... business code
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
... business code
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
... business code
io.smallrye.reactive.messaging.providers.AbstractMediator.lambda$invokeBlocking$15(AbstractMediator.java:190)
io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
io.smallrye.mutiny.operators.uni.builders.UniCreateFromDeferredSupplier.subscribe(UniCreateFromDeferredSupplier.java:25)
io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
io.quarkus.virtual.threads.ContextPreservingExecutorService$ContextPreservingRunnable.run(ContextPreservingExecutorService.java:45)
java.base/java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(ThreadPerTaskExecutor.java:314)
java.base/java.lang.VirtualThread.run(VirtualThread.java:309)

we do not believe we are doing any native call in our app.
similar issue to #41313
how can we investigate this?

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

No response

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@vsevel vsevel added the kind/bug Something isn't working label Jun 19, 2024
Copy link

quarkus-bot bot commented Jun 19, 2024

/cc @Ladicek (arc), @manovotn (arc), @mkouba (arc)

@quarkus-bot quarkus-bot bot added the area/arc Issue related to ARC (dependency injection) label Jun 19, 2024
@mkouba
Copy link
Contributor

mkouba commented Jun 19, 2024

Hm, I thought that java.util.concurrent.locks.ReentrantLock is the recommended way to replace the synchronized blocks in order to avoid pinning 🤔.

CC @mariofusco @cescoffier @xstefank

@vsevel
Copy link
Contributor Author

vsevel commented Jun 19, 2024

I believe so as well. I do not know if there is synchronized or a wait hidden in the stack. would we see it in this stacktrace?
I suppose so according to PinnedThreadPrinter.printStackTrace():

    private static void printStackTrace(List<LiveStackFrame> stack,
                                        PrintStream out,
                                        boolean printAll) {
        out.println(Thread.currentThread());
        for (LiveStackFrame frame : stack) {
            var ste = frame.toStackTraceElement();
            int monitorCount = frame.getMonitors().length;
            if (monitorCount > 0) {
                out.format("    %s <== monitors:%d%n", ste, monitorCount);
            } else if (frame.isNativeMethod() || printAll) {
                out.format("    %s%n", ste);
            }
        }
    }

@xstefank
Copy link
Member

Most likely there is. ReentrantLock is working.

@vsevel
Copy link
Contributor Author

vsevel commented Jun 19, 2024

Most likely there is.

do you know how to chase it?

@xstefank
Copy link
Member

@vsevel I don't think there is a general way how to chase it. Can you provide me with a reproducer? Just to be sure there are no monitors in your business code? I can't reproduce it.

@mariofusco
Copy link
Contributor

@vsevel I could also give a look at this, but I agree with @xstefank on the fact that we need a reproducer.

@vsevel
Copy link
Contributor Author

vsevel commented Jun 20, 2024

it is unlikely that I will be able to give a reproducer. I am totally fine if at some point we close it with "cannot reproduce". I still wanted to create the issue so that we have a place to provide as much context as possible, and that is something that can be searched by others that experience the same issue.

@vsevel
Copy link
Contributor Author

vsevel commented Jun 20, 2024

a bit more information. I looked at the app that was experiencing that particular issue, with that exact stacktrace.
it all happened on the same day. and in 2 occasions I looked at the error occurred just after startup.
and what is particularly disturbing is that in these 2 cases, the application experienced multiple exceptions and timeouts, and eventually killed shortly after because of the SmallRye Reactive Messaging - liveness check being down. one time it was because of org.jboss.resteasy.reactive.ClientWebApplicationException: Received: ', status code 401' when invoking ..., and the other time it was {"name":"SmallRye Reactive Messaging - liveness check","status":"DOWN","data":{"x":"[OK]","x":"[KO] - Multiple exceptions caught:\n\t[Exception 0] org.jboss.resteasy.reactive.ClientWebApplicationException: Received: 'Bad Gateway, status code 502' when invoking:.
I do not know what is causing the other.

@franz1981
Copy link
Contributor

franz1981 commented Jun 25, 2024

I see io.smallrye.faulttolerance.core.apiimpl.LazyFaultTolerance.call(LazyFaultTolerance.java:28) <== monitors:1

Which have a monitor there?

@mariofusco
Copy link
Contributor

I investigated a little bit more this problem and, as correctly reported by @franz1981, I believe it is caused by this synchronization https://github.com/smallrye/smallrye-fault-tolerance/blame/63dce3532edbaa7c2b666671a5e7a1e0700d6521/implementation/core/src/main/java/io/smallrye/faulttolerance/core/apiimpl/LazyFaultTolerance.java#L41

In general I don't like that double check lazy initialization pattern, but I understand that sometimes it is necessary. I will try to replace it with an AtomicReference, but if you have any better suggestion please let me know. /cc @Ladicek @FroMage

@mkouba
Copy link
Contributor

mkouba commented Jul 12, 2024

I investigated a little bit more this problem and, as correctly reported by @franz1981, I believe it is caused by this synchronization https://github.com/smallrye/smallrye-fault-tolerance/blame/63dce3532edbaa7c2b666671a5e7a1e0700d6521/implementation/core/src/main/java/io/smallrye/faulttolerance/core/apiimpl/LazyFaultTolerance.java#L41

In general I don't like that double check lazy initialization pattern, but I understand that sometimes it is necessary. I will try to replace it with an AtomicReference, but if you have any better suggestion please let me know. /cc @Ladicek @FroMage

Thanks for your investigation Mario!

@mariofusco
Copy link
Contributor

mariofusco commented Jul 12, 2024

See smallrye/smallrye-fault-tolerance#1035

@mariofusco
Copy link
Contributor

This issue should have been fixed with the merging of this pull request smallrye/smallrye-fault-tolerance#1035

@mkouba
Copy link
Contributor

mkouba commented Sep 26, 2024

smallrye/smallrye-fault-tolerance#1035 is part of SR FT 6.4.0 that is used in Quarkus 3.14.0.CR1+.

@mkouba mkouba closed this as completed Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/arc Issue related to ARC (dependency injection) area/fault-tolerance kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants