native compilation bring in unexpected Netty classes #42903

Open
franz1981 opened this issue Aug 30, 2024 · 18 comments

@franz1981
Contributor

franz1981 commented Aug 30, 2024

Description

Natively compiling https://github.com/franz1981/quarkus-profiling-workshop produces a list of reachable Netty classes that includes these:

io.netty.buffer.AbstractByteBuf
io.netty.buffer.AbstractByteBufAllocator
io.netty.buffer.AbstractDerivedByteBuf
io.netty.buffer.AbstractPooledDerivedByteBuf
io.netty.buffer.AbstractPooledDerivedByteBuf$PooledNonRetainedDuplicateByteBuf
io.netty.buffer.AbstractPooledDerivedByteBuf$PooledNonRetainedSlicedByteBuf
io.netty.buffer.AbstractReferenceCountedByteBuf
io.netty.buffer.AbstractReferenceCountedByteBuf$1
io.netty.buffer.AbstractUnpooledSlicedByteBuf
io.netty.buffer.AbstractUnsafeSwappedByteBuf
io.netty.buffer.AdvancedLeakAwareByteBuf
io.netty.buffer.AdvancedLeakAwareCompositeByteBuf
io.netty.buffer.ByteBuf
io.netty.buffer.ByteBufAllocator
io.netty.buffer.ByteBufInputStream
io.netty.buffer.ByteBufUtil
io.netty.buffer.ByteBufUtil$1
io.netty.buffer.ByteBufUtil$HexUtil
io.netty.buffer.ByteBufUtil$SWARByteSearch
io.netty.buffer.ByteBufUtil$ThreadLocalDirectByteBuf
io.netty.buffer.ByteBufUtil$ThreadLocalDirectByteBuf$1
io.netty.buffer.ByteBufUtil$ThreadLocalUnsafeDirectByteBuf
io.netty.buffer.ByteBufUtil$ThreadLocalUnsafeDirectByteBuf$1
io.netty.buffer.CompositeByteBuf
io.netty.buffer.CompositeByteBuf$Component
io.netty.buffer.CompositeByteBuf$CompositeByteBufIterator
io.netty.buffer.DefaultByteBufHolder
io.netty.buffer.DuplicatedByteBuf
io.netty.buffer.EmptyByteBuf
io.netty.buffer.HeapByteBufUtil
io.netty.buffer.IntPriorityQueue
io.netty.buffer.LongLongHashMap
io.netty.buffer.PoolArena
io.netty.buffer.PoolArena$DirectArena
io.netty.buffer.PoolArena$HeapArena
io.netty.buffer.PoolChunk
io.netty.buffer.PoolChunkList
io.netty.buffer.PoolSubpage
io.netty.buffer.PoolThreadCache
io.netty.buffer.PoolThreadCache$FreeOnFinalize
io.netty.buffer.PoolThreadCache$MemoryRegionCache
io.netty.buffer.PoolThreadCache$MemoryRegionCache$1
io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
io.netty.buffer.PoolThreadCache$NormalMemoryRegionCache
io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
io.netty.buffer.PooledByteBuf
io.netty.buffer.PooledByteBufAllocator
io.netty.buffer.PooledByteBufAllocator$1
io.netty.buffer.PooledByteBufAllocator$PoolThreadLocalCache
io.netty.buffer.PooledByteBufAllocatorMetric
io.netty.buffer.PooledDuplicatedByteBuf
io.netty.buffer.PooledDuplicatedByteBuf$1
io.netty.buffer.PooledHeapByteBuf
io.netty.buffer.PooledHeapByteBuf$1
io.netty.buffer.PooledSlicedByteBuf
io.netty.buffer.PooledSlicedByteBuf$1
io.netty.buffer.PooledUnsafeDirectByteBuf
io.netty.buffer.PooledUnsafeDirectByteBuf$1
io.netty.buffer.PooledUnsafeHeapByteBuf
io.netty.buffer.PooledUnsafeHeapByteBuf$1
io.netty.buffer.ReadOnlyByteBuf
io.netty.buffer.SimpleLeakAwareByteBuf
io.netty.buffer.SimpleLeakAwareCompositeByteBuf
io.netty.buffer.SizeClasses
io.netty.buffer.SwappedByteBuf
io.netty.buffer.Unpooled
io.netty.buffer.UnpooledByteBufAllocator
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledDirectByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeDirectByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeHeapByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf
io.netty.buffer.UnpooledByteBufAllocator$UnpooledByteBufAllocatorMetric
io.netty.buffer.UnpooledDirectByteBuf
io.netty.buffer.UnpooledDuplicatedByteBuf
io.netty.buffer.UnpooledHeapByteBuf
io.netty.buffer.UnpooledSlicedByteBuf
io.netty.buffer.UnpooledUnsafeDirectByteBuf
io.netty.buffer.UnpooledUnsafeHeapByteBuf
io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf
io.netty.buffer.UnreleasableByteBuf
io.netty.buffer.UnsafeByteBufUtil
io.netty.buffer.UnsafeDirectSwappedByteBuf
io.netty.buffer.UnsafeHeapSwappedByteBuf
io.netty.buffer.WrappedByteBuf
io.netty.buffer.WrappedCompositeByteBuf

which include some mutually exclusive variants of the same classes that are "unexpected" (to me at least), e.g.:

The same should apply to the heap variants, e.g. io.netty.buffer.PooledUnsafeHeapByteBuf vs io.netty.buffer.PooledHeapByteBuf, but I need to better understand whether Vert.x is messing with them.
FYI these:

io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledDirectByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeDirectByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeHeapByteBuf
io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf

are still related to the unpooled heap variant and should be cut in half depending on whether Unsafe is available, see:
https://github.com/netty/netty/blob/95d86bbcee4f8e5a7d273d7ee16f69982cf2fab1/buffer/src/main/java/io/netty/buffer/UnpooledByteBufAllocator.java#L92-L94

The presence of most of these classes depends on whether Cleaners and Unsafe are available, which should be known upfront (and is stored, in Netty, in static final fields); hence I would expect native image to prune the unused variants.
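The pattern in question, roughly the kind of choice made in the linked UnpooledByteBufAllocator lines, is a factory that picks one subclass based on a static final flag. A minimal sketch with hypothetical names (AllocatorSketch, Buf, UnsafeHeapBuf, HeapBuf); HAS_UNSAFE is derived here only from io.netty.noUnsafe, which is a deliberate simplification of Netty's real detection:

```java
public final class AllocatorSketch {
    private AllocatorSketch() { }

    // Hypothetical, simplified flag: Netty's real PlatformDependent.hasUnsafe()
    // does much more than read the io.netty.noUnsafe system property.
    public static final boolean HAS_UNSAFE = !Boolean.getBoolean("io.netty.noUnsafe");

    public interface Buf {
        String kind();
    }

    // Stand-in for an Unsafe-based heap buffer.
    public static final class UnsafeHeapBuf implements Buf {
        @Override public String kind() { return "unsafe-heap"; }
    }

    // Stand-in for the plain byte[]-based heap buffer.
    public static final class HeapBuf implements Buf {
        @Override public String kind() { return "heap"; }
    }

    public static Buf newHeapBuffer() {
        // If this class is initialized at build time, HAS_UNSAFE is a constant,
        // the untaken branch is dead code and its class never becomes reachable.
        return HAS_UNSAFE ? new UnsafeHeapBuf() : new HeapBuf();
    }
}
```

If the class holding the flag is (re)initialized at run time instead, the image builder has to treat HAS_UNSAFE as unknown and keep both subclasses reachable.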

In addition, and separately, a while ago I opened netty/netty#13459, which (once Vert.x is changed to benefit from it) should also let us get rid of these:

io.netty.buffer.AdvancedLeakAwareByteBuf
io.netty.buffer.AdvancedLeakAwareCompositeByteBuf
io.netty.buffer.SimpleLeakAwareByteBuf
io.netty.buffer.SimpleLeakAwareCompositeByteBuf

To me, apart from the memory-footprint improvement, what would improve A LOT is the arity of virtual calls on buffer operations, which today likely prevents inlining; that's why native-image PGO seems to shine with Netty. A single byte/short/int/long buffer operation that is not inlined, and is preceded by some (albeit predictable) branches (i.e. type checks), kills performance in buffer-manipulation scenarios, which are the very core of Netty's protocol internals.

Implementation ideas

No response


quarkus-bot bot commented Aug 30, 2024

/cc @cescoffier (netty), @jponge (netty), @zakkak (native-image)

@franz1981
Contributor Author

@zakkak I would like to know if my analysis is correct and why we have such classes in

@zakkak
Contributor

zakkak commented Aug 30, 2024

produces a list of different loaded classes for Netty

@franz1981 can you clarify how you obtain this list? Are these classes loaded at build time? Do they end up in the image as well?

Thanks

@zakkak zakkak self-assigned this Aug 30, 2024
@franz1981
Contributor Author

franz1981 commented Aug 30, 2024

@zakkak

mvn package -DskipTests -Dnative -Dquarkus.native.enable-reports

and opened

# Printing list of used classes to: <bla bla>/target/hello-world-app-1.0.0-SNAPSHOT-native-image-source-jar/reports/used_classes_hello-world-app-1.0.0-SNAPSHOT-runner_20240830_100950.txt

Maybe that's not the right thing to look at, so please correct me 🙏

@zakkak
Contributor

zakkak commented Aug 30, 2024

Thanks for the clarification @franz1981.

Looking at it, my understanding is that you expect GraalVM to figure out at build time whether Unsafe is available or not; however, that seems to not always be the case.

HAS_UNSAFE is initialized using PlatformDependent.hasUnsafe(), which depends on the value of UNSAFE_UNAVAILABILITY_CAUSE, which is reinitialized at run time (and is thus considered unknown at build time).

The interesting part is that io.netty.buffer.PoolArena is initialized at BUILD_TIME, hinting that HAS_UNSAFE doesn't necessarily hold the same value that PlatformDependent.hasUnsafe() returns at run time (which seems like a bug I need to look into).

Now irrespective of the above, I believe that GraalVM correctly includes both classes in the native executable since it doesn't know at build time which one will be needed at run-time, e.g. the user might set io.netty.noUnsafe.

Note that this doesn't necessarily mean that both classes are indeed used at run-time, they are just available in case the user runs in a configuration where unsafe for some reason is not available.

@franz1981
Contributor Author

franz1981 commented Aug 30, 2024

Yep, but that likely means that all the call sites whose receiver is not known at compile time become bimorphic or megamorphic, and that would likely prevent inlining too. I can verify it by inspecting the compiled blobs (I need to ask @galderz how), but what would be the right way to make Graal able to prune the unused ones?

@zakkak
Contributor

zakkak commented Aug 30, 2024

Yep, but that means likely, that all the call sites which receiver is not known at compile time, become bi or mega morphic.

Or just include an if-else :)

And that would likely prevent inlining too.

Why, though? Sure, it might not be optimal, but inlining can still be performed.
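To make the "just an if-else" point concrete, here is a hand-written sketch of roughly what a bimorphic call site turns into: cheap, predictable exact-type checks guarding calls with a static target (each of which can be inlined), plus a virtual-call fallback. Buf, DirectBuf, HeapBuf and getLongLEGuarded are hypothetical stand-ins, not Netty classes, and a real compiler emits these guards itself rather than relying on source-level instanceof:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class GuardedDispatch {
    private GuardedDispatch() { }

    /** Hypothetical stand-in for the ByteBuf hierarchy. */
    public abstract static class Buf {
        public abstract long getLongLE(int index);
    }

    /** Stand-in for a direct-memory buffer. */
    public static final class DirectBuf extends Buf {
        private final ByteBuffer mem;
        public DirectBuf(byte[] bytes) {
            mem = ByteBuffer.allocateDirect(bytes.length).order(ByteOrder.LITTLE_ENDIAN);
            mem.put(bytes);
        }
        @Override public long getLongLE(int index) { return mem.getLong(index); }
    }

    /** Stand-in for a heap buffer backed by a byte[]. */
    public static final class HeapBuf extends Buf {
        private final ByteBuffer mem;
        public HeapBuf(byte[] bytes) {
            mem = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        }
        @Override public long getLongLE(int index) { return mem.getLong(index); }
    }

    public static long getLongLEGuarded(Buf buf, int index) {
        if (buf instanceof DirectBuf) {                  // exact type check (class is final)
            return ((DirectBuf) buf).getLongLE(index);   // static target: inlinable
        } else if (buf instanceof HeapBuf) {
            return ((HeapBuf) buf).getLongLE(index);     // static target: inlinable
        }
        return buf.getLongLE(index);                     // fallback: indirect virtual call
    }
}
```

Every extra reachable subclass either adds another guard or pushes the site towards the indirect fallback, which is where the concern about the number of reachable ByteBuf variants comes from.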

I can verify it inspecting the compiled blobs (need to ask @galderz how)

Not sure if @galderz has a better way but I would suggest compiling with debug info generation enabled (-Dquarkus.native.debug.enabled=true) and opening the native executable with gdb. You could then look for the method you are interested in and inspect the generated code (it also provides some level of assembly to java source code mapping but it's not always accurate).

  • but, what would be the right way to make graal able to prune the unused ones?

In this case I believe we would have to stop (re)initializing the whole PlatformDependent and PlatformDependent0 at runtime to avoid the unsafe checks being re-run, but that's not trivial AFAIK as you can't substitute static initializers... I am also not sure how safe/sound such an approach would be if you move the application across different systems and configurations.

@franz1981
Contributor Author

franz1981 commented Aug 30, 2024

In that case, let me check what I can see using perf report on an actual benchmark (if we're lucky, it will still let us observe how the actual buffer ops are compiled), or use gdb.
Personally, I don't think inlining will happen, because I know pretty well how big the underlying chain of calls is: inlining it would make the overall call site too big, which is exactly what you want to avoid to spare i-cache (and iTLB) bandwidth.
With @galderz we saw a similar behaviour with AsciiString- and String-based HTTP header validation, which turned out not to be correctly optimized by native image.

@franz1981
Contributor Author

franz1981 commented Aug 30, 2024

In this case I believe we would have to stop (re)initializing the whole PlatformDependent and PlatformDependent0 at runtime to avoid the unsafe checks being re-run

That's assuming we cannot change Netty, whereas we actually can (if it's acceptable and useful to others, clearly).
Could you help me understand whether we could refactor that huge class (maybe splitting out other classes that could instead be initialized at build time?) to isolate the Unsafe-related bits and make this work?
The point, from what I understand, is to access a property that is stable and known at build time.

But clearly this work should be justified by:

  • the pruned classes effectively reducing RSS/compilation time for native image, at least theoretically
  • the classes to be pruned effectively polluting the existing buffer call sites (which I'm fairly confident happens, but gotta check)

@franz1981
Contributor Author

Sorry @zakkak, I have reread your comment and understood it better:

since it doesn't know at build time which one will be needed at run-time, e.g. the user might set io.netty.noUnsafe.

But currently I'd expect us not to "support" the noUnsafe case (it was really meant mostly for Android-ish platforms and other JDK implementations that didn't provide Unsafe).
If we are fine with making this decision on behalf of users (by ignoring noUnsafe), we could in theory make it at build time, no?
Or are there other reasons why our users still need noUnsafe?

@franz1981
Contributor Author

I am also not sure how safe/sound such an approach would be if you move the application across different systems and configurations.

I can help with that: I think I could split the Unsafe presence check into a separate holder class within Netty, AND move the other subsequent checks (bitness, unaligned support, max direct memory...) elsewhere. Let's sync so I can better understand your concerns about portability, and we can decide how to split the detections. The more we can move to build time, the better, I think.
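A rough sketch of the holder-class split being proposed, under the assumption that only a tiny holder would be registered for build-time initialization (e.g. via GraalVM's --initialize-at-build-time=<class> option) while the rest of PlatformDependent stays run-time initialized. PlatformProbes, UnsafeHolder and detectUnsafe are hypothetical names, not existing Netty code, and the detection is heavily simplified: Netty's real check does much more, e.g. obtaining the theUnsafe field reflectively.

```java
public final class PlatformProbes {
    private PlatformProbes() { }

    /**
     * Hypothetical holder: its initializer runs once, when HAS_UNSAFE is first read.
     * If initialized at build time, the flag becomes a constant the image builder can fold.
     */
    public static final class UnsafeHolder {
        private UnsafeHolder() { }
        public static final boolean HAS_UNSAFE = detectUnsafe();
    }

    /** Simplified detection: honours io.netty.noUnsafe, then checks sun.misc.Unsafe is present. */
    public static boolean detectUnsafe() {
        if (Boolean.getBoolean("io.netty.noUnsafe")) {
            return false;
        }
        try {
            Class.forName("sun.misc.Unsafe");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

The trade-off is the one discussed in this thread: with build-time initialization, io.netty.noUnsafe would be honoured when the image is built, not when it runs.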

@zakkak
Contributor

zakkak commented Sep 2, 2024

If we are fine to decide such in place of users (by ignoring noUnsafe) we could in theorya performing such decision at build time, no?
Or there are other reasons why we need noUnsafe to still be used by our users?

AFAIK no user ever requested it, and it was effectively disabled for quite a long time (by having the PlatformDependent classes build-time initialized). Given this, and the fact that Quarkus tends to be opinionated, it might be OK to go back to fixing noUnsafe at build time (note that there is no need to actually ignore it, just to make it a build-time setting).

I can help with that:I think I could split the unsafe presence check in a separate holder class within Netty AND other subsequent ones (bit-ness, unaligned support, max direct memory...) elsewhere. Let's sync together so I can better understand your concerns about portability, so we can drive how to split the detections. The more we can move it at build time, the better, I think

I agree, that sounds like a good plan.

@zakkak
Contributor

zakkak commented Sep 2, 2024

@franz1981 I did some more digging into this and I see that:

  1. The same thing should applies to the Heap variants e.g. io.netty.buffer.PooledUnsafeHeapByteBuf vs io.netty.buffer.PooledHeapByteBuf

    PooledHeapByteBuf is expected to be present even if not directly used since PooledUnsafeHeapByteBuf extends it (see https://github.com/netty/netty/blob/95d86bbcee4f8e5a7d273d7ee16f69982cf2fab1/buffer/src/main/java/io/netty/buffer/PooledUnsafeHeapByteBuf.java#L23).

  2. io.netty.buffer.PooledUnsafeDirectByteBuf are mutually exclusive to io.netty.buffer.UnpooledDirectByteBuf: see https://github.com/netty/netty/blob/95d86bbcee4f8e5a7d273d7ee16f69982cf2fab1/buffer/src/main/java/io/netty/buffer/PoolArena.java#L720-L722

    I believe you mean io.netty.buffer.PooledUnsafeDirectByteBuf is mutually exclusive to io.netty.buffer.PooledDirectByteBuf (instead of io.netty.buffer.UnpooledDirectByteBuf). Looking at the reports this seems to be true, I don't see io.netty.buffer.PooledDirectByteBuf being brought in. Note that this is happening because PoolArena is build-time initialized and thus HAS_UNSAFE is hard-coded to true at build time.

  3. io.netty.buffer.UnpooledUnsafeDirectByteBuf/ io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf and io.netty.buffer.UnpooledDirectByteBuf: see https://github.com/netty/netty/blob/95d86bbcee4f8e5a7d273d7ee16f69982cf2fab1/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L406-L407 and https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java#L682-L687

    Note that even if we make hasUnsafe always return true in PlatformDependent, io.netty.buffer.UnpooledDirectByteBuf is still brought in, because UnpooledUnsafeDirectByteBuf extends UnpooledDirectByteBuf (see https://github.com/netty/netty/blob/95d86bbcee4f8e5a7d273d7ee16f69982cf2fab1/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java#L30).
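The hierarchy constraint can be seen even on a regular JVM: a class cannot be initialized (or instantiated) before its superclass is (JLS §12.4), so a reachable subclass always drags its superclass along. A small sketch with hypothetical stand-in names for PooledHeapByteBuf and PooledUnsafeHeapByteBuf:

```java
import java.util.ArrayList;
import java.util.List;

public final class HierarchyDemo {
    private HierarchyDemo() { }

    /** Records the order in which the stand-in classes are initialized. */
    public static final List<String> INIT_LOG = new ArrayList<>();

    /** Hypothetical stand-in for PooledHeapByteBuf. */
    static class HeapBuf {
        static { INIT_LOG.add("HeapBuf"); }
    }

    /** Hypothetical stand-in for PooledUnsafeHeapByteBuf, which extends the heap variant. */
    static final class UnsafeHeapBuf extends HeapBuf {
        static { INIT_LOG.add("UnsafeHeapBuf"); }
    }

    /** Only the Unsafe variant is ever instantiated... */
    public static Object allocate() {
        return new UnsafeHeapBuf();
    }

    public static void main(String[] args) {
        allocate();
        // ...yet the superclass is initialized first: prints [HeapBuf, UnsafeHeapBuf]
        System.out.println(INIT_LOG);
    }
}
```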

To sum up, I see some differences in what's being brought in with the following patch:

diff --git a/extensions/netty/runtime/src/main/java/io/quarkus/netty/runtime/graal/NettySubstitutions.java b/extensions/netty/runtime/src/main/java/io/quarkus/netty/runtime/graal/NettySubstitutions.java
index 37a79003cd5..e25189b27f6 100644
--- a/extensions/netty/runtime/src/main/java/io/quarkus/netty/runtime/graal/NettySubstitutions.java
+++ b/extensions/netty/runtime/src/main/java/io/quarkus/netty/runtime/graal/NettySubstitutions.java
@@ -621,6 +621,44 @@ final class Target_io_netty_util_internal_shaded_org_jctools_util_UnsafeRefArray
     public static int LONG_ELEMENT_SHIFT;
 }
 
+@TargetClass(className = "io.netty.util.internal.PlatformDependent")
+final class Target_io_netty_util_internal_PlatformDependent {
+
+    @Substitute
+    public static boolean hasUnsafe() {
+        return true;
+    }
+
+    @Substitute
+    public static Throwable getUnsafeUnavailabilityCause() {
+        return null;
+    }
+
+    @Substitute
+    public static boolean useDirectBufferNoCleaner() {
+        return true;
+    }
+}
+
+@TargetClass(className = "io.netty.util.internal.PlatformDependent0")
+final class Target_io_netty_util_internal_PlatformDependent0 {
+
+    @Substitute
+    static boolean hasUnsafe() {
+        return true;
+    }
+
+    @Substitute
+    static Throwable getUnsafeUnavailabilityCause() {
+        return null;
+    }
+
+    @Substitute
+    static boolean hasDirectBufferNoCleanerConstructor() {
+        return true;
+    }
+}
+
 class IsBouncyNotThere implements BooleanSupplier {
 
     @Override

but I still see classes that you originally said you expected not to be present, and this is not caused by some inaccuracy or uncertainty at build time, but by Netty's class hierarchy.

@franz1981
Contributor Author

Thanks for checking @zakkak! I hadn't looked at Netty's (weird) choice of class hierarchy :/ but the most important point, which you already checked, seems valid, i.e.:

I believe you mean io.netty.buffer.PooledUnsafeDirectByteBuf is mutually exclusive to io.netty.buffer.PooledDirectByteBuf (instead of io.netty.buffer.UnpooledDirectByteBuf). Looking at the reports this seems to be true

To sum up, I see some differences in what's being brought in with the following patch:

Now I need to check how this impacts the quality of the generated assembly (and performance).

I suspect that alignment is another property we really don't expect to change, but I need to verify, because what happens if you don't handle it depends on the OS as well.

@galderz
Member

galderz commented Sep 2, 2024

@franz1981 and I have been looking at the assembly with the debugger to see whether megamorphic calls are being used to call into some of the crucial ByteBuf methods. This is still WIP.

We've focused on ByteBufUtil.firstIndexOf which contains this code:

        final long pattern = SWARUtil.compilePattern(value);
        for (int i = 0; i < longCount; i++) {
            // use the faster available getLong
            final long word = useLE? buffer._getLongLE(offset) : buffer._getLong(offset);
            final long result = SWARUtil.applyPattern(word, pattern);

We know that SWARUtil.compilePattern uses the constant 0x101010101010101L and SWARUtil.applyPattern uses the constant 0x7F7F7F7F7F7F7F7FL.
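For readers following the assembly, a self-contained sketch of the SWAR byte search these two constants belong to. It is a re-implementation of the classic "find a zero byte in a word" trick written to match the constants above, not a copy of Netty's SWARUtil; the class and method names are hypothetical. In the dump, the `and $0xff` / `imul` by 0x101010101010101 sequence lines up with compilePattern, and the `xor` followed by `movabs $0x7f7f7f7f7f7f7f7f` lines up with the start of applyPattern.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public final class SwarSearch {
    private SwarSearch() { }

    private static final long ONES = 0x0101010101010101L; // same value as 0x101010101010101L
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    /** Broadcasts the searched byte into all 8 byte lanes of a long. */
    public static long compilePattern(byte value) {
        return (value & 0xFFL) * ONES;
    }

    /** High bit of a lane is set iff that lane of word equals the searched byte. */
    public static long applyPattern(long word, long pattern) {
        long input = word ^ pattern;          // matching lanes become 0x00
        long tmp = (input & LOW7) + LOW7;     // high bit set iff the lane's low 7 bits are non-zero
        return ~(tmp | input | LOW7);         // only all-zero lanes keep their high bit
    }

    /** Index of the first occurrence of value in buf[from, to), or -1 if absent. */
    public static int firstIndexOf(byte[] buf, int from, int to, byte value) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        long pattern = compilePattern(value);
        int i = from;
        for (; i + Long.BYTES <= to; i += Long.BYTES) {
            long result = applyPattern(bb.getLong(i), pattern);
            if (result != 0) {
                // Little-endian load: the lowest address is the least significant lane.
                return i + (Long.numberOfTrailingZeros(result) >>> 3);
            }
        }
        for (; i < to; i++) {                 // scalar tail for the last < 8 bytes
            if (buf[i] == value) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "hello, native world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(firstIndexOf(data, 0, data.length, (byte) 'w')); // prints 14
    }
}
```

Each loop iteration costs only a handful of ALU instructions, which is why an out-of-line call for the 8-byte load stands out so much in the disassembly.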

Also, we know that the assembly calls into PooledUnsafeDirectByteBuf._getLongLE:

(gdb) 
1: x/i $pc
=> 0x68b3e6 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+486>:	movslq %ebx,%rsi
(gdb) 
io.netty.util.internal.SWARUtil::compilePattern(signed char) (byteToFind=<optimized out>) at io/netty/util/internal/SWARUtil.java:27
1: x/i $pc
=> 0x68b3e6 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+486>:	movslq %ebx,%rsi
(gdb) fin
Run till exit from #0  io.netty.util.internal.SWARUtil::compilePattern(signed char) (byteToFind=<optimized out>) at io/netty/util/internal/SWARUtil.java:27
io.netty.buffer.ByteBufUtil::firstIndexOf(io.netty.buffer.AbstractByteBuf*, int, int, signed char) (buffer=<optimized out>, fromIndex=<optimized out>, 
    toIndex=<optimized out>, value=<optimized out>) at io/netty/buffer/ByteBufUtil.java:598
1: x/i $pc
=> 0x68b406 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+518>:	mov    %ebp,%esi
(gdb) s
io.netty.buffer.PooledUnsafeDirectByteBuf::_getLongLE(int) (this=0x7fffe41ca198, index=6) at io/netty/buffer/PooledUnsafeDirectByteBuf.java:119
1: x/i $pc
=> 0x6c5eb0 <_ZN41io.netty.buffer.PooledUnsafeDirectByteBuf10_getLongLEEJli>:	sub    $0x8,%rsp
(gdb) fin
Run till exit from #0  io.netty.buffer.PooledUnsafeDirectByteBuf::_getLongLE(int) (this=0x7fffe41ca198, index=6)
    at io/netty/buffer/PooledUnsafeDirectByteBuf.java:119
0x000000000068b416 in io.netty.buffer.ByteBufUtil::firstIndexOf(io.netty.buffer.AbstractByteBuf*, int, int, signed char) (buffer=0x7fffe41ca198, 
    fromIndex=<optimized out>, toIndex=<optimized out>, value=<optimized out>) at io/netty/buffer/ByteBufUtil.java:598
1: x/i $pc
=> 0x68b416 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+534>:	nop
Value returned is $1 = 6076561101375171685

With both of those in mind, viewing the assembly starting a few instructions earlier, we see this:

(gdb) x/100i $pc
=> 0x68b37b <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+379>:	mov    0x308(%r14,%rax,1),%rcx
   0x68b383 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+387>:	shr    $0x3,%edx
   0x68b386 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+390>:	mov    %edx,0x34(%rsp)
   0x68b38a <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+394>:	mov    0x18(%rsp),%rdi
   0x68b38f <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+399>:	mov    %rcx,%rax
   0x68b392 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+402>:	call   *%rax
   0x68b394 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+404>:	nop
   0x68b395 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+405>:	mov    %rax,0x8(%rsp)
   0x68b39a <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+410>:	mov    $0x0,%r8d
   0x68b3a0 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+416>:	mov    0x3c(%rsp),%ebp
   0x68b3a4 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+420>:	jmp    0x68b45f <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+607>
   0x68b3a9 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+425>:	data16 data16 nopw 0x0(%rax,%rax,1)
   0x68b3b4 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+436>:	data16 data16 xchg %ax,%ax
   0x68b3b8 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+440>:	nopl   0x0(%rax,%rax,1)
   0x68b3c0 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+448>:	mov    %r8d,0x3c(%rsp)
   0x68b3c5 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+453>:	movabs $0x113b120,%r9
   0x68b3cf <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+463>:	lea    (%r14,%r9,1),%r9
   0x68b3d3 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+467>:	mov    0x560(%r14,%rcx,1),%r10
   0x68b3db <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+475>:	cmp    %rax,%r9
   0x68b3de <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+478>:	sete   %r11b
   0x68b3e2 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+482>:	movzbl %r11b,%r11d
   0x68b3e6 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+486>:	movslq %ebx,%rsi
   0x68b3e9 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+489>:	and    $0xff,%rsi
   0x68b3f0 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+496>:	movabs $0x101010101010101,%r12
   0x68b3fa <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+506>:	imul   %r12,%rsi
   0x68b3fe <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+510>:	mov    %rdi,%r12
   0x68b401 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+513>:	mov    %rsi,0x28(%rsp)
   0x68b406 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+518>:	mov    %ebp,%esi
   0x68b408 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+520>:	mov    %r10,%rax
   0x68b40b <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+523>:	mov    %r11d,0x24(%rsp)
   0x68b410 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+528>:	mov    %ebp,0x20(%rsp)
   0x68b414 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+532>:	call   *%rax
   0x68b416 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+534>:	nop
   0x68b417 <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+535>:	mov    0x28(%rsp),%rsi
   0x68b41c <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+540>:	xor    %rax,%rsi
   0x68b41f <_ZN27io.netty.buffer.ByteBufUtil12firstIndexOfEJiP31io.netty.buffer.AbstractByteBufiia+543>:	movabs $0x7f7f7f7f7f7f7f7f,%rax

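The call at 0x68b414 is the one that invokes the method. It is an indirect call (call *%rax) whose target was loaded from memory at 0x68b3d3 (mov 0x560(%r14,%rcx,1),%r10), but we're not fully sure how the decision to call this particular method is made.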

@franz1981
Contributor Author

franz1981 commented Sep 2, 2024

@galderz at this point, as @zakkak said, if we're "lucky" we should have just a bimorphic call there; but the fact that there is a call where getLongLE is supposed to happen is not a good sign: a full call just to read a single long value doesn't look good.
In JVM mode, if we're lucky, this is mostly inlined.

@cescoffier
Member

Any progress on this one?

@galderz
Member

galderz commented Dec 2, 2024

@cescoffier Nothing to report
