Check for adcx instruction in the addcarryx intrinsics instead of just adc #666

gnzlbg · 2019-01-31T10:18:51Z

cc @jethrogb who reported this here #631 (comment)

jethrogb · 2019-01-31T10:22:52Z

Title should say "addcarryx"

gnzlbg · 2019-01-31T11:14:57Z

LLVM doesn't emit this intrinsic anymore: https://reviews.llvm.org/D51754

The rationale given is:

There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.

So I'm closing this. If you feel like we should emit this instruction here, feel free to re-open this, and fill an LLVM bug about it pinging the people involved in the submission and review of that pull-request, and link it here.

jethrogb · 2019-01-31T11:17:08Z

The public interface we expose has nothing to do with what LLVM does internally. For simple functions such as this, our public interface needs to match the Intel Intrinsics guide. If we can't implement it properly in LLVM, we should not stabilize the intrinsic.

If you didn't care about using ADX, you could use the addcarry intrinsic which always works.

gnzlbg · 2019-01-31T11:46:31Z

You are right, I think that llvm diff is unrelated. We should be using llvm.addcarryx instead of llvm.addcarry here. I'll start by adding some tests first and see if I can get that intrinsic to work.

gnzlbg · 2019-01-31T11:47:26Z

FWIW, we don't promise which instructions any intrinsic will generate. We promise that the running executable won't be able to tell the difference.

gnzlbg · 2019-01-31T12:04:49Z

So looking at the LLVM tests, it appears that indeed addcarryx is replaced with addcarry when possible. AFAICT replacing it would only be incorrect if the user were to inspect the overflow flag which we are not doing (and I am unsure whether that's possible or easy to do in LLVM - cc @rkruppe).

@jethrogb I have a PR (#672) that updates these to use the llvm addcarryx intrinsic, but the code generation issue persists. We could workaround that using inline assembly, but before we add any workarounds one would need to fill in an LLVM bug showing an MWE in which the transformation is incorrect.

gnzlbg · 2019-04-19T10:02:19Z

@jethrogb do you think you could help with an MWE for an LLVM bug report for this ?

jethrogb · 2019-04-19T16:07:10Z

I came up with this

/// Returns `(a + b, c + d)`
#[inline(never)]
#[no_mangle]
pub fn dual_mp_add(a: &[u64], b: &[u64], c: &[u64], d: &[u64]) -> (Vec<u64>, Vec<u64>) {
    unsafe {
        let len = a.len();
        assert_eq!(len, b.len());
        assert_eq!(len, c.len());
        assert_eq!(len, d.len());
        let mut ab = vec![0; len];
        let mut cd = vec![0; len];

        let mut ab_carry = 0;
        let mut cd_carry = 0;
        for i in 0..len {
            ab_carry = _addcarryx_u64(ab_carry, *a.get_unchecked(i), *b.get_unchecked(i), ab.get_unchecked_mut(i));
            cd_carry = _addcarryx_u64(cd_carry, *c.get_unchecked(i), *d.get_unchecked(i), cd.get_unchecked_mut(i));
        }

        (ab, cd)
    }
}

But the assembly is so laughably inefficient (the addcarry call isn't even inlined) that not using the ADX instruction isn't really going to do much in terms of performance.

jethrogb · 2019-04-19T22:45:29Z

Sorry, I forgot that you need to add a + in -C target-feature and there is no error if you don't. When doing that, I get this:

    34a0:       49 8b 4c f5 00          mov    0x0(%r13,%rsi,8),%rcx
    34a5:       40 80 c7 ff             add    $0xff,%dil
    34a9:       49 13 0c f4             adc    (%r12,%rsi,8),%rcx
    34ad:       40 0f 92 c7             setb   %dil
    34b1:       48 89 4c f5 00          mov    %rcx,0x0(%rbp,%rsi,8)
    34b6:       49 8b 0c f7             mov    (%r15,%rsi,8),%rcx
    34ba:       80 c2 ff                add    $0xff,%dl
    34bd:       49 13 0c f0             adc    (%r8,%rsi,8),%rcx
    34c1:       0f 92 c2                setb   %dl
    34c4:       48 89 0c f0             mov    %rcx,(%rax,%rsi,8)
    34c8:       48 8d 76 01             lea    0x1(%rsi),%rsi
    34cc:       48 39 f3                cmp    %rsi,%rbx
    34cf:       75 cf                   jne    34a0 <dual_mp_add+0xf0>

I think this is supposed to be slower than using ADX.

jethrogb · 2019-04-19T23:53:43Z

~~Actually, you can't implement dual integer addition with ADX because AFAICT there's no way to check the loop variable without touching CF/OF. I'll try to come up with something else.~~

Edit: You can using JRCXZ!

jethrogb · 2019-04-20T02:28:23Z

Ok, so I replaced that code with this assembly:

   neg    %rbx    
   xor    %edx,%edx
   xor    %edi,%edi
   xor    %esi,%esi
1:
   mov    0x0(%r13,%rsi,8),%rcx
   adcx   (%r12,%rsi,8),%rcx
   mov    %rcx,0x0(%rbp,%rsi,8)
   mov    (%r15,%rsi,8),%rcx
   adox   (%r8,%rsi,8),%rcx
   mov    %rcx,(%rax,%rsi,8)
   lea    0x1(%rsi),%rsi
   lea    (%rsi,%rbx),%rcx
   jrcxz  2f
   jmp 1b
2:

This is basically the same speed with my poor benchmark.

gnzlbg · 2019-04-20T06:59:18Z

If the code is correct and the speed is the same, there might be an advantage to not using adcx intrinsics that we are not seeing but LLVM might (code size, port pressure, ...). We can still ask a question in LLVM bugzilla about why isn't adcx being picked up though, but we do allow LLVM to do these "optimizations" as long as the resulting machine code is not slower.

jethrogb · 2019-04-20T07:16:40Z

I decided to implement MP multiply, which is documented by Intel as a main usecase for these instructions:

#![feature(test, asm)]

extern crate test;

use core::arch::x86_64::{_addcarryx_u64, _mulx_u64};

#[inline(never)]
#[no_mangle]
pub fn mp_mul_intrin(a: &[u64], b: &[u64]) -> Vec<u64> {
    unsafe {
        let len = a.len();
        assert_eq!(len, b.len());
        let mut c = vec![0; len * 2];

        for b_i in (0..len).into_iter().rev() {
            let b_elem = *b.get_unchecked(b_i);
            let mut carry_lo = 0;
            let mut carry_hi = 0;
            for a_i in (0..len).into_iter().rev() {
                let mut hi = 0;
                let lo = _mulx_u64(*a.get_unchecked(a_i), b_elem, &mut hi);
                carry_lo = _addcarryx_u64(carry_lo, lo, *c.get_unchecked(b_i + a_i + 1), c.get_unchecked_mut(b_i + a_i + 1));
                carry_hi = _addcarryx_u64(carry_hi, hi, *c.get_unchecked(b_i + a_i), c.get_unchecked_mut(b_i + a_i));
            }
            *c.get_unchecked_mut(b_i) += carry_lo as u64;
        }

        c
    }
}

#[inline(never)]
#[no_mangle]
pub fn mp_mul_asm(a: &[u64], b: &[u64]) -> Vec<u64> {
    unsafe {
        let len = a.len();
        assert_eq!(len, b.len());
        let mut c = vec![0; len * 2];
        
        asm!("
    # start iteration of `b_i`: `len`-1 downto 0
    lea -1($3), %rsi
1:  # every iteration of `b_i`
        # load `b_elem`
        mov ($1, %rsi, 8), %rdx
        # clear `carry_lo`, `carry_hi`
        xor %r9, %r9
        # start iteration of `a_i`+1: `len` downto 1
        mov $3, %rcx
        jmp 3f
2:          # the end of every iteration of `a_i`+1, except the last iteration
            # no displacement, RCX has already been decremented
            mov %r10, (%r11, %rcx, 8)
3:      # every iteration of `a_i`+1
            # compute c + b_i
            lea ($2, %rsi, 8), %r11
            # compute a[a_i]*b_elem
            mulx -8($0, %rcx, 8), %r8, %r9
            # add low word
            mov (%r11, %rcx, 8), %r10
            adcx %r8, %r10
            mov %r10, (%r11, %rcx, 8)
            # add high word
            mov -8(%r11, %rcx, 8), %r10
            adox %r9, %r10
            # this is done later to be able to add in last carry in last iteration of `a_i`+1
            # mov %r10, -8(%r11, %rcx, 8)
            # advance `a_i`+1
            lea -1(%rcx), %rcx
            jrcxz 4f
            jmp 2b
4:          # the end of the last iteration of `a_i`+1
            adc $$0, %r10
            mov %r10, (%r11)
        # advance `b_i`
        dec %rsi
        jns 1b
"
            :
            : "r"(a.as_ptr()), "r"(b.as_ptr()), "r"(c.as_mut_ptr()), "r"(len)
            : "rsi", "rcx", "rdx", "r8", "r9", "r10", "r11", "flags"
        );

        c
    }
}

use num_bigint::{RandBigInt, BigUint};

fn bigint_to_slice(i: &BigUint) -> Vec<u64> {
    i.to_bytes_be().chunks(8).map(|chunk| {
        let mut a = [0u8; 8];
        a.copy_from_slice(chunk);
        u64::from_be_bytes(a)
    }).collect()
}


#[test]
fn test_rand() {
    for _ in 0..1_000 {
        let a = rand::thread_rng().gen_biguint(1024);
        let b = rand::thread_rng().gen_biguint(1024);
        if a.bits() + b.bits() <= 2041 {
            continue;
        }
        let c = &a * &b;
        assert_eq!(
            mp_mul_intrin(&bigint_to_slice(&a), &bigint_to_slice(&b)),
            bigint_to_slice(&c)
        );
        assert_eq!(
            mp_mul_asm(&bigint_to_slice(&a), &bigint_to_slice(&b)),
            bigint_to_slice(&c)
        );
        println!("ok");
    }
}

#[bench]
fn bench_intrin(bench: &mut test::Bencher) {
    loop {
        let a = rand::thread_rng().gen_biguint(1024);
        let b = rand::thread_rng().gen_biguint(1024);
        if a.bits() + b.bits() <= 2041 {
            continue;
        }
        let a = bigint_to_slice(&a);
        let b = bigint_to_slice(&b);
        bench.iter(|| mp_mul_intrin(&a, &b));
        break;
    }
}

#[bench]
fn bench_asm(bench: &mut test::Bencher) {
    loop {
        let a = rand::thread_rng().gen_biguint(1024);
        let b = rand::thread_rng().gen_biguint(1024);
        if a.bits() + b.bits() <= 2041 {
            continue;
        }
        let a = bigint_to_slice(&a);
        let b = bigint_to_slice(&b);
        bench.iter(|| mp_mul_asm(&a, &b));
        break;
    }
}

Here you can see a pretty significant difference:

test bench_asm    ... bench:         390 ns/iter (+/- 58)
test bench_intrin ... bench:         507 ns/iter (+/- 65)

For reference, here's the assembly I get for the intrinsics version:

   1638d:       49 8d 3c d8             lea    (%r8,%rbx,8),%rdi
   16391:       48 83 c7 f0             add    $0xfffffffffffffff0,%rdi
   16395:       49 89 da                mov    %rbx,%r10
   16398:       4d 8d 4a ff             lea    -0x1(%r10),%r9
   1639c:       48 85 db                test   %rbx,%rbx
   1639f:       75 24                   jne    163c5 <mp_mul_intrin+0xb5>
   163a1:       eb 6d                   jmp    16410 <mp_mul_intrin+0x100>
   163a3:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   163aa:       00 00 00 
   163ad:       0f 1f 00                nopl   (%rax)
   163b0:       48 8b 5c 24 08          mov    0x8(%rsp),%rbx
   163b5:       48 83 c7 f8             add    $0xfffffffffffffff8,%rdi
   163b9:       4d 89 ca                mov    %r9,%r10
   163bc:       4d 8d 4a ff             lea    -0x1(%r10),%r9
   163c0:       48 85 db                test   %rbx,%rbx
   163c3:       74 4b                   je     16410 <mp_mul_intrin+0x100>
   163c5:       4b 8b 74 d7 f8          mov    -0x8(%r15,%r10,8),%rsi
   163ca:       31 ed                   xor    %ebp,%ebp
   163cc:       31 c9                   xor    %ecx,%ecx
   163ce:       66 90                   xchg   %ax,%ax
   163d0:       48 89 f0                mov    %rsi,%rax
   163d3:       49 f7 64 dc f8          mulq   -0x8(%r12,%rbx,8)
   163d8:       80 c1 ff                add    $0xff,%cl
   163db:       48 11 44 df 08          adc    %rax,0x8(%rdi,%rbx,8)
   163e0:       0f 92 c1                setb   %cl
   163e3:       40 80 c5 ff             add    $0xff,%bpl
   163e7:       48 11 14 df             adc    %rdx,(%rdi,%rbx,8)
   163eb:       40 0f 92 c5             setb   %bpl
   163ef:       48 83 c3 ff             add    $0xffffffffffffffff,%rbx
   163f3:       75 db                   jne    163d0 <mp_mul_intrin+0xc0>
   163f5:       0f b6 c1                movzbl %cl,%eax
   163f8:       4b 01 44 d0 f8          add    %rax,-0x8(%r8,%r10,8)
   163fd:       4d 85 c9                test   %r9,%r9
   16400:       75 ae                   jne    163b0 <mp_mul_intrin+0xa0>
   16402:       eb 1b                   jmp    1641f <mp_mul_intrin+0x10f>
   16404:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   1640b:       00 00 00 
   1640e:       66 90                   xchg   %ax,%ax
   16410:       31 c9                   xor    %ecx,%ecx
   16412:       0f b6 c1                movzbl %cl,%eax
   16415:       4b 01 44 d0 f8          add    %rax,-0x8(%r8,%r10,8)
   1641a:       4d 85 c9                test   %r9,%r9
   1641d:       75 91                   jne    163b0 <mp_mul_intrin+0xa0>

gnzlbg · 2019-04-20T07:32:24Z

Thank you, I think that's very convincing. Could you fill in an LLVM bug for this and link it here? (otherwise I can do it for you if you want).

jethrogb · 2019-04-20T07:34:11Z

I'd prefer it if you did it, I'm guessing you have a better picture of what should be in such a bug report.

gnzlbg · 2019-04-20T08:37:37Z

So investigating for the LLVM bug report I produced the following slightly more minimal example (godbolt - no std, no debug info, no println, no panics, ...):

#![feature(adx_target_feature)]
#![no_std]
use core::{arch::x86_64::*, hint::unreachable_unchecked};

#[target_feature(enable = "adx,bmi2")]
pub unsafe fn mp_mul_intrin(a: &[u64], b: &[u64], c: &mut [u64]) {
    let len = a.len();
    if len != b.len() {
        unreachable_unchecked();
    }
    if c.len() != len * 2 {
        unreachable_unchecked();
    }

    for b_i in (0..len).into_iter().rev() {
        let b_elem = *b.get_unchecked(b_i);
        let mut carry_lo = 0;
        let mut carry_hi = 0;
        for a_i in (0..len).into_iter().rev() {
            let mut hi = 0;
            let lo = _mulx_u64(*a.get_unchecked(a_i), b_elem, &mut hi);
            carry_lo = _addcarryx_u64(
                carry_lo,
                lo,
                *c.get_unchecked(b_i + a_i + 1),
                c.get_unchecked_mut(b_i + a_i + 1),
            );
            carry_hi = _addcarryx_u64(
                carry_hi,
                hi,
                *c.get_unchecked(b_i + a_i),
                c.get_unchecked_mut(b_i + a_i),
            );
        }
        *c.get_unchecked_mut(b_i) += carry_lo as u64;
    }
}

Inspecting the LLVM-IR in debug mode, we are not emitting llvm.x86.addcarryx instructions and I don't know why. Looking at core::arch source of _addcarryx_u64: https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/core_arch/src/x86_64/adx.rs#L33 we are calling llvm_addcarryx_u64 https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/core_arch/src/x86_64/adx.rs#L9 which links llvm.x86.addcarryx.u64. So in debug mode, that's what we should be seeing. I've checked the nightly version in godbolt, and it is from 18.04.2019, so core::arch appears to be recent enough. Any ideas ?

gnzlbg · 2019-04-20T08:42:24Z

This reproduces the issue (godbolt):

#![feature(link_llvm_intrinsics, abi_unadjusted, adx_target_feature)]

#[allow(improper_ctypes)]
extern "unadjusted" {
    #[link_name = "llvm.x86.addcarryx.u64"]
    fn llvm_addcarryx_u64(a: u8, b: u64, c: u64, d: *mut u8) -> u8;
}

#[target_feature(enable = "adx")]
pub unsafe fn _addcarryx_u64(c_in: u8, a: u64, b: u64, out: &mut u64) -> u8 {
    llvm_addcarryx_u64(c_in, a, b, out as *mut _ as *mut u8)
}

produces

ource_filename = "example.3a1fbbbh-cgu.0"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i8 @_ZN7example14_addcarryx_u6417hf98d062ffaad3a58E(i8 %c_in, i64 %a, i64 %b, i64* align 8 dereferenceable(8) %out) unnamed_addr #0 {
start:
  %0 = bitcast i64* %out to i8*
  %1 = call { i8, i64 } @llvm.x86.addcarry.64(i8 %c_in, i64 %a, i64 %b)
  %2 = extractvalue { i8, i64 } %1, 1
  %3 = bitcast i8* %0 to i64*
  store i64 %2, i64* %3, align 1
  %4 = extractvalue { i8, i64 } %1, 0
  br label %bb1

bb1: ; preds = %start
  ret i8 %4
}

declare { i8, i64 } @llvm.x86.addcarry.64(i8, i64, i64) #1

attributes #0 = { nounwind nonlazybind "probe-stack"="__rust_probestack""target-cpu"="x86-64""target-features"="+adx" }
attributes #1 = { nounwind readnone }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"RtLibUseGOT", i32 1}

when compiled with "-C opt-level=0 -C debuginfo=0 -C panic=abort --edition=2018 -C lto=no --emit=llvm-ir".

gnzlbg · 2019-04-20T08:59:07Z

It appears that this is just LLVM replacing the addcarryx intrinsic with addcarry (e.g. https://llvm.godbolt.org/z/vfVH70), even with opt-level=0.

jethrogb · 2019-04-20T16:47:29Z

It looks like the mulx instrinsic also doesn't work correctly? Edit: there we don't even try to call the LLVM intrinsic...

gnzlbg · 2019-04-20T16:50:34Z

IIRC the bmi2 mulx is almost never worth it codegen wise. I’m on the phone but there should be a closed issue here about that.

jethrogb · 2019-04-20T16:53:34Z

That is #27. In the code I shared, without mulx, it's no use emitting adcx/adox, because it would be overwriting the flags we were so carefully trying to keep around.

jethrogb · 2019-04-20T16:54:44Z

#27 claims the 64-bit version "works fine" but that's not (or no longer) the case.

gnzlbg · 2019-04-20T16:57:04Z

Damn, we should probably also fill an LLVM issue about that :/

…

On Sat 20. Apr 2019 at 18:54, jethrogb ***@***.***> wrote: #27 <#27> claims the 64-bit version "works fine" but that's not (or no longer) the case. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#666 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAG43JU5QCOL2LEGV5OEYFLPRNDFLANCNFSM4GTP7DSQ> .

gnzlbg · 2019-04-22T06:30:40Z

So this is now blocked on LLVM: https://bugs.llvm.org/show_bug.cgi?id=41546

It seems LLVM is not set up to generate code like this.

jon-chuang · 2020-03-30T06:32:13Z

Does this affect using the asm! macro? Or does it only affect the intrinsics in std_arch?

Amanieu · 2020-03-30T06:43:49Z

This only affects the intrinsics.

jon-chuang · 2020-04-03T23:56:50Z

Doesn't u128's mul_assign depend on mulx?

tarcieri · 2021-10-27T20:10:28Z

I saw @gnzlbg mention this earlier:

We could workaround that using inline assembly, but before we add any workarounds one would need to fill in an LLVM bug showing an MWE in which the transformation is incorrect.

...and @Amanieu mentioned that inline assembly is not impacted here.

It seems that the LLVM bug report has been filed, so is an inline assembly workaround potentially on the table here now?

bjorn3 · 2021-10-27T20:35:14Z

AFAICT this is only a perf issue and not a correctness issue. Using inline assembly in the body of just this intrinsic definition will likely only slowdown things as LLVM won't be able to fold an immediate into the instruction. It can't cause a speedup as adcx still clobbers the carry flag and there is no way to say that only the carry flag is clobbered and not the other flags. To get a perf improvement from using inline assembly you will likely need to write the entire code sequence that contains the adcx usage in inline asm.

tarcieri · 2021-10-27T20:41:23Z

@bjorn3 yeah, I was worried about that.

As @jethrogb mentioned earlier what would be really nice here is some way to combine ADC/MULX such that flags aren't clobbered. But if they are, it defeats the point.

To make matters even more complicated we're looking for options that can potentially work in a const context, which is why we're looking for an intrinsics-based approach as opposed to directly using inline assembly ourselves. But if that isn't possible, we can potentially pursue these sorts of optimizations using inline assembly outside of a const context.

dignifiedquire · 2021-10-27T21:44:15Z

Disclaimer: this might be unsound & unsafe.

So I have this prototype, which uses only functions + asm! to build a carry chain. It does generate the expected assembly.

playground::carry_chain: # @playground::carry_chain
# %bb.0:
	pushq	%rax
	movq	%rdx, %rax

	xorq	%rdx, %rdx
	movq	%rdi, %rdx
	mulxq	%rax, %rdx, %rdi
	adoxq	%rdi, %rdx
	adcxq	%rax, %rdi

	movq	%rsi, %rdx
	mulxq	%rcx, %rax, %rdx
	adoxq	%rdx, %rax
	adcxq	%rcx, %rdx

	popq	%rcx
	retq
                                        # -- End function

(removed #APP comments for better readability)

I haven't been able to get rid of the mov statements inbetween, but probably with some tweaking that should be possible as well (as those should not be interleaved in the hand written assembly).

Rust Playground

gnzlbg mentioned this issue Jan 31, 2019

Add ADX-related intrinsics #631

Merged

gnzlbg changed the title ~~Check for adcx instruction in the addcarry intrinsics instead of just adc~~ Check for adcx instruction in the addcarryx intrinsics instead of just adc Jan 31, 2019

gnzlbg closed this as completed Jan 31, 2019

gnzlbg reopened this Jan 31, 2019

gnzlbg added the A-llvm label Apr 22, 2019

recmo mentioned this issue Jan 14, 2020

Inline assembly rust-lang/rfcs#2850

Closed

recmo mentioned this issue Mar 19, 2020

Intrinsics based approach usind AD_X is currently blocked on LLVM. 0xProject/OpenZKP#616

Open

Pratyush mentioned this issue Feb 4, 2021

Investigate ways to improve basic primitive speed arkworks-rs/algebra#198

Open

tarcieri mentioned this issue Oct 27, 2021

Support detecting const-eval vs non-const-eval in a const fn rust-lang/const-eval#7

Open

tarcieri mentioned this issue Feb 6, 2022

crypto-intrinsics RustCrypto/utils#730

Closed

gnzlbg mannequin mentioned this issue Dec 10, 2021

suboptimal codegen for llvm.x86.addcarryx.u32 and u64 llvm/llvm-project#40891

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for adcx instruction in the addcarryx intrinsics instead of just adc #666

Check for adcx instruction in the addcarryx intrinsics instead of just adc #666

gnzlbg commented Jan 31, 2019 •

edited

Loading

jethrogb commented Jan 31, 2019

gnzlbg commented Jan 31, 2019

jethrogb commented Jan 31, 2019 •

edited

Loading

gnzlbg commented Jan 31, 2019 •

edited

Loading

gnzlbg commented Jan 31, 2019

gnzlbg commented Jan 31, 2019 •

edited

Loading

gnzlbg commented Apr 19, 2019

jethrogb commented Apr 19, 2019 •

edited

Loading

jethrogb commented Apr 19, 2019

jethrogb commented Apr 19, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019 •

edited

Loading

gnzlbg commented Apr 20, 2019

jethrogb commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 •

edited

Loading

gnzlbg commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019 •

edited

Loading

gnzlbg commented Apr 20, 2019 via email

jethrogb commented Apr 20, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 via email

gnzlbg commented Apr 22, 2019

jon-chuang commented Mar 30, 2020

Amanieu commented Mar 30, 2020

jon-chuang commented Apr 3, 2020

tarcieri commented Oct 27, 2021

bjorn3 commented Oct 27, 2021

tarcieri commented Oct 27, 2021 •

edited

Loading

dignifiedquire commented Oct 27, 2021 •

edited

Loading

Check for adcx instruction in the addcarryx intrinsics instead of just adc #666

Check for adcx instruction in the addcarryx intrinsics instead of just adc #666

Comments

gnzlbg commented Jan 31, 2019 • edited Loading

jethrogb commented Jan 31, 2019

gnzlbg commented Jan 31, 2019

jethrogb commented Jan 31, 2019 • edited Loading

gnzlbg commented Jan 31, 2019 • edited Loading

gnzlbg commented Jan 31, 2019

gnzlbg commented Jan 31, 2019 • edited Loading

gnzlbg commented Apr 19, 2019

jethrogb commented Apr 19, 2019 • edited Loading

jethrogb commented Apr 19, 2019

jethrogb commented Apr 19, 2019 • edited Loading

jethrogb commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 • edited Loading

jethrogb commented Apr 20, 2019 • edited Loading

gnzlbg commented Apr 20, 2019

jethrogb commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 • edited Loading

gnzlbg commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 • edited Loading

jethrogb commented Apr 20, 2019 • edited Loading

gnzlbg commented Apr 20, 2019 via email

jethrogb commented Apr 20, 2019 • edited Loading

jethrogb commented Apr 20, 2019

gnzlbg commented Apr 20, 2019 via email

gnzlbg commented Apr 22, 2019

jon-chuang commented Mar 30, 2020

Amanieu commented Mar 30, 2020

jon-chuang commented Apr 3, 2020

tarcieri commented Oct 27, 2021

bjorn3 commented Oct 27, 2021

tarcieri commented Oct 27, 2021 • edited Loading

dignifiedquire commented Oct 27, 2021 • edited Loading

gnzlbg commented Jan 31, 2019 •

edited

Loading

jethrogb commented Jan 31, 2019 •

edited

Loading

gnzlbg commented Jan 31, 2019 •

edited

Loading

gnzlbg commented Jan 31, 2019 •

edited

Loading

jethrogb commented Apr 19, 2019 •

edited

Loading

jethrogb commented Apr 19, 2019 •

edited

Loading

gnzlbg commented Apr 20, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019 •

edited

Loading

gnzlbg commented Apr 20, 2019 •

edited

Loading

gnzlbg commented Apr 20, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019 •

edited

Loading

jethrogb commented Apr 20, 2019 •

edited

Loading

tarcieri commented Oct 27, 2021 •

edited

Loading

dignifiedquire commented Oct 27, 2021 •

edited

Loading