Fix compile issues when build on RHEL5_64 with gcc 4.9.4 #8

Closed
wants to merge 3 commits into from


bryce-shang
Contributor

Issues

This change fixes compile issues when building on RHEL5_64 with gcc 4.9.4.

@bryce-shang bryce-shang closed this Jul 6, 2020
@darylmartin100 darylmartin100 deleted the gcc4.9 branch August 26, 2020 15:14
nebeid pushed a commit that referenced this pull request Jul 29, 2022
Resolves parsing issues for ARMv8 assembly with clang 7 on Ubuntu 20.04 in the FIPS static build (found through PR #566 for the SHA3 assembly implementation).

- Fix a parsing issue in `delocate.peg` for ARM assembly.
- Edit the rule `RegisterOrConstant` to allow shifting a register/constant by a two-digit value (e.g., the ARMv8 mask for SHA3 hardware support) instead of just one digit.
- Add a new rule allowing addition, subtraction and multiplication in the offset (useful for looping address accesses, e.g., `#8*($i+2)`). Add a set `OffsetOperator` that defines the operations allowed in the offset, and a new set of `Offset` rule alternatives that are interpreted depending on where parentheses, if any, are placed.

Note: The parentheses in the `Offset` rule must be either both included or both left out; i.e., every opening parenthesis needs a matching closing one. `OffsetOperator` includes addition, subtraction and multiplication only.
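
To illustrate the kind of offset expression described above, here is a minimal Python sketch. It is not the `delocate.peg` grammar (which is a PEG definition, not Python); the function names are hypothetical, and it assumes the Perl variable `$i` has already been expanded to a number and that `*` binds tighter than `+` and `-`. It accepts only addition, subtraction and multiplication, and rejects unbalanced parentheses.

```python
import re

# One token: a decimal number, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split an offset expression into tokens, rejecting anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_offset(text):
    """Evaluate an offset such as '#8*(3+2)' (hypothetical helper)."""
    if not text.startswith("#"):
        raise ValueError("offset must start with '#'")
    value, rest = _expr(tokenize(text[1:]))
    if rest:  # e.g. a stray ')' without a matching '('
        raise ValueError(f"unexpected token {rest[0]!r}")
    return value


def _expr(tokens):
    # expr := term (('+' | '-') term)*
    value, tokens = _term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens[0]
        rhs, tokens = _term(tokens[1:])
        value = value + rhs if op == "+" else value - rhs
    return value, tokens


def _term(tokens):
    # term := atom ('*' atom)*
    value, tokens = _atom(tokens)
    while tokens and tokens[0] == "*":
        rhs, tokens = _atom(tokens[1:])
        value *= rhs
    return value, tokens


def _atom(tokens):
    # atom := number | '(' expr ')'
    if not tokens:
        raise ValueError("unexpected end of offset")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        value, rest = _expr(rest)
        if not rest or rest[0] != ")":
            raise ValueError("unclosed parenthesis")
        return value, rest[1:]
    if head.isdigit():
        return int(head), rest
    raise ValueError(f"unexpected token {head!r}")
```

For example, with `$i` expanded to 3, `parse_offset("#8*(3+2)")` evaluates the offset, while `"#8*(3+2"` and `"#8*3+2)"` both raise `ValueError` because the parentheses are not closed as a pair.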

This change was tested successfully in PR #566.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jan 15, 2024
Implementations of AES-GCM in AWS-LC may use an "H-Table" to
precompute and cache common computations across multiple
invocations of AES-GCM using the same key, thereby improving
performance.

The main example of such a common precomputation is the
computation of powers of the H-value used in the GHASH algorithm --
giving the H-Table its name. However, despite the name, the
structure of the H-Table is opaque to the code invoking AES-GCM,
and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64
implementation of AES-GCM not only stores powers of H in the
HTable (H1-H8 in the code), but also their 'Karatsuba
preprocessings', i.e., the EORs of their low and high halves.
These arise naturally when using Karatsuba's algorithm to reduce a
128-bit polynomial multiplication over GF(2) to three 64-bit
polynomial multiplications.
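
The Karatsuba reduction mentioned above can be sketched in Python as follows. This is an illustrative model only, not the AWS-LC code: `clmul` and `clmul128_karatsuba` are hypothetical names, integers stand in for vector registers, and the result is the unreduced product (the GHASH modular reduction is omitted).

```python
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul128_karatsuba(a, b):
    """128x128-bit carry-less product built from three 64x64-bit products."""
    a_lo, a_hi = a & MASK64, a >> 64
    b_lo, b_hi = b & MASK64, b >> 64
    lo = clmul(a_lo, b_lo)
    hi = clmul(a_hi, b_hi)
    # The XORs of the halves are the "Karatsuba preprocessing" of each operand.
    # For a fixed operand such as a power of H, that XOR can be cached.
    mid = clmul(a_lo ^ a_hi, b_lo ^ b_hi) ^ lo ^ hi
    return (hi << 128) ^ (mid << 64) ^ lo
```

Because one operand of every multiplication is a fixed power of H, the term `a_lo ^ a_hi` for that operand depends only on the key, which is why it is worth storing next to the power itself.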

This commit changes the structure of the H-Table for AArch64
implementations slightly for better performance:
It is observed that every time a power of H is loaded from the
H-Table (H1-H8), the first operation that happens to it in both
aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low
and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps
can be precomputed, and the Hi values stored in swapped form in the
HTable, thereby eliminating the swaps from the critical loop of AES-GCM.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jan 15, 2024
Implementations of AES-GCM in AWS-LC may use an "H-Table" to
precompute and cache common computations across multiple
invocations of AES-GCM using the same key, thereby improving
performance.

The main example of such common precomputation is the
computation of powers of the H-value used in the GHASH algorithm
-- giving the H-Table its name. However, despite the name, the
structure of the H-Table is opaque to the code invoking AES-GCM,
and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64
implementation of AES-GCM not only stores powers of H in the
HTable (H1-H8 in the code), but also their 'Karatsuba
preprocessings', i.e., the EORs of their low and high halves.
These arise naturally when using Karatsuba's algorithm to reduce a
128-bit polynomial multiplication over GF(2) to three 64-bit
polynomial multiplications.

This commit changes the structure of the H-Table for AArch64
implementations of AES-GCM slightly to obtain a small performance gain:

It is observed that every time a power of H is loaded from the
H-Table (H1-H8), the first operation that happens to it in both
aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low
and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps
can be precomputed, and the H{1-8} values stored in swapped form in the
HTable, thereby eliminating the swaps from the critical loop of AES-GCM.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jan 16, 2024
Implementations of AES-GCM in AWS-LC may use an "H-Table" to
precompute and cache common computations across multiple
invocations of AES-GCM using the same key, thereby improving
performance.

The main example of such common precomputation is the
computation of powers of the H-value used in the GHASH algorithm
-- giving the H-Table its name. However, despite the name, the
structure of the H-Table is opaque to the code invoking AES-GCM,
and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64
implementation of AES-GCM not only stores powers of H in the
HTable (H1-H8 in the code), but also their 'Karatsuba
preprocessings', i.e., the EORs of their low and high halves.
These arise naturally when using Karatsuba's algorithm to reduce a
128-bit polynomial multiplication over GF(2) to three 64-bit
polynomial multiplications.

This commit changes the structure of the H-Table for AArch64
implementations of AES-GCM slightly to obtain a small performance gain:

It is observed that every time a power of H is loaded from the
H-Table (H1-H8), the first operation that happens to it in both
aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low
and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps
can be precomputed, and the H{1-8} values stored in swapped form in the
HTable, thereby eliminating the swaps from the critical loop of AES-GCM.

This commit modifies the H-table precomputation ghash_init_v8 in the
simplest way possible to introduce the desired swaps, bracketing store
instructions for H-table values X with `vext.8 X, X, X, #8`. The resulting
initialization code is slightly slower than the original one and will
be simplified in the next commit.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jan 16, 2024
This is the first in a series of commits aiming to rewrite gcm_ghash_v8
to work directly with the swapped H-table values, rather than swapping them
back after loading and falling back to the old code.

As a first step, the swapping of A = {H,H2} is removed and all uses of

```
pmull.64 Y, A, X
```

replaced by the equivalent

```
vext.8 X, X, X, #8
pmull2.64 Y, A, X
vext.8 X, X, X, #8
```

(and similarly for pmull2).

This works so long as X and Y don't alias.

Of course, the above conversion makes the code much less efficient,
and is not final. The next commit will eliminate `vext`.
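
The equivalence claimed above can be checked with a small Python model. It is a sketch under stated assumptions, not the perlasm code: integers stand in for 128-bit registers, `pmull` is modelled as multiplying the low 64-bit halves and `pmull2` the high halves (as the AArch64 PMULL/PMULL2 instructions do), and all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def swap_halves(v):
    """Model of `vext.8 V, V, V, #8`: exchange the two 64-bit halves."""
    return ((v & MASK64) << 64) | (v >> 64)


def pmull(a, x):
    """Multiply the low halves of both operands."""
    return clmul(a & MASK64, x & MASK64)


def pmull2(a, x):
    """Multiply the high halves of both operands."""
    return clmul(a >> 64, x >> 64)


def old_sequence(a, x):
    """Original code: A is in its unswapped form."""
    return pmull(a, x), x


def new_sequence(a_swapped, x):
    """Rewritten code: A comes from the H-table already swapped."""
    x = swap_halves(x)          # vext.8 X, X, X, #8
    y = pmull2(a_swapped, x)    # pmull2.64 Y, A, X
    x = swap_halves(x)          # vext.8 X, X, X, #8 (restores X)
    return y, x
```

Both sequences return the same product and leave X unchanged, provided Y is written to a different register than X, which the model reflects by keeping `y` and `x` as separate variables.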
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jan 16, 2024
`In` and `t1` are swapped versions of each other. Therefore,

```
	vext.8		$In, $In, $In, #8
	vpmull2.p64	$Xln,$H,$In		@ H·Ii+1
	vext.8		$In, $In, $In, #8
```

is equivalent to

```
	vext.8		$In, $In, $In, #8
	vpmull2.p64	$Xln,$H,$t1		@ H·Ii+1
	vext.8		$In, $In, $In, #8
```

is equivalent to

```
	vpmull2.p64	$Xln,$H,$t1		@ H·Ii+1
	vext.8		$In, $In, $In, #8
	vext.8		$In, $In, $In, #8
```

is equivalent to

```
	vpmull2.p64	$Xln,$H,$t1		@ H·Ii+1
```
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jan 16, 2024
In the context of the change, `t1` and `IN` hold the same value after

```
veor		$IN,$t0,$t2		@ inp^=Xi
veor		$t1,$t0,$t2		@ $t1 is rotated inp^Xi
```

Moreover, after all of

```
vpmull2.p64	$Xl,$H,$IN		@ H.lo·Xi.lo
vext.8		$IN, $IN, $IN, #8

veor		$t1,$t1,$IN		@ Karatsuba pre-processing
vpmull.p64	$Xm,$Hhl,$t1		@ (H.lo+H.hi)·(Xi.lo+Xi.hi)

vext.8		$IN, $IN, $IN, #8
```

`IN` is unchanged because it was swapped twice, and t1 only feeds
into the computation of Xm and is not used further afterwards.

Hence, the above is equivalent to

```
vpmull2.p64	$Xl,$H,$IN		@ H.lo·Xi.lo
vext.8		$t1, $IN, $IN, #8

veor		$t1,$t1,$IN		@ Karatsuba pre-processing
vpmull.p64	$Xm,$Hhl,$t1		@ (H.lo+H.hi)·(Xi.lo+Xi.hi)
```

removing one `vext`.
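
The argument above can be replayed on a small Python model. This is a sketch, not the perlasm code: integers stand in for 128-bit registers, `pmull`/`pmull2` are modelled as multiplying the low/high 64-bit halves, the function names are hypothetical, and it assumes that `$t1` and `$IN` hold the same value on entry, as the two `veor` instructions with identical sources imply.

```python
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def swap_halves(v):
    """Model of `vext.8 V, V, V, #8`."""
    return ((v & MASK64) << 64) | (v >> 64)


def pmull(a, x):
    return clmul(a & MASK64, x & MASK64)


def pmull2(a, x):
    return clmul(a >> 64, x >> 64)


def original(h, hhl, inp):
    """Sequence with two swaps of IN."""
    t1 = inp                    # assumption: $t1 == $IN on entry
    xl = pmull2(h, inp)         # vpmull2.p64 $Xl,$H,$IN
    inp = swap_halves(inp)      # vext.8 $IN,$IN,$IN,#8
    t1 ^= inp                   # veor $t1,$t1,$IN
    xm = pmull(hhl, t1)         # vpmull.p64 $Xm,$Hhl,$t1
    inp = swap_halves(inp)      # vext.8 $IN,$IN,$IN,#8
    return xl, xm, inp


def simplified(h, hhl, inp):
    """Sequence with a single swap written directly into t1."""
    xl = pmull2(h, inp)         # vpmull2.p64 $Xl,$H,$IN
    t1 = swap_halves(inp)       # vext.8 $t1,$IN,$IN,#8
    t1 ^= inp                   # veor $t1,$t1,$IN
    xm = pmull(hhl, t1)         # vpmull.p64 $Xm,$Hhl,$t1
    return xl, xm, inp
```

Both functions return the same `Xl`, `Xm` and final `IN` for any inputs, since XOR is commutative and swapping twice is the identity.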
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Mar 21, 2024
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request Jul 8, 2024
nebeid pushed a commit that referenced this pull request Jul 11, 2024
AArch64 assembly implementations of AES-GCM in AWS-LC use an "H-Table"
to precompute and cache common computations across multiple invocations
of AES-GCM using the same key, thereby improving performance.

The main example of such common precomputation is the computation of
powers of the H-value used in the GHASH algorithm -- giving the H-Table
its name. However, despite the name, the structure of the H-Table is
opaque to the code invoking AES-GCM, and implementations are free to
populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64
implementation of AES-GCM not only stores powers of H in the HTable
(H1-H8 in the code), but also their 'Karatsuba preprocessings', i.e.,
the EORs of their low and high halves. These arise naturally when
using Karatsuba's algorithm to reduce a 128-bit polynomial
multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations
of AES-GCM slightly to obtain a small performance gain:

It is observed that every time a power of H is loaded from the H-Table
(H1-H8), the first operation that happens to it in both
aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and
high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be
precomputed, and the H{1-8} values stored in swapped form in the HTable,
thereby eliminating the swaps from the critical loop of AES-GCM.
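
The idea of moving the swap from the loop into the table setup can be sketched in Python. This is an illustrative model only, with hypothetical function names: integers stand in for 128-bit registers, `swap_halves` models `ext arg.16b, arg.16b, arg.16b, #8`, and the "loop" only collects the operands it would feed into the multiplications.

```python
MASK64 = (1 << 64) - 1


def swap_halves(v):
    """Model of `ext v.16b, v.16b, v.16b, #8`: exchange the 64-bit halves."""
    return ((v & MASK64) << 64) | (v >> 64)


def build_swapped_htable(powers):
    """One-off initialization: store every power of H in swapped form."""
    return [swap_halves(h) for h in powers]


def old_loop_operands(htable, n_iterations):
    """Old layout: every load in the loop is followed by a swap."""
    operands, swaps = [], 0
    for _ in range(n_iterations):
        for h in htable:
            operands.append(swap_halves(h))
            swaps += 1
    return operands, swaps


def new_loop_operands(swapped_htable, n_iterations):
    """New layout: the loaded value is used as-is, no swap in the loop."""
    operands = []
    for _ in range(n_iterations):
        operands.extend(swapped_htable)
    return operands, 0
```

Both loops see identical operands; the old one spends one swap per load, while the new one pays for the swaps once when the table is built.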

This gives a small performance gain for AES-GCM on Graviton3, at the
cost of slightly slower one-off initialization. For Graviton2, the
AES-GCM AArch64 assembly loads the H-table only once, outside of the
critical loop; hence, there is no performance benefit.