Reciprocal square root, parallel r-sqrt/inv #362
Closed
the inverse of a second input, with a single exponentiation.
Because sqrt and inverse (by exponentiation) use very similar addition chains, some interesting functions are possible. x^((p - 3)/4) is the reciprocal square root 1/sqrt(x) (for quadratic residue x, else -1/sqrt(-x)), and already appears in the calculation of _fe_inverse.
The reciprocal root (RR) is in a way more useful than the sqrt itself, as we can calculate both the square root (x.RR) and the inverse (x.RR^4) from it very cheaply. Furthermore, we can calculate RR(x) and 1/y for completely independent inputs x and y with a single exponentiation, via RR(x.y^4). This works because y^(p-3) == 1/y^2 for non-zero y, so RR(x.y^4) == RR(x)/y^2, from which both values can be separated with a few multiplications. (Note that the current implementation does not try to handle x or y being 0.)
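To make the identities above concrete, here is a minimal sketch in Python using plain big-integer arithmetic over the secp256k1 field prime. It is not the code from this pull request: the names `rsqrt` and `par_rsqrt_inv` are hypothetical, `pow` stands in for the addition chain, and the way 1/y is separated out (x^2.y^7.t^4) is just one possible arrangement of the multiplications. As in the text, inputs are assumed to be non-zero.

```python
# Field prime of secp256k1 (P % 4 == 3, which the exponent (P - 3) / 4 relies on).
P = 2**256 - 2**32 - 977

def rsqrt(x):
    """Return x^((P-3)/4) mod P, i.e. RR(x) = 1/sqrt(x) for a non-zero residue x."""
    return pow(x, (P - 3) // 4, P)

def par_rsqrt_inv(x, y):
    """Return (RR(x), 1/y) for non-zero x, y using a single exponentiation."""
    y2 = y * y % P
    y4 = y2 * y2 % P
    t = rsqrt(x * y4 % P)      # t = RR(x) * y^(P-3) = RR(x) / y^2
    rr = t * y2 % P            # multiply y^2 back in to get RR(x)
    t2 = t * t % P
    t4 = t2 * t2 % P           # t^4 = RR(x)^4 / y^8 = 1 / (x^2 * y^8)
    # x^2 * y^7 * t^4 = 1/y
    inv_y = x * x % P * y4 % P * y2 % P * y % P * t4 % P
    return rr, inv_y

# Usage: square root and inverse of x from RR(x), plus the inverse of an unrelated y.
x, y = 9, 12345
rr, inv_y = par_rsqrt_inv(x, y)
sqrt_x = x * rr % P                # x.RR
inv_x = x * pow(rr, 4, P) % P      # x.RR^4
```

The extra cost over a plain `rsqrt` is a handful of multiplications and squarings, which is in line with the small gap between `field_rsqrt_var` and `field_par_rsqrt_inv_var` in the benchmarks.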
Some representative benchmark results (bench_internal):
field_inverse: min 4.74us / avg 4.77us / max 4.85us
field_sqrt_var: min 4.67us / avg 4.72us / max 4.86us
field_rsqrt_var: min 4.71us / avg 4.75us / max 4.84us
field_par_rsqrt_inv_var: min 4.90us / avg 4.93us / max 5.13us
I think this can be applied to #262 to recover the output y coordinate. The trick there is to take a compressed (i.e. x-only) input X0, calculate K (== Y0^2) via the curve equation, and then take the point (K.X0, K^2) as the "decompressed" point on an isomorphic curve (where the "u" value for the isomorphism is implicitly sqrt(K) == Y0). So we avoid a sqrt upfront. However, at the end we already calculate an inverse (using _fe_inv), so at that point we can calculate the actual RR(K) and 1/Z in parallel. That gives us sqrt(K), which is Y0, and we can choose which root based on the sign bit from the compressed input as usual. With sqrt(K) and 1/Z we can fully recover the output y.
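The decompression trick can be sketched as follows, again in Python with big integers rather than the library's field code. Everything here is an illustrative assumption: the helper names are hypothetical, the isomorphic curve is taken to be y^2 = x^3 + 7.K^3 (which is what u = sqrt(K) implies for y^2 = x^3 + 7), and `z` is an arbitrary non-zero stand-in for the Z coordinate that the real scalar multiplication would produce. The multiplication itself and the final recovery of the output y are not shown.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
B = 7                      # curve: y^2 = x^3 + 7

def rsqrt(x):
    """x^((P-3)/4) mod P: 1/sqrt(x) for a non-zero quadratic residue x."""
    return pow(x, (P - 3) // 4, P)

def find_valid_x():
    """Smallest x >= 1 for which x^3 + 7 is a non-zero residue (Euler's criterion)."""
    x = 1
    while pow((pow(x, 3, P) + B) % P, (P - 1) // 2, P) != 1:
        x += 1
    return x

def decompress_lazy(x0):
    """Map x-only input X0 to K and the point (K.X0, K^2), without any sqrt."""
    k = (pow(x0, 3, P) + B) % P            # K == Y0^2
    return k, (k * x0 % P, k * k % P)

def recover_y0_and_zinv(k, z, odd):
    """One exponentiation gives Y0 = sqrt(K) (with the requested parity) and 1/Z."""
    z2 = z * z % P
    z4 = z2 * z2 % P
    t = rsqrt(k * z4 % P)                  # RR(K) / Z^2
    y0 = k * (t * z2 % P) % P              # K.RR(K) == sqrt(K)
    if (y0 & 1) != odd:
        y0 = P - y0                        # pick the root matching the sign bit
    t2 = t * t % P
    t4 = t2 * t2 % P
    z_inv = k * k % P * z4 % P * z2 % P * z % P * t4 % P   # K^2.Z^7.t^4 == 1/Z
    return y0, z_inv

x0 = find_valid_x()
k, (xi, yi) = decompress_lazy(x0)
y0, z_inv = recover_y0_and_zinv(k, 0xDEADBEEF, odd=1)
```

The point (xi, yi) satisfies yi^2 == xi^3 + 7.K^3, and dividing its coordinates by u^2 and u^3 with u = Y0 maps it back to (X0, Y0), so the square root is only needed once the inverse is being computed anyway.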
I'll try to find some time to re-do that pull request against the latest ECDH code.