Implement finite field `ccopy`, `neg`, `cneg`, `nsqr`, ... for CUDA target #466
Open: Vindaar wants to merge 63 commits into `master` from `moreFFnvidia`
E.g. when calling `addIncoming` for φ nodes, one needs to pass the current function.
`nsqr` exists both as a 'compile time' and a 'runtime' version, where compile time here refers to the JIT compilation of the CUDA code.
And a very basic load / store example (useful to understand how data is passed to GPU depending on type)
That extra special `r` out of place argument was a bit useless
Because we might not always want to load directly, but rather keep the pointer to some arbitrary (nested) array.
This allows for a bit more sanity wrt differentiating between field points and EC points
We will need 'prime plus 1 div 2' later to implement `div2` for finite field points. The code for the logic is directly ported from the `precompute.nim` logic. Ideally we could avoid the duplication of logic, but well.
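To make the role of 'prime plus 1 div 2' concrete, here is a minimal sketch in Python (not the Nim/LLVM code from this PR) of halving modulo an odd prime without a modular inversion. The function name `div2` mirrors the operation name above; the small modulus and the use of a multiplication by the parity bit in place of a conditional add are assumptions made for illustration only.

```python
def div2(a: int, p: int) -> int:
    """Return a/2 mod p for an odd prime p and 0 <= a < p."""
    half_p_plus_1 = (p + 1) // 2   # the precomputed 'prime plus 1 div 2' constant
    is_odd = a & 1                 # parity bit of a
    t = a >> 1                     # floor(a / 2)
    # Even a: a/2 is simply t.
    # Odd a:  (a + p)/2 = (a - 1)/2 + (p + 1)/2 = t + half_p_plus_1,
    #         which stays below p, so no final reduction is needed.
    return t + is_odd * half_p_plus_1


if __name__ == "__main__":
    p = 101  # toy modulus, chosen only for the check below
    for a in range(p):
        h = div2(a, p)
        assert 0 <= h < p
        assert (2 * h) % p == a
```

The point of the precomputed constant is that the odd case reduces to one shift plus one (conditional) addition.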
I mainly added this to debug a bug I saw
I started with the CPU `sumImpl` template and line by line added each operation for the GPU. With a bunch of helper templates the code essentially looks identical. I checked every line to see if they match. Hence all the commented out `asy.store()` instructions and different proc signatures.
Vindaar changed the title from "Implement finite field `ccopy`, `neg`, `cneg`, `nsqr` for CUDA target" to "Implement finite field `ccopy`, `neg`, `cneg`, `nsqr`, ... for CUDA target" on Sep 19, 2024
Clearer as to what the "test" does and adds a doc comment to the top explaining how it was used
Deals with deciding whether to allocate or just pass by value, though in a very simple manner!
EcPoint now is EcPointJac. We will have separate files for different coordinates, like for the CPU code. The distinct Array types will be defined in their respective files.
Slightly more type safe version of the templates included in some of the procs previously.
Raises at Nim runtime, which is LLVM code construction time. The exception should never be raised under normal conditions, only if the code construction is wrong.
TODO: We still need to differentiate between 2.0 and devel due to `var object` requirement for destroy there iirc
We could consider making all the `_internal` procedures take `EcPoint` etc. types. That way overload resolution would not be a problem and we'd avoid more nasty bugs.
This extends the existing `add`, `mul`, `sub` operations on finite fields on the Nvidia target by `ccopy`, `neg`, `cneg`, `nsqr`. For `nsqr` we have two different implementations for now.

I added a simple test case for each operation, which compares with the equivalent operation on the CPU. Currently I just cloned the test file for each case. It'd probably be wise to try to unite them / abstract some of the boilerplate a bit.
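As a reference for what the four new operations compute, here is a minimal Python sketch of their semantics on plain integers. It is not the LLVM/Nim code of this PR: the modulus `P`, the function signatures and the mask-based selection in `ccopy` are assumptions chosen for illustration, and the iteration count of `nsqr` is an ordinary argument here.

```python
P = 2**61 - 1  # stand-in prime modulus (assumption, not a curve field from the PR)


def ccopy(a: int, b: int, ctl: bool) -> int:
    """Conditional copy: return b if ctl is set, otherwise a, using a mask."""
    mask = -int(bool(ctl))        # 0 when ctl is False, -1 (all ones) when True
    return a ^ (mask & (a ^ b))


def neg(a: int, p: int = P) -> int:
    """Additive inverse modulo p; neg(0) stays 0."""
    return (p - a) % p


def cneg(a: int, ctl: bool, p: int = P) -> int:
    """Conditional negation: -a mod p if ctl is set, otherwise a."""
    return ccopy(a, neg(a, p), ctl)


def nsqr(a: int, count: int, p: int = P) -> int:
    """Square a `count` times in a row, i.e. compute a^(2^count) mod p."""
    for _ in range(count):
        a = (a * a) % p
    return a


if __name__ == "__main__":
    assert ccopy(3, 5, True) == 5 and ccopy(3, 5, False) == 3
    assert (neg(7) + 7) % P == 0
    assert cneg(7, False) == 7 and cneg(7, True) == P - 7
    assert nsqr(3, 4) == pow(3, 2**4, P)
```

A per-operation test in the spirit described above would run the GPU kernel and compare its output with a reference like this one (or, in the PR, with the CPU implementation).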
Update 2024/09/19
I have now added the following additional finite field operations:
- `setZero`
- `cadd`
- `csub`
- `isZero`
- `isOdd`
- `shiftRight`
- `div2`
In addition, I split all the implementations for each finite field op into an internal and public part. That way we can reuse the operations in other ops.
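To illustrate what some of the smaller operations listed above do on a multi-limb representation, here is a minimal Python sketch. It assumes a field element stored as little-endian 64-bit limbs; the helper names (`to_limbs`, `from_limbs`, `is_zero`, `is_odd`, `shift_right`) are hypothetical and do not reflect the signatures used in the PR.

```python
W = 64                  # limb width in bits (assumption)
MASK = (1 << W) - 1


def to_limbs(x: int, n: int) -> list[int]:
    """Split x into n little-endian W-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_limbs(limbs: list[int]) -> int:
    """Recombine little-endian limbs into one integer."""
    return sum(l << (W * i) for i, l in enumerate(limbs))


def is_zero(limbs: list[int]) -> bool:
    """OR all limbs together; the element is zero iff the result is zero."""
    acc = 0
    for l in limbs:
        acc |= l
    return acc == 0


def is_odd(limbs: list[int]) -> bool:
    """Parity is the lowest bit of the least significant limb."""
    return bool(limbs[0] & 1)


def shift_right(limbs: list[int], k: int) -> list[int]:
    """Shift right by k bits (0 < k < W), pulling bits in from the next limb."""
    n = len(limbs)
    out = []
    for i in range(n):
        hi = limbs[i + 1] if i + 1 < n else 0
        out.append(((limbs[i] >> k) | (hi << (W - k))) & MASK)
    return out


if __name__ == "__main__":
    x = 0x1234_5678_9ABC_DEF0_0FED_CBA9_8765_4321_DEAD_BEEF
    limbs = to_limbs(x, 4)
    assert from_limbs(shift_right(limbs, 1)) == x >> 1
    assert is_odd(limbs) and not is_zero(limbs)
```

Small building blocks like these are the kind of operation that benefits from an internal variant that other operations can call, for example a halving routine built from a parity check, a shift and a conditional add.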
Further, I added distinct `Array` types to deal with finite field points and elliptic curve points. With those, I then added a 'test case' of sorts, where I ported the EC short Weierstrass Jacobian `sumImpl` logic line by line. With the help of some templates the implementation is essentially the same as for the CPU now and the code produces the same result.

Next I'll clean up that test case and add the EC point addition to the `pub_curves.nim` module.

Update 2024/09/23
I've now added the EC addition to a new `pub_curves.nim` module. In addition, I added a helper type `NvidiaAssembler` to simplify the Nvidia setup & compilation process. It can now be done with 2 lines. (Alternatively one can use the `Assembler_LLVM` part of the `NvidiaAssembler` object to build instructions / call a function and then just pass the name of the kernel to `compile`.)

Using this, the boilerplate in the tests is reduced massively.

Note: The `t_ec_sum_port.nim` file is the one I used to write the port. I think it can be useful to give ideas on how to port some of our existing CPU logic to the LLVM target.

Next: I'll add other EC operations.
Update 2024/09/26
Over the last few days I have:
- split the `EcPoint` type into `EcPointAff`, `EcPointJac` and `EcPointProj`, which each have their own set of internal / public procedures
- added `fieldOps`, `ellipticOps`, `ellipticAffOps` that inject multiple templates for operations to allow writing code equivalent to the CPU code that can [...]
- added `mixedSum` for Jacobian + Affine EC points. Thanks to the templates, all that was needed to port the code was to replace variable declarations and `=` assignments by `var X = asy.newEcPointJac(ed)` and `store(X, Y)`. Aside from a minor issue due to an `isNeutral` name collision (internal procs are still using `ValueRef` as arguments and hence overload resolution is an issue; I have since renamed the affine version), it worked immediately.
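For readers unfamiliar with mixed addition, the following Python sketch shows the textbook Jacobian + affine formulas on a toy short Weierstrass curve and checks them against plain affine addition. It is only the generic case (both points non-neutral and not equal or opposite), whereas a production `mixedSum` also has to handle the exceptional cases; the curve `y^2 = x^3 + 2x + 3` over F_97, the two points and all function names are assumptions picked for illustration.

```python
P = 97          # toy field modulus (assumption)
A, B = 2, 3     # toy curve y^2 = x^3 + A*x + B (assumption)


def on_curve(pt):
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def affine_add(p1, p2):
    """Affine chord addition; requires x1 != x2."""
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return x3, y3


def mixed_sum_generic(jac, aff):
    """Jacobian (X1, Y1, Z1) + affine (x2, y2), generic case only."""
    X1, Y1, Z1 = jac
    x2, y2 = aff
    Z1Z1 = Z1 * Z1 % P
    U2 = x2 * Z1Z1 % P             # x2 brought to the scale of X1
    S2 = y2 * Z1 * Z1Z1 % P        # y2 brought to the scale of Y1
    H = (U2 - X1) % P
    R = (S2 - Y1) % P
    HH = H * H % P
    HHH = H * HH % P
    V = X1 * HH % P
    X3 = (R * R - HHH - 2 * V) % P
    Y3 = (R * (V - X3) - Y1 * HHH) % P
    Z3 = Z1 * H % P
    return X3, Y3, Z3


def to_affine(jac):
    """Convert Jacobian (X, Y, Z) back to affine (X/Z^2, Y/Z^3)."""
    X, Y, Z = jac
    zinv = pow(Z, -1, P)
    return X * zinv * zinv % P, Y * zinv * zinv * zinv % P


if __name__ == "__main__":
    p1, p2 = (3, 6), (0, 10)
    assert on_curve(p1) and on_curve(p2)
    z = 5  # arbitrary non-zero scaling to get a non-trivial Jacobian form
    p1_jac = (p1[0] * z * z % P, p1[1] * z * z * z % P, z)
    assert to_affine(mixed_sum_generic(p1_jac, p2)) == affine_add(p1, p2)
```

Comparing against a simpler reference implementation, as done in the last assertion, mirrors the CPU-versus-GPU comparison used by the tests in this PR.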