Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: SIMD and other goodies #534

Draft
wants to merge 89 commits into
base: master
Choose a base branch
from

Conversation

tmcdonell
Copy link
Member

@tmcdonell tmcdonell commented Sep 28, 2023

Motivation and context
A while ago @HugoPeters1024 asked something along the lines of "how do I make a SIMD vector of this data type". The answer was "you can't", and so he made a workaround instead that didn't require it.

This patch fixes that infelicity. Now you can:

data Thing = Thing Int Float
  deriving (Generic, Elt, SIMD n) -- for some fixed or variable type nat `n`

and then:

V2 (Thing 0 1) (Thing 2 3)

will create a 2-wide SIMD-vector-like thing, where internally the usual struct-of-array type of thing is happening to store the ints and floats in their own vector registers.

Of course, just being able to represent SIMD vector data in the language isn't enough, you need to be able to compute with that data as well. Now you can write something like this SIMD-vectorised Mandelbrot computation (adapted from here), where the SIMD width is altered by changing the type:

type Complex n a = (Vec n a, Vec n a)

mandelbrot
    :: forall n a. (P.Num a, P.Enum a, RealFloat a, FromIntegral Int a, Num (Vec n a), RealFloat (Vec n a), VOrd n a)
    => Int                                  -- ^ image width
    -> Int                                  -- ^ image height
    -> Acc (Scalar a)                       -- ^ centre x
    -> Acc (Scalar a)                       -- ^ centre y
    -> Acc (Scalar a)                       -- ^ view width
    -> Acc (Scalar Int32)                   -- ^ iteration limit
    -> Acc (Scalar a)                       -- ^ divergence radius
    -> Acc (Array DIM2 (Complex n a, Vec n Bool, Vec n Int32))
mandelbrot screenX screenY (the -> x0) (the -> y0) (the -> width) (the -> limit) (the -> radius) =
  generate (constant (Z :. screenY :. screenX'))
           (\ix -> let z0 = complexOfPixel ix
                       zn = while (\(T3 _ m i) -> vor m && vand (i <* splat limit))
                                  (\(T3 z _ i) -> let z' = step z0 z
                                                      m' = dot z' <* splat radius
                                                      i' = i + fromBool m'
                                                  in
                                                  T3 z' m' i')
                                  (T3 z0 (splat True) 0)
                   in
                   zn)
  where
    n         = fromInteger (natVal' (proxy# :: Proxy# n))
    screenX'  = screenX `quot` n

    -- Convert the given array index, representing a pixel in the final image,
    -- into the corresponding point on the complex plane.
    complexOfPixel :: Exp DIM2 -> Exp (Complex n a)
    complexOfPixel (I2 j i) =
      let
          height = P.fromIntegral screenY / P.fromIntegral screenX * width
          xmin   = x0 - width  / 2
          ymin   = y0 - height / 2
          --
          offset = pack [constant x | x <- [0..]]
          re     = splat xmin + ((splat (fromIntegral (i * constant n)) + offset) * splat width)  / splat (fromIntegral (constant screenX))
          im     = splat (ymin + (fromIntegral j * height) / fromIntegral (constant screenY))
      in
      T2 re im

    -- Divergence condition
    dot :: Exp (Complex n a) -> Exp (Vec n a)
    dot (T2 x y) = x*x + y*y  -- fast but numerically dodgy; ignores floating-point rounding

    -- Take a single step of the recurrence relation
    step :: Exp (Complex n a) -> Exp (Complex n a) -> Exp (Complex n a)
    step (T2 cr ci) (T2 zr zi) = T2 (cr + zr * zr) (ci + zi * zi)

Description
Since we are making fundamental changes to the internal representation of types, we might as well tick off a bunch of TODOs at the same time. Here is a rough list of changes in this patch, but this has been in progress for a while so I've probably forgotten some stuff.

  • Representation types only support fixed width types; none of this variable-size Int and Word nonsense; that gets stripped out at the surface level now
  • Support for 8, 16, 32, 64, and 128-bit signed and unsigned integers
    • EDIT: if you look closely you'll see that the encoding can represent arbitrary bit-width integers, but reflecting that property into the Haskell type system, as is required at various points, is currently not possible. Still, I have found it occasionally useful to be able to do this, so maybe at some point this can be expanded upon.
  • Support for 16, 32, 64, and 128-bit floating point numbers. The latter is guarded behind a flag -ffloat128 since that needs the quadmath library installed (gcc/linux x86_64 will have it, other platforms may or may not).
    • EDIT: I had considered adding other floating-point types as well, bfloat16 or 8-bit floats (which one?) but I'll leave that for now
  • Bool is now stored as a single bit; vectors of Bool are stored as packed bit-vectors
  • SIMD vectors of all of the above
  • The type class hierarchy changes in various subtle (and not so subtle) ways in order to accommodate SIMD types
  • Several new type classes specifically for vector types; VEq, VOrd, VNum
    • e.g. when comparing SIMD vectors for ordering there are a couple ways you can interpret that. The usual Ord instance will do it in the same way as ordering on tuples, whereas VOrd will do it lane-wise. Similarly, min will do the tuple-style minimum whereas vmin will do it lane-wise and vminimum will give you the minimum element in that vector (i.e. horizontal reduction across the vector) etc. etc.
  • Extra functions for dealing with SIMD vectors (pack, unpack, insert, extract)
  • Pattern synonyms for creating SIMD values/expressions
  • FromBool to convert Bools to and from integral types
  • Coerce arrays from one type to another. This will change the (innermost) dimension of the array as necessary. Useful to e.g. convert between Vector Float and Vector (Vec 2 Float), or Vector Bool (bit vector) and Vector Word64, etc. Works for structured types (pairs, etc.) as well.
  • Various embedding-polymorphic pattern synonyms, so that you can use (e.g.) V2 to construct a 2-wide SIMD vector at both the Haskell value level as well as the Accelerate expression level. I find it useful, but otoh it does hide a bit what is going on, so the type errors can be worse (as usual).
  • Some extra primops, bitreverse, byteswap, things specific for SIMD vectors, etc.
  • maybe other stuff...? if you see something I'll add it here

There are still TODOs in the code and to an extent the design depends on what code we can eventually generate from it (in progress), but I'm just throwing this up for now to start getting eyes on it.

How has this been tested?
>_>

Types of changes

  • Bug fix (non-breaking change which fixes an issue) (yeah I think I found some bugs in the instances for tuples?)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change) (for backends at least)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • TODO: I have added tests to cover my changes
  • All new and existing tests passed

HugoPeters1024 and others added 30 commits October 19, 2021 23:56
…/accelerate into wip/vector-operations

# Conflicts:
#	src/Data/Array/Accelerate/Trafo/Algebra.hs
Previously we could represent SIMD types (e.g. for complex numbers) but calculations directly in this representation were not possible.

User defined (sum and product) types can also be stored in SIMD format through the (generic derivable) SIMD class. Nested SIMD types are not supported.

New type classes VOrd and VEq which provide operations from Eq and Ord that compute their result per-lane.

Types of member functions of several (otherwise standard H98) classes are changed in order to support SIMD types, in particular Bits.
[ 99 of 112] Compiling Data.Array.Accelerate.Lift
ghc: panic! (the 'impossible' happened)
  (GHC version 8.6.5 for x86_64-unknown-linux):
	expectJust mkOneConFull
CallStack (from HasCallStack):
  error, called at compiler/utils/Maybes.hs:55:27 in ghc:Maybes
  expectJust, called at compiler/deSugar/Check.hs:1312:37 in ghc:Check
This should now match the behaviour of the tuple instances
# Conflicts:
#	.github/workflows/ci-linux.yml
#	accelerate.cabal
#	src/Data/Array/Accelerate/AST/Idx.hs
#	src/Data/Array/Accelerate/Array/Data.hs
#	src/Data/Array/Accelerate/Classes/RealFrac.hs
#	src/Data/Array/Accelerate/Classes/RealFrac.hs-boot
#	src/Data/Array/Accelerate/Data/Complex.hs
#	src/Data/Array/Accelerate/Language.hs
#	src/Data/Array/Accelerate/Prelude.hs
#	src/Data/Array/Accelerate/Sugar/Vec.hs
#	stack-8.10.yaml
#	stack-9.2.yaml
#	stack.yaml
ghc-iserv: munmapForLinker: m32_release_page: Failed to unmap 4096 bytes at ...: Attempt to access invalid address.
- VNum class for horizontal sum/product of a vector
- vminimum/vmaximum for horizontal vector min/max reduction
- bitreverse
- byteswap
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants