Register to access the Cerebras SDK here. Documentation for the SDK can be found here.
This repository contains examples of CSL code. Each example has the following properties:
- The source code is complete and self-contained and can be compiled using the CSL compiler.
- The compiled code can be simulated using our fabric simulator, or it can be executed on the Cerebras hardware itself.
Each example is located in its own sub-folder, and contains files of the following types:
\*.csl
: CSL source code files. If there is more than one CSL source file, the top-level file, specified when compiling, will import all other CSL files.
run.py
: This script drives the simulator (or the Cerebras fabric itself). It creates input data, runs the simulator, gets the simulation result and compares to an expected result, also computed in this file.
commands.sh
: Shell script which contains the exact commands you need to execute to first compile the source code and then simulate it.
\*.rst
: Documentation.
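The create-inputs, run, compare flow described for run.py can be sketched in plain Python. This is only an illustration of the pattern, not code from any example: the function names are hypothetical, the GEMV is assumed to have the form y = A*x + b, and `run_on_device` is a stand-in for the step that would launch the simulator or hardware (here it just returns the reference result so the sketch runs on its own).

```python
import random


def make_inputs(rows, cols, seed=0):
    """Create a reproducible random matrix A and vectors x and b."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(cols)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    return A, x, b


def expected_gemv(A, x, b):
    """Host-side reference result, assuming the form y = A*x + b."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def run_on_device(A, x, b):
    """Hypothetical placeholder for the simulator or hardware run.

    A real script would copy A, x and b to the device, launch the
    kernel and copy y back. Here the reference is returned unchanged.
    """
    return expected_gemv(A, x, b)


def results_match(actual, expected, tol=1e-5):
    """Compare element-wise with a relative tolerance."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= tol * max(1.0, abs(e))
               for a, e in zip(actual, expected))


A, x, b = make_inputs(rows=4, cols=6)
y_expected = expected_gemv(A, x, b)
y_actual = run_on_device(A, x, b)
print("SUCCESS" if results_match(y_actual, y_expected) else "FAILURE")
```

A tolerance-based comparison is used rather than exact equality because device and host arithmetic may round differently.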
The examples are divided into two main categories, tutorials and benchmarks, described in the following sections.
The tutorials are the place to start.
There are 10 tutorials which teach basic CSL language features and SdkRuntime host runtime features by building up increasingly complex code to compute a GEMV.
There are an additional 15 tutorial examples which illustrate specific language features, and 3 tutorial examples which build an increasingly complex pipelined computation.
The benchmarks folder contains more complex sample applications that solve specific computational problems.
The sample applications available are:
gemv-checkerboard-pattern
: This is arguably the simplest application and therefore a good place to start. It implements generalized matrix-vector multiplication in about 100 lines of CSL.
gemv-collectives_2d
: Implements GEMV, like the previous example, but uses the collectives library.
gemm-collectives_2d
: Implements generalized matrix-matrix multiplication (GEMM) using the collectives library.
residual
: Computes the norm of the residual of a matrix-vector multiplication. Builds on the gemv-checkerboard-pattern example.
25-pt-stencil
: A 3D 25-point stencil finite difference code for solving a wave equation with a source perturbation.
bandwidth-test
: Benchmarks the bandwidth of data transfers between host and device using the memcpy framework and the SdkRuntime host API.
spmv-hypersparse
: Computes a sparse matrix-vector product using a hypersparse matrix.
7pt-stencil-spmv
: Computes a sparse matrix-vector product using a matrix generated by a 3D 7-point stencil.
power-method
: Implements the Power method to compute the eigenvector of the largest eigenvalue of a matrix generated by a 7-point stencil.
conjugate-gradient
: Implements the Conjugate Gradient (CG) method to approximate the solution to a system of linear equations A*x = b, where A is a matrix generated by a 7-point stencil.
preconditioned-conjugate-gradient
: Implements the Preconditioned Conjugate Gradient method (PCG) to approximate the solution to a system of linear equations A*x = b, where A is a matrix generated by a 7-point stencil.
bicgstab
: Implements BiCGSTAB to approximate the solution to a system of linear equations A*x = b, where A is a matrix generated by a 7-point stencil.
wide-multiplication
: Implements multiplication of two 128-bit unsigned integers.
histogram-torus
: A communication demo. The fabric memory is filled with random values which are then sorted into buckets, where each bucket is a single processing element of the WSE.
mandelbrot
: Computes a visualization of the Mandelbrot set on a 16x16 grid of PEs.
cholesky
: Computes the Cholesky decomposition of a symmetric positive-definite matrix.
FFT
: Implements 1D and 2D Discrete Fourier Transforms (DFT).
single-tile-matvec
: Implements highly optimized N x N matrix-vector products, in which each PE performs the same matrix-vector computation.
row-col-broadcast
: Benchmarks the bandwidth of data transfers between host and device, where data is broadcast across a row or column of PEs, using memcpy_h2d_colbcast and memcpy_h2d_rowbcast.
game-of-life
: Implements Conway's Game of Life, where each PE is treated as a single cell.
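Several of the applications above share the same ingredients: a matrix defined by a 7-point stencil and an iterative solver built on matrix-vector products. As a rough host-side illustration of the mathematics only, the sketch below applies a 7-point stencil to a small 3D grid and runs textbook Conjugate Gradient on it in plain Python. It is not the CSL implementation. The stencil coefficients (6 at the center, -1 for each of the six neighbors, zero outside the grid) and all function names are assumptions chosen for the example; the benchmarks may use different coefficients and boundary handling.

```python
def stencil_matvec(v, n):
    """Apply an assumed 7-point stencil to an n*n*n grid stored as a flat list.

    Center weight 6, each of the six axis neighbors weight -1;
    neighbors that fall outside the grid contribute zero.
    """
    def idx(i, j, k):
        return (i * n + j) * n + k

    offsets = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1))
    out = [0.0] * (n * n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                acc = 6.0 * v[idx(i, j, k)]
                for di, dj, dk in offsets:
                    a, b, c = i + di, j + dj, k + dk
                    if 0 <= a < n and 0 <= b < n and 0 <= c < n:
                        acc -= v[idx(a, b, c)]
                out[idx(i, j, k)] = acc
    return out


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(b, n, tol=1e-10, max_iters=200):
    """Textbook CG for A*x = b, where A is the stencil operator above."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A*x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    iters = 0
    while iters < max_iters and rs_old ** 0.5 > tol:
        Ap = stencil_matvec(p, n)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        iters += 1
    return x, iters


n = 4
b = [1.0] * (n ** 3)
x, iters = conjugate_gradient(b, n)
true_residual = [bi - ai for bi, ai in zip(b, stencil_matvec(x, n))]
print(f"iterations: {iters}, residual norm: {dot(true_residual, true_residual) ** 0.5:.2e}")
```

The operator is applied matrix-free, so the matrix A is never stored explicitly; only the stencil weights and the grid values are needed.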
For each release of the SDK, there is a corresponding release tag in this repository containing a version of the CSL examples that is compatible with that SDK release. For example, the tag rel-sdk-1.3.0 in this repository contains a version of the CSL examples that will work (compile and simulate) with the SDK 1.3.0 release. The master branch is identical to the newest release.
Full backward compatibility of the SDK is not guaranteed. This means that a CSL example compatible with an older SDK release may not work with a newer SDK release.
For more information, see the SDK documentation here.
The End User Software License Agreement (EULA) is available here.