Cerebras/csl-examples

CSL Examples

Register to access the Cerebras SDK here. Documentation for the SDK can be found here.

This repository contains examples of CSL code. Each example has the following properties:

  • The source code is complete and self-contained and can be compiled using the CSL compiler.
  • The compiled code can be simulated using our fabric simulator, or it can be executed on the Cerebras hardware itself.

Each example is located in its own sub-folder, and contains files of the following types:

  • *.csl: CSL source code files. If there is more than one CSL source file, the top-level file, specified when compiling, will import all other CSL files.
  • run.py: This script drives the simulator (or the Cerebras fabric itself). It creates input data, runs the simulator, retrieves the simulation result, and compares it to an expected result that is also computed in this file.
  • commands.sh: Shell script which contains the exact commands you need to execute to first compile the source code and then simulate it.
  • *.rst: Documentation.
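
The overall shape of a run.py script, as described above, can be sketched as follows. This is a minimal illustration and not code from the repository: the function names are hypothetical, and run_on_device is a local stand-in that computes the result on the host, where a real script would load the compiled CSL program and move data to and from the simulator or hardware. The workload is assumed to be a small GEMV, y = A*x + b.

```python
import random


def gemv_reference(A, x, b):
    """Expected result computed on the host: y = A*x + b."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def run_on_device(A, x, b):
    """Hypothetical stand-in for the simulator or hardware run.

    A real run.py would launch the compiled program and copy the result
    back; here the result is computed locally so the sketch runs anywhere.
    """
    y = []
    for row, b_i in zip(A, b):
        acc = b_i
        for a_ij, x_j in zip(row, x):
            acc += a_ij * x_j
        y.append(acc)
    return y


def all_close(actual, expected, tol=1e-9):
    """Compare two vectors element by element with a relative tolerance."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= tol * max(1.0, abs(e))
        for a, e in zip(actual, expected))


def main(M=4, N=6, seed=0):
    # 1. Create input data.
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(M)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(M)]

    # 2. Run the program (stand-in) and fetch its result.
    y_device = run_on_device(A, x, b)

    # 3. Compute the expected result and compare.
    y_expected = gemv_reference(A, x, b)
    ok = all_close(y_device, y_expected)
    print("SUCCESS" if ok else "FAILURE")
    return ok


if __name__ == "__main__":
    main()
```

The comparison uses a tolerance rather than exact equality because the device and the host may accumulate floating-point sums in a different order.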

The examples are divided into two main categories, tutorials and benchmarks, described in the following sections.

Tutorials

This is the place to start. There are 10 tutorials which teach basic CSL language features and SdkRuntime host runtime features by building up increasingly complex code to compute a GEMV. There are an additional 15 tutorial examples which illustrate specific language features, and 3 tutorial examples which build an increasingly complex pipelined computation.

Benchmarks

The material in the benchmarks folder contains more complex sample applications solving specific computational problems.

The sample applications available are:

  • gemv-checkerboard-pattern: This is arguably the simplest application and therefore a good place to start. It implements general matrix-vector multiplication (GEMV) in about 100 lines of CSL.
  • gemv-collectives_2d: Implements GEMV, like the previous example, but uses the collectives library.
  • gemm-collectives_2d: Implements general matrix-matrix multiplication (GEMM) using the collectives library.
  • residual: Computes the norm of the residual of a matrix-vector multiplication. Builds on the gemv-checkerboard-pattern example.
  • 25-pt-stencil: A 3D 25-point stencil finite difference code for solving a wave equation with a source perturbation.
  • bandwidth-test: Benchmarks the bandwidth of data transfers between host and device using the memcpy framework and the SdkRuntime host API.
  • spmv-hypersparse: Computes a sparse matrix-vector product using a hypersparse matrix.
  • 7pt-stencil-spmv: Computes a sparse matrix-vector product using a matrix generated by a 3D 7-point stencil.
  • power-method: Implements the power method to compute the eigenvector corresponding to the largest eigenvalue of a matrix generated by a 7-point stencil.
  • conjugate-gradient: Implements the Conjugate Gradient (CG) method to approximate the solution to a system of linear equations A*x = b, where A is a matrix generated by a 7-point stencil.
  • preconditioned-conjugate-gradient: Implements the Preconditioned Conjugate Gradient method (PCG) to approximate the solution to a system of linear equations A*x = b, where A is a matrix generated by a 7-point stencil.
  • bicgstab: Implements BiCGSTAB to approximate the solution to a system of linear equations A*x = b, where A is a matrix generated by a 7-point stencil.
  • wide-multiplication: Implements multiplication of two 128-bit unsigned integers.
  • histogram-torus: A communication demo. The fabric memory is filled with random values which are then sorted into buckets, where each bucket is a single processing element (PE) of the WSE.
  • mandelbrot: Computes a visualization of the Mandelbrot set on a 16x16 grid of PEs.
  • cholesky: Computes the Cholesky decomposition of a symmetric positive-definite matrix.
  • FFT: Implements 1D and 2D Discrete Fourier Transforms (DFT).
  • single-tile-matvec: Implements highly optimized N x N matrix-vector products, in which each PE performs the same matrix-vector computation.
  • row-col-broadcast: Benchmarks the bandwidth of data transfers between host and device, where data is broadcast across a row or column of PEs, using memcpy_h2d_colbcast and memcpy_h2d_rowbcast.
  • game-of-life: Implements Conway's Game of Life, where each PE is treated as a single cell.
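
Several of the benchmarks above (7pt-stencil-spmv, power-method, conjugate-gradient, preconditioned-conjugate-gradient, bicgstab) operate on a matrix generated by a 3D 7-point stencil. The sketch below shows, on the host and in plain Python, what such a matrix-vector product and a power iteration built on it might look like. It is an illustration only, not the distributed CSL implementation: the stencil coefficients (6 on the diagonal, -1 for each of the six neighbours, zero values outside the grid) are an assumption and may differ from those used in the repository, and the function names are hypothetical.

```python
import math


def stencil7_matvec(x, n):
    """Return y = A*x on an n*n*n grid without forming A explicitly.

    Assumed stencil: 6 on the diagonal, -1 for each of the six axis
    neighbours, and zero values outside the grid.
    """
    def idx(i, j, k):
        return (i * n + j) * n + k

    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1))
    y = [0.0] * (n ** 3)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                acc = 6.0 * x[idx(i, j, k)]
                for di, dj, dk in neighbours:
                    a, b, c = i + di, j + dj, k + dk
                    if 0 <= a < n and 0 <= b < n and 0 <= c < n:
                        acc -= x[idx(a, b, c)]
                y[idx(i, j, k)] = acc
    return y


def power_method(n, iterations=200):
    """Estimate the largest eigenvalue of the stencil matrix and its eigenvector."""
    x = [1.0] * (n ** 3)  # start vector; must not be orthogonal to the target
    for _ in range(iterations):
        y = stencil7_matvec(x, n)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # normalize to avoid overflow
    # Rayleigh quotient x^T A x for the unit-length vector x.
    y = stencil7_matvec(x, n)
    eigenvalue = sum(a * b for a, b in zip(x, y))
    return eigenvalue, x


if __name__ == "__main__":
    value, _ = power_method(n=3)
    print(f"estimated largest eigenvalue: {value:.6f}")
```

For this assumed stencil the largest eigenvalue is known in closed form, 6 + 6*cos(pi/(n+1)), which gives a convenient check for the iteration on small grids.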

Branches

For each release of the SDK, there is a corresponding release tag in this repository that contains a version of the CSL examples compatible with that SDK release. For example, the tag rel-sdk-1.3.0 contains a version of the CSL examples that will compile and simulate with the SDK 1.3.0 release. The master branch is identical to the newest release.

Full backward compatibility of the SDK is not guaranteed. This means that a CSL example compatible with an older SDK release may not work with a newer SDK release.

Documentation

For more information, see the SDK documentation here.

End User License Agreement

The End User Software License Agreement (EULA) is available here.