Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extended kernel generator docs #1802

Merged
merged 6 commits into from
Apr 1, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions stan/math/opencl/kernel_generator.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,110 @@
#define STAN_MATH_OPENCL_KERNEL_GENERATOR_HPP
#ifdef STAN_OPENCL

/**
* \ingroup opencl
* \defgroup opencl_kernel_generator OpenCL Kernel Generator
*
* The OpenCL kernel generator is used to combine multiple matrix operations
* into a single OpenCL kernel. This is much simpler than writing
* multi-operation kernels by hand.
*
* Because global GPU memory loads and stores are relativly slow compared to
* calculations in a kernel, using one kernel for multiple operations is faster
* than using one kernel per operation.
*
* The kernel generator uses lazy evaluation. Each operation is represented by
* an object derived from `operation_cl`. Such an object holds arguments of the
* operations as well as meta information needed to generate calculations on the
* arguments. Arguments to operations can be other operations, scalars
* or `matrix_cl` objects. An operation is evaluated when either an operation is
* assigned to a `matrix_cl` or a left-hand-side operation or `.eval()` is
* called.
*
* ## Defining a new kernel generator operation
*
* Unary functions can be added using one of the macros in
* `unary_functions.hpp`.
*
* New kernel generator classes must satsify the conditions below:
*
* 1. The class must be derived from a class inheriting from `operation_cl`.
* Optionally, if the operation should support being assigned to, it can be
* derived from a class inheriting `operation_cl_lhs` instead.
* 2. It's parent template arguments should be set to derived type, type of
* scalar and types of any expression arguements.
* 3. Member type `Scalar` should be defined as scalar type of the result of
* the operation.
* 4. Member function `generate` has the signature
* ```cpp
* inline kernel_parts generate(const std::string& i, const std::string& j,
* const std::string& var_name_arg)
* ```
* 5. Member function `view()` should return the correct `matrix_cl_view` after
* applying the operation. For instance `transpose()` returns an `UPPER` View
* if a `matrix_cl` with a `LOWER` view was the input.
* 6. Member function `deep_copy` should make a copy of the expression.
* Arguments that are operations should be copied by calling their `deep_copy`.
*
* The following functions can optionally be defined. Defaults are implemented
* in `operation_cl`:
* - `void modify_argument_indices(std::string& i, std::string& j)`:
* - Modifies what indices are passed to argument's `generate()`.
* - Default: No-op
* - `void set_args(std::set<const operation_cl_base*>& generated,
* cl::Kernel& kernel, int& arg_num)`:
* - Sets additional kernel arguments.
* - Default: Calls `set_args()` on arguments.
* - `int rows()`:
* - Returns Number of rows of the result.
* - Default: Returns maximum of the arguments' rows.
* - `int cols()`:
* - Returns number of columns of the result.
* - Default: Returns maximum of the arguments' columns.
* - `int thread_rows()`:
* - Number of threads required for this operation in rows direction.
* - Default: returns `rows()`.
* - `int thread_cols()`:
* - Number of threads required for this operation in cols direction.
* - Default: `cols()`.
* - `int bottom_diagonal()`:
* - Index of bottom nonzero diagonal of the result (0 is the diagonal,
* positive values are superdiagonals, negative values are subdiagonals).
* - Default: Returns minimum of arguments' `bottom_diagonal()`.
* - `int top_diagonal()`:
* - Index of top nonzero diagonal of the result (0 is the diagonal,
* positive values are superdiagonals, negative values are subdiagonals).
* - Default: Returns maximum of arguments' `top_diagonal()`.
*
* If an operation should support being assigned to it should also define the
* following:
*
* 1. Member function `generate_lhs` with same signature as `generate`
* that returns generated code when the operation is assigned to.
*
* The below functions can be optionally defined for operations that support
* being assigned to. Defaults are in `operation_cl_lhs`.
* - `void set_view(int bottom_diagonal, int top_diagonal, int
* bottom_zero_diagonal, int top_zero_diagonal)`:
* - Sets view of the underlying `matrix_cl` depending on where the extreme
* sub-/super-diagonals are written.
* - Default: Calls `set_view` on expression arguments with the same
* arguments.
* - `void check_assign_dimensions(int rows, int cols)`:
* - If the operation size can be modified, it should be set to the given
* size. Otherwise it should check that this operation's size matches given
* size.
* - Default: By default calls `check_assign_dimensions` on expression
* arguments with the same arguments.
*
* A new operation should also have a user-facing function that accepts
* arguments to the operation and returns the operation object. Arguments should
* be passed trough function `as_operation_cl` so that they are wrapped in
* operations if they are not operations themselves. If the operation defines
* `modify_argument_indices` this function should make copies of arguments by
* calling `deep_copy()` on them internally.
*/

#include <stan/math/opencl/kernel_generator/operation_cl.hpp>
#include <stan/math/opencl/kernel_generator/operation_cl_lhs.hpp>
#include <stan/math/opencl/kernel_generator/as_operation_cl.hpp>
Expand Down