diff --git a/stan/math/opencl/kernel_generator.hpp b/stan/math/opencl/kernel_generator.hpp index 6a76beb320f..bc466677ff0 100644 --- a/stan/math/opencl/kernel_generator.hpp +++ b/stan/math/opencl/kernel_generator.hpp @@ -2,6 +2,110 @@ #define STAN_MATH_OPENCL_KERNEL_GENERATOR_HPP #ifdef STAN_OPENCL +/** + * \ingroup opencl + * \defgroup opencl_kernel_generator OpenCL Kernel Generator + * + * The OpenCL kernel generator is used to combine multiple matrix operations + * into a single OpenCL kernel. This is much simpler than writing + * multi-operation kernels by hand. + * + * Because global GPU memory loads and stores are relativly slow compared to + * calculations in a kernel, using one kernel for multiple operations is faster + * than using one kernel per operation. + * + * The kernel generator uses lazy evaluation. Each operation is represented by + * an object derived from `operation_cl`. Such an object holds arguments of the + * operations as well as meta information needed to generate calculations on the + * arguments. Arguments to operations can be other operations, scalars + * or `matrix_cl` objects. An operation is evaluated when either an operation is + * assigned to a `matrix_cl` or a left-hand-side operation or `.eval()` is + * called. + * + * ## Defining a new kernel generator operation + * + * Unary functions can be added using one of the macros in + * `unary_functions.hpp`. + * + * New kernel generator classes must satsify the conditions below: + * + * 1. The class must be derived from a class inheriting from `operation_cl`. + * Optionally, if the operation should support being assigned to, it can be + * derived from a class inheriting `operation_cl_lhs` instead. + * 2. It's parent template arguments should be set to derived type, type of + * scalar and types of any expression arguements. + * 3. Member type `Scalar` should be defined as scalar type of the result of + * the operation. + * 4. Member function `generate` has the signature + * ```cpp + * inline kernel_parts generate(const std::string& i, const std::string& j, + * const std::string& var_name_arg) + * ``` + * 5. Member function `view()` should return the correct `matrix_cl_view` after + * applying the operation. For instance `transpose()` returns an `UPPER` View + * if a `matrix_cl` with a `LOWER` view was the input. + * 6. Member function `deep_copy` should make a copy of the expression. + * Arguments that are operations should be copied by calling their `deep_copy`. + * + * The following functions can optionally be defined. Defaults are implemented + * in `operation_cl`: + * - `void modify_argument_indices(std::string& i, std::string& j)`: + * - Modifies what indices are passed to argument's `generate()`. + * - Default: No-op + * - `void set_args(std::set& generated, + * cl::Kernel& kernel, int& arg_num)`: + * - Sets additional kernel arguments. + * - Default: Calls `set_args()` on arguments. + * - `int rows()`: + * - Returns Number of rows of the result. + * - Default: Returns maximum of the arguments' rows. + * - `int cols()`: + * - Returns number of columns of the result. + * - Default: Returns maximum of the arguments' columns. + * - `int thread_rows()`: + * - Number of threads required for this operation in rows direction. + * - Default: returns `rows()`. + * - `int thread_cols()`: + * - Number of threads required for this operation in cols direction. + * - Default: `cols()`. + * - `int bottom_diagonal()`: + * - Index of bottom nonzero diagonal of the result (0 is the diagonal, + * positive values are superdiagonals, negative values are subdiagonals). + * - Default: Returns minimum of arguments' `bottom_diagonal()`. + * - `int top_diagonal()`: + * - Index of top nonzero diagonal of the result (0 is the diagonal, + * positive values are superdiagonals, negative values are subdiagonals). + * - Default: Returns maximum of arguments' `top_diagonal()`. + * + * If an operation should support being assigned to it should also define the + * following: + * + * 1. Member function `generate_lhs` with same signature as `generate` + * that returns generated code when the operation is assigned to. + * + * The below functions can be optionally defined for operations that support + * being assigned to. Defaults are in `operation_cl_lhs`. + * - `void set_view(int bottom_diagonal, int top_diagonal, int + * bottom_zero_diagonal, int top_zero_diagonal)`: + * - Sets view of the underlying `matrix_cl` depending on where the extreme + * sub-/super-diagonals are written. + * - Default: Calls `set_view` on expression arguments with the same + * arguments. + * - `void check_assign_dimensions(int rows, int cols)`: + * - If the operation size can be modified, it should be set to the given + * size. Otherwise it should check that this operation's size matches given + * size. + * - Default: By default calls `check_assign_dimensions` on expression + * arguments with the same arguments. + * + * A new operation should also have a user-facing function that accepts + * arguments to the operation and returns the operation object. Arguments should + * be passed trough function `as_operation_cl` so that they are wrapped in + * operations if they are not operations themselves. If the operation defines + * `modify_argument_indices` this function should make copies of arguments by + * calling `deep_copy()` on them internally. + */ + #include #include #include