diff --git a/stan/math/opencl/kernel_generator.hpp b/stan/math/opencl/kernel_generator.hpp
index 6a76beb320f..bc466677ff0 100644
--- a/stan/math/opencl/kernel_generator.hpp
+++ b/stan/math/opencl/kernel_generator.hpp
@@ -2,6 +2,110 @@
 #define STAN_MATH_OPENCL_KERNEL_GENERATOR_HPP
 #ifdef STAN_OPENCL
 
+/**
+ * \ingroup opencl
+ * \defgroup opencl_kernel_generator OpenCL Kernel Generator
+ *
+ * The OpenCL kernel generator is used to combine multiple matrix operations
+ * into a single OpenCL kernel. This is much simpler than writing
+ * multi-operation kernels by hand.
+ *
+ * Because global GPU memory loads and stores are relativly slow compared to
+ * calculations in a kernel, using one kernel for multiple operations is faster
+ * than using one kernel per operation.
+ *
+ * The kernel generator uses lazy evaluation. Each operation is represented by
+ * an object derived from `operation_cl`. Such an object holds arguments of the
+ * operations as well as meta information needed to generate calculations on the
+ * arguments. Arguments to operations can be other operations, scalars
+ * or `matrix_cl` objects. An operation is evaluated when either an operation is
+ * assigned to a `matrix_cl` or a left-hand-side operation or `.eval()` is
+ * called.
+ *
+ * ## Defining a new kernel generator operation
+ *
+ * Unary functions can be added using one of the macros in
+ * `unary_functions.hpp`.
+ *
+ * New kernel generator classes must satsify the conditions below:
+ *
+ * 1. The class must be derived from a class inheriting from `operation_cl`.
+ * Optionally, if the operation should support being assigned to, it can be
+ * derived from a class inheriting `operation_cl_lhs` instead.
+ * 2. It's parent template arguments should be set to derived type, type of
+ *  scalar and types of any expression arguements.
+ * 3. Member type `Scalar` should be defined as scalar type of the result of
+ * the operation.
+ * 4. Member function `generate` has the signature
+ * ```cpp
+ * inline kernel_parts generate(const std::string& i, const std::string& j,
+ *                            const std::string& var_name_arg)
+ * ```
+ * 5. Member function `view()` should return the correct `matrix_cl_view` after
+ * applying the operation. For instance `transpose()` returns an `UPPER` View
+ * if a `matrix_cl` with a `LOWER` view was the input.
+ * 6. Member function `deep_copy` should make a copy of the expression.
+ * Arguments that are operations should be copied by calling their `deep_copy`.
+ *
+ * The following functions can optionally be defined. Defaults are implemented
+ * in `operation_cl`:
+ * - `void modify_argument_indices(std::string& i, std::string& j)`:
+ *     - Modifies what indices are passed to argument's `generate()`.
+ *     - Default: No-op
+ * - `void set_args(std::set<const operation_cl_base*>& generated,
+ *        cl::Kernel& kernel, int& arg_num)`:
+ *     - Sets additional kernel arguments.
+ *     - Default: Calls `set_args()` on arguments.
+ * - `int rows()`:
+ *     - Returns Number of rows of the result.
+ *     - Default: Returns maximum of the arguments' rows.
+ * - `int cols()`:
+ *     - Returns number of columns of the result.
+ *     - Default: Returns maximum of the arguments' columns.
+ * - `int thread_rows()`:
+ *     - Number of threads required for this operation in rows direction.
+ *     - Default: returns `rows()`.
+ * - `int thread_cols()`:
+ *     - Number of threads required for this operation in cols direction.
+ *     - Default: `cols()`.
+ * - `int bottom_diagonal()`:
+ *     - Index of bottom nonzero diagonal of the result (0 is the diagonal,
+ * positive values are superdiagonals, negative values are subdiagonals).
+ *     - Default: Returns minimum of arguments' `bottom_diagonal()`.
+ * - `int top_diagonal()`:
+ *     - Index of top nonzero diagonal of the result (0 is the diagonal,
+ * positive values are superdiagonals, negative values are subdiagonals).
+ *     - Default: Returns maximum of arguments' `top_diagonal()`.
+ *
+ * If an operation should support being assigned to it should also define the
+ * following:
+ *
+ * 1. Member function `generate_lhs` with same signature as `generate`
+ * that returns generated code when the operation is assigned to.
+ *
+ * The below functions can be optionally defined for operations that support
+ * being assigned to. Defaults are in `operation_cl_lhs`.
+ * - `void set_view(int bottom_diagonal, int top_diagonal, int
+ * bottom_zero_diagonal, int top_zero_diagonal)`:
+ *    - Sets view of the underlying `matrix_cl` depending on where the extreme
+ * sub-/super-diagonals  are written.
+ *    - Default: Calls `set_view` on expression arguments with the same
+ * arguments.
+ * - `void check_assign_dimensions(int rows, int cols)`:
+ *    - If the operation size can be modified, it should be set to the given
+ * size. Otherwise it should check that this operation's size matches given
+ * size.
+ *    - Default: By default calls `check_assign_dimensions` on expression
+ * arguments with the same arguments.
+ *
+ * A new operation should also have a user-facing function that accepts
+ * arguments to the operation and returns the operation object. Arguments should
+ * be passed trough function `as_operation_cl` so that they are wrapped in
+ * operations if they are not operations themselves. If the operation defines
+ * `modify_argument_indices` this function should make copies of arguments by
+ * calling `deep_copy()` on them internally.
+ */
+
 #include <stan/math/opencl/kernel_generator/operation_cl.hpp>
 #include <stan/math/opencl/kernel_generator/operation_cl_lhs.hpp>
 #include <stan/math/opencl/kernel_generator/as_operation_cl.hpp>