Merge remote-tracking branch 'origin/develop' into experimental/laplace

stan-dev · Jul 22, 2024 · e76c980 · e76c980
2 parents 04b17c5 + 80d22fb
commit e76c980
Show file tree

Hide file tree

Showing 459 changed files with 2,715 additions and 6,021 deletions.
diff --git a/RELEASE-NOTES.txt b/RELEASE-NOTES.txt
@@ -1,5 +1,41 @@
 Stan Math Library Release Notes
 
+======================================================================
+v4.9.0 (3 June 2024)
+======================================================================
+
+ - Stan now detects if your compiler supports C++17 and will issue a warning if it is not available. Support for pre-C++17 compilers may be removed as early as next version. (#3063)
+ - Added a new distribution: `wiener_full_lpdf`. (#2822)
+ - Replaced `static const` objects with `static constexpr` when available. (#2830)
+ - Gradients for the `hypergeometric_pFq` function rewritten for increased speed and stability. (#2961)
+ - Added the Hypergeometric 1F0 function and gradients. (#2962)
+ - Removed macro usage in the definitions of the `require_` templates. (#2965)
+ - Added a sparse matrix implimentation for `arena_matrix`. (#2971)
+ - Added hand-calculated derivatives for the multivariate normal lpdf. (#2980)
+ - Added derivatives for the `multi_student_t_cholesky_lpdf`. (#2990)
+ - Added constrain function for creating matrices holding row or column simplexes. (#2992)
+ - Updated TBB Windows build rules for compatibility with RTools make. (#2999)
+ - Improved efficiency of `von_mises_lpdf` gradient calculation, credit to @venpopov. (#3010)
+ - Fixed a few additional issues in checking the inputs to distributions defined over cholesky matrices. (#3012)
+ - Functions which are wrappers around CVODES and IDAS routines from Sundials now throw `domain_error`, rather than a mix of `domain_error` and `runtime_error`. (#3013)
+ - Fixed the stack_allocator being able to return non-8-byte aligned pointers. (#3014)
+ - Upgrade bundled Boost headers to 1.84. (#3001, #3018, #3019)
+ - Fixed a bug where `linspaced_array` was returning its results truncated to integers. (#3023)
+ - Fixed the return type of `max(int, int)` being a double. (#3024)
+ - Added 'override' to built-in make variables. (#3028)
+ - Maded `sqrt(x)` have a finite gradient at `x=0`. Allows `distance(x,y)` to work for `x=y` too. (#3033)
+ - Improved stability of the `von_mises_lpdf` function to avoid numeric overflow. Now it's no longer necessary to use the normal_lpdf for kappa>100, allowing vectorizing of the likelihood. (#3036)
+ - Updated the `weibull_cdf` and `weibull_lcdf` functions for numerical stability as well as correct handling of `y == 0.0`. (#3037)
+ - Improved speed of `inv_Phi` and `std_normal_qf` functions. (#3046)
+ - Fixed spurious linker issue with `csr_matrix_times_vector`. (#3048, #3053)
+ - Improved error messages when variables dimensions do not match in operations that require it. (#3049)
+ - Added support for Windows ARM64 with RTools ARM64 toolchain. (#3051)
+ - Fixed `clean-all` error when using external SUNDIALS libraries. (#3054)
+ - Fixed several small floating-point accuracy issues with ARM64 platforms. (#3059)
+ - The RNGs for the multinomial and multinomial_logit distributions now accept a zero total count, resulting in a zero integer vector. (#3061)
+ - Fixed a data race in profiling in multiple threads. (#3066)
+ - Backported SUNDIALS bugfix for hashmap integer overflow. (#3072) 
+
 ======================================================================
 v4.8.1 (23 January 2024)
 ======================================================================

diff --git a/doxygen/README.md b/doxygen/README.md
@@ -125,8 +125,7 @@ export TBB_LIB="$TBB/lib/intel64/gcc4.8"
 
 - Set `Stan` local compiler flags to use the new TBB interface:
 ```bash
-mkdir -p ~/.config/stan
-echo TBB_INTERFACE_NEW=true>> ~/.config/stan/make.local
+echo TBB_INTERFACE_NEW=true>> ./make/local
 ```
 
 Compilers

diff --git a/doxygen/contributor_help_pages/common_pitfalls.md b/doxygen/contributor_help_pages/common_pitfalls.md
@@ -154,8 +154,6 @@ The implementation of @ref stan::math::make_holder is [here](https://github.com/
 
 ### Move Semantics
 
-In general, Stan Math does not use move semantics very often.
-This is because of our arena allocator.
 Move semantics generally work as
 
 ```cpp
@@ -179,6 +177,96 @@ We can see in the above that the standard style of a move (the constructor takin
 But in Stan, particularly for reverse mode, we need to keep memory around even if it's only temporary for when we call the gradient calculations in the reverse pass.
 And since memory for reverse mode is stored in our arena allocator no copying happens in the first place.
 
+Functions for Stan Math's reverse mode autodiff should use [_perfect forwarding_](https://drewcampbell92.medium.com/understanding-move-semantics-and-perfect-forwarding-part-3-65575d523ff8) arguments. Perfect forwarding arguments use a template parameter wit no attributes such as `const` and `volatile` and have a double ampersand `&&` next to them.
+
+```c++
+  template <typename T>
+  auto my_function(T&& x) {
+    return my_other_function(std::forward<T>(x));
+  }
+```
+
+The `std::forward<T>` in the in the code above tells the compiler that if `T` is deduced to be an rvalue reference (such as `Eigen::MatrixXd&&`), then it should be moved to `my_other_function`, where there it can possibly use another objects move constructor to reuse memory.
+A perfect forwarding argument of a function accepts any reference type as its input argument. 
+The above signature is equivalent to writing out several functions with different reference types
+
+```c++
+  // Accepts a plain lvalue reference
+  auto my_function(Eigen::MatrixXd& x) {
+    return my_other_function(x);
+  }
+  // Accepts a const lvalue reference
+  auto my_function(const Eigen::MatrixXd& x) {
+    return my_other_function(x);
+  }
+  // Accepts an rvalue reference
+  auto my_function(Eigen::MatrixXd&& x) {
+    return my_other_function(std::move(x));
+  }
+  // Accepts a const rvalue reference
+  auto my_function(const Eigen::MatrixXd&& x) {
+    return my_other_function(std::move(x));
+  }
+```
+
+In Stan, perfect forwarding is used in reverse mode functions which can accept an Eigen matrix type.
+
+```c++
+template <typename T, require_eigen_vt<is_var, T>* = nullptr>
+inline auto sin(T&& x) {
+  // Store `x` on the arena
+  arena_t<T> x_arena(std::forward<T>(x));
+  arena_t<T> ret(x_arena.val().array().sin().matrix());
+  reverse_pass_callback([x_arena, ret] mutable {
+    x_arena.adj() += ret.adj().cwiseProduct(x_arena.val().array().cos().matrix());
+  });
+  return ret;
+}
+```
+
+Let's go through the above line by line.
+
+```c++
+template <typename T, require_eigen_vt<is_var, T>* = nullptr>
+inline auto sin(T&& x) {
+```
+
+The signature for this function has a template `T` that is required to be an Eigen type with a `value_type` that is a `var` type. 
+The template parameter `T` is then used in the signature as an perfect forwarding argument. 
+
+```c++
+  // Store `x` on the arena
+  arena_t<T> x_arena(std::forward<T>(x));
+```
+
+The input is stored in the arena, which is where the perfect forwarding magic actually occurs. 
+If `T` is an lvalue type such as `Eigen::MatrixXd&` then `arena_matrix` will use it's copy constructor, creating new memory in Stan's arena allocator and then copying the values of `x` into that memory. 
+But if `T` was a temporary rvalue type such as `Eigen::MatrixXd&&`, then the `arena_matrix` class will use it's move constructor to place the temporary matrix in Stan's `var_alloc_stack_`. 
+The `var_alloc_stick_` is used to hold objects that were created outside of the arena allocator but need to be deleted when the arena allocator is cleared. 
+This allows the `arena_matrix` to reuse the memory from the temporary matrix. Then the matrix will be deleted once arena allocator requests memory to be cleared.
+
+```c++
+  arena_t<T> ret(x_arena.val().array().sin().matrix());
+```
+
+This construction of an `arena_matrix` will *not* use the move constructor for `arena_matrix`. 
+Here, `x_arena` is an `arena_matrix<T>`, which is then wrapped in an expression to compute the elementwise `sin`. 
+That expression will be evaluated into new memory allocated in the arena allocator and then a pointer to it will be stored in the `arena_matrix.`
+
+```c++
+  reverse_pass_callback([x_arena, ret] mutable {
+    x_arena.adj() += ret.adj().cwiseProduct(x_arena.val().array().cos().matrix());
+  });
+  return ret;
+```
+
+The rest of this code follows the standard format for the rest of Stan Math's reverse mode that accepts Eigen types as input. 
+The `reverse_pass_callback` function accepts a lambda as input and places the lambda in Stan's callback stack to be called later when `grad()` is called by the user.
+Since `arena_matrix` types only store a pointer to memory allocated elsewhere they are copied into the lambda. 
+The body of the lambda holds the gradient calculation needed for the reverse mode pass.
+
+Then finally `ret`, the `arena_matrix` type is returned by the function.
+
 When working with arithmetic types, keep in mind that moving Scalars is often less optimal than simply taking their copy.
 For instance, Stan's `var` type uses the pointer to implementation (PIMPL) pattern, so it simply holds a pointer of size 8 bytes.
 A `double` is also 8 bytes which just so happens to fit exactly in a [word](https://en.wikipedia.org/wiki/Word_(computer_architecture)) of most modern CPUs with at least 64-byte cache lines.
@@ -190,6 +278,45 @@ The general rules to follow for passing values to a function are:
 2. If you are writing a function for reverse mode, pass values by `const&`
 3. In prim, if you are confident and working with larger types, use perfect forwarding to pass values that can be moved from. Otherwise simply pass values by `const&`.
 
+### Using auto is Dangerous With Eigen Matrix Functions in Reverse Mode
+
+The use of auto with the Stan Math library should be used with care, like in [Eigen](https://eigen.tuxfamily.org/dox/TopicPitfalls.html). 
+Along with the cautions mentioned in the Eigen docs, there are also memory considerations when using reverse mode automatic differentiation. 
+When returning from a function in the Stan Math library with an Eigen matrix output with a scalar `var` type, the actual returned type will often be an `arena_matrix<Eigen::Matrix<...>>`. 
+The `arena_matrix` class is an Eigen matrix where the underlying array of memory is located in Stan's memory arena. 
+The `arena_matrix` that is returned by Math functions is normally the same one resting in the callback used to calculate gradients in the reverse pass.
+Directly changing the elements of this matrix would also change the memory the reverse pass callback sees which would result in incorrect calculations.
+
+The simple solution to this is that when you use a math library function that returns a matrix and then want to assign to any of the individual elements of the matrix, assign to an actual Eigen matrix type instead of using auto. 
+In the below example, we see the first case which uses auto and will change the memory of the `arena_matrix` returned in the callback for multiply's reverse mode.
+Directly below it is the safe version, which just directly assigns to an Eigen matrix type and is safe to do element insertion into.
+
+```c++
+Eigen::Matrix<var, -1, 1> y;
+Eigen::Matrix<var, -1, -1> X;
+// Bad!! Will change memory used by reverse pass callback within multiply!
+auto mu = multiply(X, y);
+mu(4) = 1.0;
+// Good! Will not change memory used by reverse pass callback within multiply
+Eigen::Matrix<var, -1, 1> mu_good = multiply(X, y);
+mu_good(4) = 1.0;
+```
+
+The reason we do this is for cases where function returns are passed to other functions. 
+An `arena_matrix` will always make a shallow copy when being constructed from another `arena_matrix`, which lets the functions avoid unnecessary copies.
+
+```c++
+Eigen::Matrix<var, -1, 1> y1;
+Eigen::Matrix<var, -1, -1> X1;
+Eigen::Matrix<var, -1, 1> y2;
+Eigen::Matrix<var, -1, -1> X2;
+auto mu1 = multiply(X1, y1);
+auto mu2 = multiply(X2, y2);
+// Inputs not copied in this case!
+auto z = add(mu1, mu2);
+```
+
+
 ### Passing variables that need destructors called after the reverse pass (`make_chainable_ptr`)
 
 When possible, non-arena variables should be copied to the arena to be used in the reverse pass.
@@ -242,22 +369,17 @@ grad();
 ```
 
 Now `res` is `innocent_return` and we've changed one of the elements of `innocent_return`, but that is also going to change the element of `res` which is being used in our reverse pass callback!
-The answer for this is simple but sadly requires a copy.
 
-```cpp
-template <typename EigVec, require_eigen_vt<is_var, EigVec>* = nullptr>
-inline var cool_fun(const EigVec& v) {
-  arena_t<EigVec> arena_v(v);
-  arena_t<EigVec> res = arena_v.val().array() * arena_v.val().array();
-  reverse_pass_callback([res, arena_v]() mutable {
-    arena_v.adj().array() += (2.0 * res.adj().array()) * arena_v.val().array();
-  });
-  return plain_type_t<EigVec>(res);
-}
-```
+Care must be taken by end users of Stan Math by using `auto` with caution. 
+When a user wishes to manipulate the coefficients of a matrix that is a return from a function in Stan Math, they should assign the matrix to a plain Eigen type.
 
-we make a deep copy of the return whose inner `vari` will not be the same, but the `var` will produce a new copy of the pointer to the `vari`.
-Now the user code above will be protected, and it is safe for them to assign to individual elements of the `auto` returned matrix.
+```c++
+Eigen::Matrix<var, -1, 1> x = Eigen::Matrix<double, -1, 1>::Random(5);
+Eigen::MatrixXd actually_innocent_return = cool_fun(x);
+actually_innocent_return.coeffRef(3) = var(3.0);
+auto still_unsafe_return = cool_fun2(actually_innocent_return);
+grad();
+```
 
 ### Const correctness, reverse mode autodiff, and arena types
 

diff --git a/doxygen/contributor_help_pages/developer_doc.md b/doxygen/contributor_help_pages/developer_doc.md
@@ -165,9 +165,7 @@ Within the Math library, the only build targets are the tests (there are no othe
 
 ## Makefile Variables
 
-To customize how the tests are built in C++, variables can be set in a file called `make/local`. This is the preferred way of customizing the build. You can also set it more permanently by setting variables in `~/.config/stan/make.local`.
-
-Note: variables in `make/local` override variables in `~/.config/stan/make/local`.
+To customize how the tests are built in C++, variables can be set in a file called `make/local`. This is the preferred way of customizing the build.
 
 There are a lot of make variables that can be set. In general, `CXXFLAGS_*` is for C++ compiler flags, `CPPFLAGS_*` is for C preprocessor flags, `LDFLAGS_*` is for linker flags, and `LDLBIS_*` is for libraries that need to be linked in.
 
@@ -310,7 +308,7 @@ If you want to use custom locations for the library locations, set these makefil
 - `GTEST`
 - `CPPLINT` (optional)
 
-Example `~/.config/stan/make.local` file:
+Example `make/local` file:
 ```
 BOOST = ~/boost
 ```

diff --git a/doxygen/contributor_help_pages/distribution_tests.md b/doxygen/contributor_help_pages/distribution_tests.md
@@ -28,7 +28,7 @@ Here, I'm just going to describe the steps that are taken to run the tests. Deta
 
 1. `make generate-tests` is called.
     1. This first builds the `test/prob/generate_tests` executable from `test/prob/generate_tests.cpp`.
-    2. For each test file inside `test/prob/*/*`, it will call the executable with the test file as the first argument and the number of template instantiations per file within the second argument. For example, for testing the `bernoulli_log()` function, make will call: `test/prob/generate_tests test/prob/bernoulli/bernoulli_test.hpp 100`
+    2. For each test file inside `test/prob/*/*`, it will call the executable with the test file as the first argument and the number of template instantiations per file within the second argument. For example, for testing the `bernoulli_lpmf()` function, make will call: `test/prob/generate_tests test/prob/bernoulli/bernoulli_test.hpp 100`
     The call to the executable will generate 5 different test files, all within the `test/prob/bernoulli/` folder:
         - bernoulli\_00000\_generated\_fd\_test.cpp
         - bernoulli\_00000\_generated\_ffd\_test.cpp

diff --git a/make/standalone b/make/standalone
@@ -14,7 +14,6 @@
 
 MATH_MAKE ?=$(dir $(abspath $(firstword $(MAKEFILE_LIST))))
 MATH ?= $(realpath $(MATH_MAKE)..)/
--include $(HOME)/.config/stan/make.local
 -include $(MATH)make/local
 -include $(MATH)make/compiler_flags
 -include $(MATH)make/libraries

diff --git a/makefile b/makefile
@@ -2,16 +2,11 @@
 # Stan Math Library
 # -----------------
 #
-# To customize your build, set make variables in either:
-#    ~/.config/stan/make.local
-#    make/local
-# Variables in make/local is loaded after ~/.config/stan/make.local
-
+# To customize your build, set make variables in the file make/local.
 
 ## 'help' is the default make target.
 help:
 
--include $(HOME)/.config/stan/make.local  # user-defined variables
 -include make/local                       # user-defined variables
 
 include make/compiler_flags               # CXX, CXXFLAGS, LDFLAGS set by the end of this file

diff --git a/runTests.py b/runTests.py
@@ -23,28 +23,34 @@
 
 allowed_paths_with_jumbo = [
     "test/unit/math/prim/",
+    "test/unit/math/prim/constraint",
     "test/unit/math/",
     "test/unit/math/rev/",
+     "test/unit/math/rev/constraint",
     "test/unit/math/fwd/",
+    "test/unit/math/fwd/constraint",
     "test/unit/math/mix/",
     "test/unit/math/mix/fun/",
     "test/unit/math/opencl/",
     "test/unit/",
 ]
 
 jumbo_folders = [
+    "test/unit/math/prim/constraint",
     "test/unit/math/prim/core",
     "test/unit/math/prim/err",
     "test/unit/math/prim/fun",
     "test/unit/math/prim/functor",
     "test/unit/math/prim/meta",
     "test/unit/math/prim/prob",
+     "test/unit/math/rev/constraint",
     "test/unit/math/rev/core",
     "test/unit/math/rev/err",
     "test/unit/math/rev/fun",
     "test/unit/math/rev/functor",
     "test/unit/math/rev/meta",
     "test/unit/math/rev/prob",
+    "test/unit/math/fwd/constraint",
     "test/unit/math/fwd/core",
     "test/unit/math/fwd/fun",
     "test/unit/math/fwd/functor",
@@ -58,7 +64,9 @@
     "test/unit/math/opencl/device_functions",
     "test/unit/math/opencl/kernel_generator",
     "test/unit/math/opencl/prim",
+    "test/unit/math/opencl/prim/constraint",
     "test/unit/math/opencl/rev",
+    "test/unit/math/opencl/rev/constraint",
 ]
 
 

diff --git a/stan/math/fwd.hpp b/stan/math/fwd.hpp
@@ -5,8 +5,10 @@
 
 #ifdef STAN_OPENCL
 #include <stan/math/opencl/prim.hpp>
+#include <stan/math/opencl/prim_constraint.hpp>
 #endif
 
+#include <stan/math/fwd/constraint.hpp>
 #include <stan/math/fwd/core.hpp>
 #include <stan/math/fwd/meta.hpp>
 #include <stan/math/fwd/fun.hpp>

diff --git a/stan/math/fwd/constraint.hpp b/stan/math/fwd/constraint.hpp
@@ -0,0 +1,9 @@
+#ifndef STAN_MATH_FWD_CONSTRAINT_HPP
+#define STAN_MATH_FWD_CONSTRAINT_HPP
+
+#include <stan/math/prim/fun/Eigen.hpp>
+#include <stan/math/fwd/fun/Eigen_NumTraits.hpp>
+
+#include <stan/math/fwd/constraint/unit_vector_constrain.hpp>
+
+#endif
diff --git a/stan/math/fwd/fun/unit_vector_constrain.hpp → .../fwd/constraint/unit_vector_constrain.hpp b/stan/math/fwd/fun/unit_vector_constrain.hpp → .../fwd/constraint/unit_vector_constrain.hpp
@@ -1,5 +1,5 @@
-#ifndef STAN_MATH_FWD_FUN_UNIT_VECTOR_CONSTRAIN_HPP
-#define STAN_MATH_FWD_FUN_UNIT_VECTOR_CONSTRAIN_HPP
+#ifndef STAN_MATH_FWD_CONSTRAINT_UNIT_VECTOR_CONSTRAIN_HPP
+#define STAN_MATH_FWD_CONSTRAINT_UNIT_VECTOR_CONSTRAIN_HPP
 
 #include <stan/math/fwd/core.hpp>
 #include <stan/math/fwd/fun/tcrossprod.hpp>
@@ -8,7 +8,7 @@
 #include <stan/math/prim/fun/dot_self.hpp>
 #include <stan/math/prim/fun/Eigen.hpp>
 #include <stan/math/prim/fun/inv.hpp>
-#include <stan/math/prim/fun/unit_vector_constrain.hpp>
+#include <stan/math/prim/constraint/unit_vector_constrain.hpp>
 #include <stan/math/prim/fun/tcrossprod.hpp>
 #include <cmath>
 

diff --git a/stan/math/fwd/fun.hpp b/stan/math/fwd/fun.hpp
@@ -121,7 +121,7 @@
 #include <stan/math/fwd/fun/trigamma.hpp>
 #include <stan/math/fwd/fun/trunc.hpp>
 #include <stan/math/fwd/fun/typedefs.hpp>
-#include <stan/math/fwd/fun/unit_vector_constrain.hpp>
+#include <stan/math/fwd/constraint/unit_vector_constrain.hpp>
 #include <stan/math/fwd/fun/value_of.hpp>
 #include <stan/math/fwd/fun/value_of_rec.hpp>