uses Eigen's internal multiindexing for multi indexing on vectors and matrices #3250

SteveBronder · 2024-01-08T16:59:38Z

Submission Checklist

Run unit tests: ./runTests.py src/test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: see below

Summary

Uses Eigen 3.4's new non-contiguous indexing expressions to have the rvalue() functions that use multi_index return an expression of multi indexing instead of eagerly creating a new matrix from the indices.

In the Stan language doing a loop like the code in (a) is faster than doing a multi index like in (b) because we have to make a giant matrix in (b) for the multi index.

(a)

for (i in 1:N)
  mu[i] = beta * X[idx[i]];

(b)

mu = beta * x[idx]

Eigen 3.4 introduced an expression for non-contiguous indexing which we can utilize to not have a big matrix creation in (b).

Intended Effect

Make vectorized multi indexing as fast as the loop

How to Verify

Tests pass for current rvalue tests with no changes

Side Effects

Nope

Documentation

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Simons Foundation

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

… matrices

WardBrian

A few questions - might be easier to explain next week over a board.

Have you fed this into your benchmarking framework to make sure it has the desired effect?

src/stan/model/indexing/rvalue.hpp

src/test/unit/model/indexing/rvalue_test.cpp

…ulti-index

WardBrian · 2024-01-30T15:12:59Z

@SteveBronder did you get a chance to try this in your stan-benchmark playground?

SteveBronder · 2024-01-31T16:21:14Z

I'm waiting till I have the math pr for this up as well so that I can benchmark this and the new matrix type all at once

…ulti-index

stan-buildbot · 2024-04-22T19:25:35Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
arma/arma.stan	0.2	0.19	1.06	5.87% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.01	0.01	1.1	8.95% faster
gp_regr/gen_gp_data.stan	0.02	0.02	1.07	6.34% faster
gp_regr/gp_regr.stan	0.11	0.1	1.06	5.62% faster
sir/sir.stan	77.44	75.91	1.02	1.98% faster
irt_2pl/irt_2pl.stan	3.82	3.71	1.03	2.93% faster
eight_schools/eight_schools.stan	0.05	0.05	1.06	5.86% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.26	0.24	1.08	7.3% faster
pkpd/one_comp_mm_elim_abs.stan	18.31	17.54	1.04	4.21% faster
garch/garch.stan	0.46	0.44	1.04	3.4% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.76	2.74	1.01	0.7% faster
arK/arK.stan	1.62	1.59	1.02	2.0% faster
gp_pois_regr/gp_pois_regr.stan	2.53	2.41	1.05	4.99% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	9.13	8.88	1.03	2.72% faster
performance.compilation	174.23	178.9	0.97	-2.68% slower
Mean result: 1.0426783571790479

Jenkins Console Log
Blue Ocean
Commit hash: 934e17704b703a32c4c29f9ab5ae1913c4a58a57

Machine information

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 3308.200
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

WardBrian

@SteveBronder and I discussed some benchmarking he did (which hopefully he can post a nice graph of) which showed that this change did not immediately lead to the improvements we thought it might - that will require some specializations in the arena matrix representation.

But, it also didn't make things any worse, so I'm approving this on its own now.

SteveBronder · 2024-04-24T16:33:14Z

Running some benchmarks idt we should merge this PR yet. The benchmark can be found here. I took the code that brms generates and compared the new and current multi indexing to a loop as well as to a raw call from Eigen that does the same thing.

The plot has facets by the number of group (M) and plots vectors of size N against the average time for each combination of M and N. The indices are set up so that each inner index does the worst case scenario of random access to elements in the vectors.

Each line represents

Code generated from a for loop in Stan
Code generated from a multi index in Stan with the current multi index code
Code generated from a multi index in Stan with the new multi index code
Code generated from a multi index in Stan with the new multi index code, removing any checks
Using plain eigen to do the multi index

The results were very suprising to me

Looping is very consistent with linear time, but it looks like our multi indexing has a kink in it once we cross 50K observations or so irrespective of the number of groups. The main reason the new and current multi indexing is slower is because of the checks, which you can see from the nocheck time

The loop does not have to do any checks. For multi index we have to check that the index is valid within the range of the size of the vector. For the current multi index we do that check as we are creating the index

        return plain_type_t<EigVec>::NullaryExpr(
            idx.ns_.size(), [name, &idx, &v_ref](Eigen::Index i) {
              math::check_range("vector[multi] indexing", name, v_ref.size(),
                                idx.ns_[i]);
              return v_ref.coeff(idx.ns_[i] - 1);
            });

but in the new code we do that check outside of the loop

  for (auto idx_i : idx.ns_) {
    math::check_range("vector[multi] indexing", name, v_size, idx_i);
  }

I think iterating over the index for the check is causing the main slowdown. The kink in the graph after 50K most likely happens because the index and vectors no longer fit in the lower levels of cache. Having that check inside or outside of the index creation most likely explains why this PRs multi indexing is slower than the current multi indexing

The nocheck benchmark does a few things that I think we could do in Stan, from most to least performance gain:

Removes the check. This could be done with a flag defined at compile time. We could also do something smart in the compiler if we wanted to
- Since we know the size of the containers we will be indexing into we could actually have a check in the constructor that validates the size once. Then we know that the index will be correct for all the future operations.
Removes multi_index. The multi_index class is really just a holder for a vector with nothing else, but constructing that class requires a copy of the indexing vector. We would need to write rvalue() functions that accept vectors as the index and then the compiler would need to generate the right code.
Removes the deep_copy(mu) normallly generated by the code. We don't manipulate mu here so it is safe to remove. When and when not to deep copy is kind of tricky. We want the deep copy when
- The left hand side and right hand side use the same object (mu here for the +=)
- The right hand side accesses the data used in the lhs in a non-linearly increasing and/or non-sequential order
  - (ex mu[1:3] += mu[{1, 1, 1}] * b where the first indexed value changes after the first assignment)

I think 1:3 are very feasible and the speed gain seems very worth it

SteveBronder · 2024-04-24T16:34:34Z

But my end recommendation is we close this PR without merge. I can write up a seperate PR that does 1 and 2 from the list above. We can add those signatures and then add the necessary components from the compiler.

The good news is we have a reasonable path forward that should give a magnitude speedup to any brms model that does mixed effects!

WardBrian · 2024-04-24T16:37:14Z

I am still not sure if the benchmark you posted represents @bob-carpenter's original case he was concerned with, since it doesn't ever end up calling a lpdf. The concern as I understand it was when these matrices are constructed for use in a distribution, one way of writing it ends up leading to an un-vectorized lpdf, which goes against the other advice we give

SteveBronder · 2024-04-24T18:49:51Z

Doing the multi index inside an lpdf would give the same effects

SteveBronder · 2024-04-25T18:38:12Z

Actually @WardBrian I want to reopen this. Running the benchmarks on my mac, just turning on STAN_NO_RANGE_CHECKS with the new code gives us a very good speedup.

bob-carpenter · 2024-04-25T19:10:23Z

The plots make it look like the Eigen approach is the best. Am I missing something?

Also, what specifically are you evaluating? I see the label on one of the plots involves multi-indexing with an array and then elementwise multiplication? Usually we just do the indexing for random effects, so I'm not sure where the elementwise multiply is coming from. Normally it'd just be foo[idxs] + bar[idxs2] + ... for varying effects.

I had thought part of the problem with multi-indexing was creating an intermediate, but it's now returning a template expression created by binding.

template <typename EigVec, require_eigen_vector_t<EigVec>* = nullptr>
inline auto rvalue(EigVec&& v, const char* name, const index_multi& idx) {
  return stan::math::make_holder(
      [name, &idx](auto& v_ref) {
        return plain_type_t<EigVec>::NullaryExpr(
            idx.ns_.size(), [name, &idx, &v_ref](Eigen::Index i) {
              math::check_range("vector[multi] indexing", name, v_ref.size(),
                                idx.ns_[i]);
              return v_ref.coeff(idx.ns_[i] - 1);
            });
      },
      stan::math::to_ref(v));
}

Instead, it looks like the issue is creating the nested NullaryExpr for every single argument. Won't that be at least as expensive as the range check?

With the Eigen approach, how do you avoid the - 1 for indexing?

SteveBronder · 2024-04-25T19:43:04Z

The plots make it look like the Eigen approach is the best. Am I missing something?

The code for all of the benchmarks are here. The Eigen benchmark is just doing

Eigen::Matrix<double, -1, 1> mu = r_1_1(J_1_map).array() * Z_1_1.array() +
     r_1_2(J_1_map).array() * Z_1_2.array();

Which is a baseline for "If we remove all of the stan overhead like checks, indexing from 1, expression safety, etc. what is the best possible performance we could get?"

Also, what specifically are you evaluating?

I have all the benchmark code in the link above and the comment above (comment here) gives all the info for each benchmark. But tldr each line is for

(loop) Code generated from a for loop in Stan
(orig) Code generated from a multi index in Stan with the current multi index code
(new) Code generated from a multi index in Stan with the new multi index code
(newnocheck) Code generated from a multi index in Stan with the new multi index code, removing any checks
(eigen) Using plain eigen to do the multi index

I see the label on one of the plots involves multi-indexing with an array and then elementwise multiplication? Usually we just do the indexing for random effects, so I'm not sure where the elementwise multiply is coming from. Normally it'd just be foo[idxs] + bar[idxs2] + ... for varying effects.

The elemenwise multiplication and multi index is what brms generates for models that do random effects like y ~ (effect | group)

Instead, it looks like the issue is creating the nested NullaryExpr for every single argument. Won't that be at least as expensive as the range check?

The Nullary expression is just done once and is then returned so it should not be expensive. After all these benchmarks I'm really convinced the slowdown is from the range check. In the graph above removing the range check gives about a magnitude of performance speedup. You can see in the graph in the comment above all of the stan versions that do the range check sit near one another, but removing that range check with STAN_NO_RANGE_CHECK we get much closer to the baseline of a plain Eigen call

With the Eigen approach, how do you avoid the - 1 for indexing?

It just uses a zero based index instead of Stan's 1 based index since it's a baseline of how fast this operation is minus any Stan changes.

bob-carpenter · 2024-04-25T21:13:02Z

Thanks for all the clarification. That makes a lot more sense now.

The elemenwise multiplication and multi index is what brms generates for models that do random effects like y ~ (effect | group)

Aha! A varying slope model. I just couldn't think of where we'd want to do a multi-index and then elementwise multiply. In any case, it sounds like the Eigen baseline isn't something that'd be workable in Stan due to the indexing-from-1 problem.

Sorry, but no insight about how to fix it easily. It does seem like the current Stan approach is allocating a new class for each index. Isn't that causing a lot of overhead or is the overhead all compiled away somehow? I would've thought you could get away with just one level of indirection here.

stan-buildbot · 2024-04-25T21:15:59Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
arma/arma.stan	0.2	0.18	1.08	7.32% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.01	0.01	1.09	8.42% faster
gp_regr/gen_gp_data.stan	0.02	0.02	1.06	5.29% faster
gp_regr/gp_regr.stan	0.11	0.11	0.97	-2.6% slower
sir/sir.stan	80.44	75.15	1.07	6.58% faster
irt_2pl/irt_2pl.stan	4.15	3.95	1.05	4.69% faster
eight_schools/eight_schools.stan	0.05	0.05	1.05	4.53% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.26	0.25	1.02	1.54% faster
pkpd/one_comp_mm_elim_abs.stan	18.57	18.35	1.01	1.23% faster
garch/garch.stan	0.48	0.47	1.03	3.2% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.86	2.85	1.0	0.34% faster
arK/arK.stan	1.67	1.66	1.01	0.6% faster
gp_pois_regr/gp_pois_regr.stan	2.59	2.6	1.0	-0.29% slower
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	9.39	9.44	0.99	-0.53% slower
performance.compilation	181.91	186.82	0.97	-2.7% slower
Mean result: 1.026985752701417

Jenkins Console Log
Blue Ocean
Commit hash: 13017193216fe150e9a8fadbc7b3f9cb71f8a58d

Machine information

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.3 LTS Release: 20.04 Codename: focal

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 2400.000
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 1.3 MiB
L1i cache: 1.3 MiB
L2 cache: 40 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities

G++:
g++ (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Clang:
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

SteveBronder · 2024-04-26T00:53:41Z

Np!

In any case, it sounds like the Eigen baseline isn't something that'd be workable in Stan due to the indexing-from-1 problem.

Totally agree. I just have it there as a sort of "if we could do the most performance thing what would that look". It's not a feasible answer but moreso for comparison

It does seem like the current Stan approach is allocating a new class for each index. Isn't that causing a lot of overhead or is the overhead all compiled away somehow?

It does make a new class but that class is an expression so there won't be any actual allocation from value. There's a small amount of overhead but most of it is compiled away

I think we should merge this PR and if people want speed they can define the macro to remove the range checks. Idt there's any other way for us to make multi indexing fast

SteveBronder and others added 8 commits January 8, 2024 11:45

uses Eigen's internal multiindexing for multi indexing on vectors and…

e46a24b

… matrices

fix docs

fc3d4ff

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

cb7d951

fix template error

0218f2e

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

0173de3

fix requires for rvalue

6dc3f4c

Make Eigen::Map for indexing have an inner const type

b20db94

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

a84f6a9

SteveBronder requested a review from WardBrian January 12, 2024 19:20

WardBrian reviewed Jan 12, 2024

View reviewed changes

src/stan/model/indexing/rvalue.hpp Show resolved Hide resolved

src/stan/model/indexing/rvalue.hpp Show resolved Hide resolved

src/stan/model/indexing/rvalue.hpp Show resolved Hide resolved

src/test/unit/model/indexing/rvalue_test.cpp Outdated Show resolved Hide resolved

SteveBronder added 2 commits January 18, 2024 12:36

Merge remote-tracking branch 'origin/develop' into feature/eigen-34-m…

17b9932

…ulti-index

change back name of rvalue test function

cc1cf33

Merge remote-tracking branch 'origin/develop' into feature/eigen-34-m…

8c3451e

…ulti-index

WardBrian approved these changes Apr 23, 2024

View reviewed changes

SteveBronder closed this Apr 24, 2024

SteveBronder reopened this Apr 25, 2024

SteveBronder merged commit 4a85b73 into develop Jun 18, 2024
5 checks passed

WardBrian deleted the feature/eigen-34-multi-index branch November 14, 2024 14:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uses Eigen's internal multiindexing for multi indexing on vectors and matrices #3250

uses Eigen's internal multiindexing for multi indexing on vectors and matrices #3250

SteveBronder commented Jan 8, 2024

WardBrian left a comment

WardBrian commented Jan 30, 2024

SteveBronder commented Jan 31, 2024

stan-buildbot commented Apr 22, 2024

WardBrian left a comment •

edited

Loading

SteveBronder commented Apr 24, 2024 •

edited

Loading

SteveBronder commented Apr 24, 2024 •

edited

Loading

WardBrian commented Apr 24, 2024

SteveBronder commented Apr 24, 2024

SteveBronder commented Apr 25, 2024

bob-carpenter commented Apr 25, 2024

SteveBronder commented Apr 25, 2024 •

edited

Loading

bob-carpenter commented Apr 25, 2024

stan-buildbot commented Apr 25, 2024

SteveBronder commented Apr 26, 2024

uses Eigen's internal multiindexing for multi indexing on vectors and matrices #3250

uses Eigen's internal multiindexing for multi indexing on vectors and matrices #3250

Conversation

SteveBronder commented Jan 8, 2024

Submission Checklist

Summary

Intended Effect

How to Verify

Side Effects

Documentation

Copyright and Licensing

WardBrian left a comment

Choose a reason for hiding this comment

WardBrian commented Jan 30, 2024

SteveBronder commented Jan 31, 2024

stan-buildbot commented Apr 22, 2024

WardBrian left a comment • edited Loading

Choose a reason for hiding this comment

SteveBronder commented Apr 24, 2024 • edited Loading

SteveBronder commented Apr 24, 2024 • edited Loading

WardBrian commented Apr 24, 2024

SteveBronder commented Apr 24, 2024

SteveBronder commented Apr 25, 2024

bob-carpenter commented Apr 25, 2024

SteveBronder commented Apr 25, 2024 • edited Loading

bob-carpenter commented Apr 25, 2024

stan-buildbot commented Apr 25, 2024

SteveBronder commented Apr 26, 2024

WardBrian left a comment •

edited

Loading

SteveBronder commented Apr 24, 2024 •

edited

Loading

SteveBronder commented Apr 24, 2024 •

edited

Loading

SteveBronder commented Apr 25, 2024 •

edited

Loading