Bugfix: make kernel generator wait for events on matrices #1796

Merged Mar 27, 2020 (15 commits)

@t4c1 (Contributor) commented Mar 24, 2020

Summary

Fixes the kernel generator so that it executes a kernel only after the relevant events on the matrices used by that kernel have completed.
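
To illustrate the kind of dependency tracking this summary describes, here is a minimal, self-contained sketch. It uses neither Stan Math nor OpenCL: `MockEvent`, `MockMatrix`, and `launch` are hypothetical stand-ins invented for this example. The rule it encodes (a read waits on pending writes, a write waits on pending reads and writes) is an assumption about the intended behavior, not the code from this PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cl::Event: only an id, no real device work.
struct MockEvent {
  int id;
};

// Hypothetical stand-in for a device matrix that remembers which
// events still read from it or write to it.
struct MockMatrix {
  std::vector<MockEvent> read_events;
  std::vector<MockEvent> write_events;
};

// "Launches" a kernel that reads `in` and writes `out` (assumed to be
// two distinct matrices). Returns the events the kernel has to wait for.
inline std::vector<MockEvent> launch(int kernel_id, MockMatrix& in,
                                     MockMatrix& out) {
  std::vector<MockEvent> wait_list;
  // Reading `in` is only safe once every pending write to it is done.
  wait_list.insert(wait_list.end(), in.write_events.begin(),
                   in.write_events.end());
  // Writing `out` is only safe once pending reads and writes are done.
  wait_list.insert(wait_list.end(), out.read_events.begin(),
                   out.read_events.end());
  wait_list.insert(wait_list.end(), out.write_events.begin(),
                   out.write_events.end());

  // Register the new kernel's event so later kernels wait on it in turn.
  MockEvent done{kernel_id};
  in.read_events.push_back(done);
  // The new write depends on the old events, so they can be dropped.
  out.read_events.clear();
  out.write_events.clear();
  out.write_events.push_back(done);
  return wait_list;
}
```

In real OpenCL code the returned list would be passed as the wait list when enqueueing the kernel; here it is only returned so the behavior can be inspected.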

Tests

Added tests that check that the kernel generator waits on these events.

Side Effects

None.

Checklist

  • Math issue: Implement OpenCL kernel generator #1342

  • Copyright holder: Tadej Ciglarič

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependency checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@stan-buildbot
Contributor

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.85 | 4.92 | 0.99 | -1.52% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -2.45% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.98 | -1.75% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.01 | 0.86% faster |
| irt_2pl/irt_2pl.stan | 6.49 | 6.43 | 1.01 | 0.86% faster |
| performance.compilation | 87.79 | 86.61 | 1.01 | 1.34% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.56 | 7.52 | 1.01 | 0.56% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.91 | 20.3 | 1.03 | 2.93% faster |
| sir/sir.stan | 90.66 | 91.49 | 0.99 | -0.92% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 0.99 | -0.99% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.95 | 1.0 | 0.14% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.33 | 0.94 | -6.85% slower |
| arK/arK.stan | 1.73 | 1.75 | 0.99 | -0.64% slower |
| arma/arma.stan | 0.66 | 0.66 | 1.0 | -0.04% slower |
| garch/garch.stan | 0.51 | 0.51 | 1.0 | -0.12% slower |
Mean result: 0.99473878573

Commit hash: 1e37a41


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Contributor

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.83 | 4.84 | 1.0 | -0.31% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -0.9% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.0 | -0.02% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.01 | 1.19% faster |
| irt_2pl/irt_2pl.stan | 6.43 | 6.44 | 1.0 | -0.1% slower |
| performance.compilation | 87.65 | 86.64 | 1.01 | 1.15% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.57 | 7.58 | 1.0 | -0.04% slower |
| pkpd/one_comp_mm_elim_abs.stan | 21.5 | 21.44 | 1.0 | 0.31% faster |
| sir/sir.stan | 94.82 | 95.65 | 0.99 | -0.87% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.01 | 1.17% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.96 | 1.0 | -0.09% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.32 | 0.31 | 1.03 | 3.3% faster |
| arK/arK.stan | 1.73 | 1.74 | 1.0 | -0.41% slower |
| arma/arma.stan | 0.66 | 0.66 | 1.0 | 0.01% faster |
| garch/garch.stan | 0.51 | 0.52 | 0.99 | -1.42% slower |
Mean result: 1.00210829808

Commit hash: 1e37a41

@stan-buildbot
Contributor

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.89 | 4.91 | 1.0 | -0.46% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -4.52% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.04 | 3.66% faster |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 0.99 | -1.31% slower |
| irt_2pl/irt_2pl.stan | 6.56 | 6.45 | 1.02 | 1.64% faster |
| performance.compilation | 87.69 | 86.56 | 1.01 | 1.3% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.53 | 7.52 | 1.0 | 0.16% faster |
| pkpd/one_comp_mm_elim_abs.stan | 21.07 | 20.73 | 1.02 | 1.66% faster |
| sir/sir.stan | 93.79 | 93.68 | 1.0 | 0.12% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.01 | 1.37% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.95 | 1.0 | 0.01% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.34 | 0.31 | 1.09 | 8.03% faster |
| arK/arK.stan | 1.91 | 1.74 | 1.1 | 9.18% faster |
| arma/arma.stan | 0.66 | 0.67 | 0.99 | -0.6% slower |
| garch/garch.stan | 0.51 | 0.51 | 0.99 | -1.48% slower |
Mean result: 1.01391105204

Commit hash: 1e37a41

@stan-buildbot
Contributor

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.87 | 4.9 | 0.99 | -0.69% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -1.79% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.0 | 0.44% faster |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.0 | 0.13% faster |
| irt_2pl/irt_2pl.stan | 6.47 | 6.48 | 1.0 | -0.13% slower |
| performance.compilation | 87.77 | 86.67 | 1.01 | 1.25% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.51 | 7.52 | 1.0 | -0.18% slower |
| pkpd/one_comp_mm_elim_abs.stan | 20.38 | 20.71 | 0.98 | -1.59% slower |
| sir/sir.stan | 93.94 | 90.82 | 1.03 | 3.33% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.0 | 0.01% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.95 | 1.0 | 0.09% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.33 | 0.93 | -6.97% slower |
| arK/arK.stan | 1.91 | 1.74 | 1.1 | 8.88% faster |
| arma/arma.stan | 0.67 | 0.65 | 1.02 | 1.63% faster |
| garch/garch.stan | 0.52 | 0.51 | 1.01 | 0.75% faster |
Mean result: 1.00447902855

Commit hash: 1169069

@stan-buildbot
Contributor

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.83 | 4.88 | 0.99 | -1.03% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.57% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.1 | 0.94 | -6.54% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.0 | 0.3% faster |
| irt_2pl/irt_2pl.stan | 6.48 | 6.53 | 0.99 | -0.76% slower |
| performance.compilation | 87.76 | 86.66 | 1.01 | 1.26% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.54 | 7.53 | 1.0 | 0.13% faster |
| pkpd/one_comp_mm_elim_abs.stan | 21.07 | 21.03 | 1.0 | 0.22% faster |
| sir/sir.stan | 95.59 | 94.61 | 1.01 | 1.03% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.01 | 0.88% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.95 | 1.0 | 0.11% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.32 | 0.98 | -1.82% slower |
| arK/arK.stan | 1.73 | 1.75 | 0.99 | -0.93% slower |
| arma/arma.stan | 0.65 | 0.69 | 0.95 | -5.49% slower |
| garch/garch.stan | 0.51 | 0.52 | 0.98 | -1.55% slower |
Mean result: 0.989394151046

Commit hash: 1169069

@SteveBronder
Collaborator

@serban-nicusor-toptal Are all the Windows boxes down, or is one still running and just going slowly? It looks like most tests are stalled while waiting for upstream Windows boxes here:

https://jenkins.mc-stan.org/blue/organizations/jenkins/CmdStan/detail/downstream_tests/1534/pipeline

@serban-nicusor-toptal
Contributor

Two are in use; one is down and can't be serviced on-site because of COVID-19.
The on-demand machines have a bug. I'm currently working on a fix with @rok-cesnovar to re-enable them and relieve the pressure.

@SteveBronder
Collaborator

Sounds good, thanks for the update!

@rok-cesnovar
Member

There was a huge backlog of Windows jobs. It's unfortunate that the Windows jobs take the longest while we also have the fewest Windows workers (and obviously the least stable ones).

This job will finish in 20 minutes: https://jenkins.mc-stan.org/blue/organizations/jenkins/Stan/detail/downstream_tests/1417/pipeline. This PR should be next.

@SteveBronder (Collaborator) left a comment

One tiny comment

inline void get_clear_read_write_events(
    std::vector<cl::Event>& events) const {
  index_apply<N>([&](auto... Is) {
    (void)std::initializer_list<int>{

Collaborator

Should this be static_cast<void>()? I'm surprised this gets through cpplint.

Contributor Author

Yeah. Fixed.
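
For readers unfamiliar with the idiom in this thread, the sketch below shows the general pattern: expanding a parameter pack inside a `std::initializer_list<int>` to run one expression per element, then discarding the list with `static_cast<void>` instead of a C-style `(void)` cast. The function `append_all` is a hypothetical example written for illustration; it is not the kernel generator code from this PR.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical helper: appends every argument to `out`, in order.
// Each pack element expands to `(out.push_back(value), 0)`, so the
// initializer list holds one dummy 0 per argument. Elements of a braced
// list are evaluated left to right, which fixes the order of the calls.
template <typename... Ts>
inline void append_all(std::vector<int>& out, const Ts&... values) {
  // static_cast<void> marks the temporary list as intentionally unused
  // and avoids the C-style cast that linters tend to flag.
  static_cast<void>(
      std::initializer_list<int>{(out.push_back(values), 0)...});
}
```

Calling `append_all(v, 1, 2, 3)` on an empty `std::vector<int> v` leaves `v` holding 1, 2, 3 in that order. With C++17, a fold expression over the comma operator would express the same thing more directly.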

@SteveBronder (Collaborator) left a comment

Cool! I'm good with the rest of this

@stan-buildbot
Contributor

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.84 | 4.85 | 1.0 | -0.19% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.47% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.06 | 5.64% faster |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 0.98 | -2.21% slower |
| irt_2pl/irt_2pl.stan | 6.43 | 6.45 | 1.0 | -0.39% slower |
| performance.compilation | 88.94 | 86.85 | 1.02 | 2.36% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.63 | 7.57 | 1.01 | 0.83% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.52 | 20.45 | 1.0 | 0.36% faster |
| sir/sir.stan | 95.23 | 93.67 | 1.02 | 1.64% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.0 | 0.31% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.95 | 1.0 | -0.04% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.31 | 1.0 | 0.31% faster |
| arK/arK.stan | 1.74 | 1.9 | 0.91 | -9.29% slower |
| arma/arma.stan | 0.66 | 0.66 | 1.01 | 0.63% faster |
| garch/garch.stan | 0.51 | 0.51 | 1.0 | 0.07% faster |
Mean result: 1.0011866993

Commit hash: a622e48

@t4c1 merged commit b6134fb into stan-dev:develop on Mar 27, 2020
@t4c1 deleted the cl_kernel_generator_wait_events branch on November 30, 2020 09:25