Performance and Benchmarks #104

Merged
merged 18 commits into from
Jan 19, 2022

Conversation

KDr2
Member

@KDr2 KDr2 commented Jan 7, 2022

No description provided.

@KDr2
Member Author

KDr2 commented Jan 7, 2022

Interestingly, in the last commit I found that step_in on a tape is faster than running the model directly. Maybe I missed something?

$ julia --project perf/p0.jl
  2.286 μs (38 allocations: 1.95 KiB)
  97.440 ns (1 allocation: 48 bytes)
  440.273 μs (48 allocations: 2.56 KiB)
  969.364 ns (4 allocations: 288 bytes)

@yebai
Member

yebai commented Jan 7, 2022

It is possible that step_in on the tape executes faster than the original function f(args...) because the tape is more specialised (e.g. control flow is removed and input/output arguments are cached). Note that the total runtime of a Turing inference algorithm also depends on the CTask constructor (see 1, 2):

julia> @btime t = Libtask.CTask(f, args...);
  258.923 ms (766345 allocations: 43.38 MiB)

julia> @btime Libtask.step_in(t.tf.tape, args)
  95.054 ns (1 allocation: 48 bytes)

julia> @btime f(args...)
  2.549 μs (38 allocations: 1.95 KiB)
(2.0, VarInfo (2 variables (μ, σ), dimension 2; logp: -1.2750123006e7))

So it appears that a lot of time is spent on repeatedly constructing CTask. Maybe we can speed this up by reusing tapes?
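To make the tape-reuse idea concrete, here is a minimal sketch of caching an expensive construction keyed on the function and its argument types. It is not Libtask's implementation: `Tape`, `build_tape`, `cached_tape`, `run_tape`, `TAPE_CACHE` and `BUILD_COUNT` are hypothetical names invented for illustration, and the "tracing" step is a placeholder.

```julia
# Hypothetical stand-in for a recorded tape; NOT Libtask's actual type.
struct Tape
    ops::Vector{Function}
end

# Counts how often the expensive construction really runs.
const BUILD_COUNT = Ref(0)

# Hypothetical expensive step (stands in for tracing f into a tape).
function build_tape(f, argtypes::Tuple)
    BUILD_COUNT[] += 1
    return Tape(Function[(args...) -> f(args...)])
end

# Cache keyed on (function, argument types).
const TAPE_CACHE = Dict{Any,Tape}()

function cached_tape(f, args...)
    key = (f, map(typeof, args))
    # get! only calls the do-block when the key is missing.
    return get!(TAPE_CACHE, key) do
        build_tape(f, key[2])
    end
end

# Replay the (single) recorded operation with fresh arguments.
run_tape(t::Tape, args...) = t.ops[end](args...)

g(x, y) = x + y
t1 = cached_tape(g, 1.0, 2.0)   # builds the tape
t2 = cached_tape(g, 3.0, 4.0)   # same key, so the cached tape is returned
println(BUILD_COUNT[])          # construction ran once for both calls
println(run_tape(t2, 3.0, 4.0))
```

One caveat, stated as an assumption: if a real tape carries per-run state, a cached tape would probably need to be copied before each use rather than shared as in this sketch.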

@KDr2
Member Author

KDr2 commented Jan 9, 2022

Without Cache:

$ julia --project perf/p0.jl
"Directly call..." = "Directly call..."
  2.233 μs (38 allocations: 1.95 KiB)
"CTask construction..." = "CTask construction..."
  410.719 ms (974878 allocations: 59.77 MiB)
"Step in a tape..." = "Step in a tape..."
  90.543 ns (1 allocation: 48 bytes)
"Directly call..." = "Directly call..."
  422.273 μs (48 allocations: 2.56 KiB)
"CTask construction..." = "CTask construction..."
  416.812 ms (974908 allocations: 59.77 MiB)
"Step in a tape..." = "Step in a tape..."
  923.559 ns (4 allocations: 288 bytes)

With IR and Tape Cache:

$ julia --project perf/p0.jl
"Directly call..." = "Directly call..."
  2.117 μs (38 allocations: 1.95 KiB)
"CTask construction..." = "CTask construction..."
  99.222 μs (489 allocations: 22.02 KiB)
"Step in a tape..." = "Step in a tape..."
  87.400 ns (1 allocation: 48 bytes)
"Directly call..." = "Directly call..."
  417.133 μs (48 allocations: 2.56 KiB)
"CTask construction..." = "CTask construction..."
  103.745 μs (495 allocations: 22.48 KiB)
"Step in a tape..." = "Step in a tape..."
  924.314 ns (4 allocations: 288 bytes)
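For a quick sense of scale, the CTask construction timings from the two runs above can be compared directly. The snippet below only does arithmetic on the numbers already reported; the variable names are made up for illustration.

```julia
# CTask construction timings copied from the runs above, in seconds.
no_cache   = (410.719e-3, 416.812e-3)   # without cache (ms converted to s)
with_cache = (99.222e-6, 103.745e-6)    # with IR and tape cache (μs converted to s)

# Element-wise ratio: how many times faster construction is with the cache.
speedups = no_cache ./ with_cache

for s in speedups
    println("CTask construction speed-up: about ", round(Int, s), "x")
end
```

Both ratios come out at roughly 4000x, while the "Directly call" and "Step in a tape" timings are essentially unchanged between the two runs.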

@KDr2
Member Author

KDr2 commented Jan 11, 2022

Despite numeric test failures and a few errors, the unit tests finished in about 2 hours on my machine:

real    130m46.448s
user    126m4.793s
sys     5m25.685s

@KDr2 KDr2 marked this pull request as ready for review January 12, 2022 00:41
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@yebai yebai changed the title from "[WIP] Performance and Benchmarks" to "Performance and Benchmarks" Jan 12, 2022
Member

@yebai yebai left a comment


I can confirm that the tests now run correctly - we can rerun the Turing CI once this PR is merged. Fingers crossed!

@KDr2
Member Author

KDr2 commented Jan 19, 2022

This PR is ready to merge. @yebai

@KDr2 KDr2 requested a review from yebai January 19, 2022 00:47
@yebai yebai merged commit ccc293c into master Jan 19, 2022
@delete-merged-branch delete-merged-branch bot deleted the perf branch January 19, 2022 09:41