CI Performance Tracking for v0.5 #13893

jrevels · 2015-11-06T16:36:28Z

As progress moves forward on v0.5 development (especially JuliaLang/LinearAlgebra.jl#255), we'll need an automated system for executing benchmarks and identifying performance regressions.

The Julia group has recently purchased dedicated performance testing hardware which is hosted at CSAIL, and I've been brought on to facilitate the development of a system that takes advantage of this hardware. I'm hoping we can use this issue to centralize discussion/development efforts.

Desired features

Any implementation of a performance tracking system should:

track multiple metrics (e.g. time, % GC time, bytes allocated)
allow collaborators to trigger the following via GitHub's UI:
- benchmark execution on a specific commit
- benchmark result comparison between two specific commits
incorporate a tagging system so that groups of benchmarks can be selectively run by topic (e.g. run only parallel benchmarks, or only string benchmarks)
store benchmark result history for future analysis
report benchmark status/results through GitHub's UI
be applicable to normal Julia packages, not just Base (envision a future where PackageEvaluator benchmarks registered packages)
preferably be written in native Julia

Feel free to chime in with additional feature goals - the ones listed above just outline what I've been focusing on so far.

Existing work

In order to make progress on this issue, I've been working on JuliaCI/BenchmarkTrackers.jl, which supplies a unified framework for writing, executing, and tracking benchmarks. It's still very much in development, but currently supports all of the goals listed above. I encourage you to check it out and open up issues/PRs in that repository if you have ideas or concerns. Just don't expect the package to be stable yet - I'm in the process of making some drastic changes (mainly to improve the testability of the code).

Here are some other packages that any interested parties will want to be familiar with:

johnmyleswhite/Benchmarks.jl: a framework for benchmark execution, with the goal of delivering more rigorous/useful metrics than those provided by simple calls to @time.
JuliaWeb/GitHub.jl: will soon contain the GitHub event-handling infrastructure that I've been working on, which allows stuff like comment-based triggers and GitHub status reporting.
IainNZ/BasePerfTests.jl: a collection of performance tests (I'm not sure that this is still being developed).
staticfloat/Perftests.jl: another collection of performance tests + a grouping/logging framework for ease of execution. I believe @staticfloat has been running these on occasion via the buildbots.

Eventually we will want to consolidate "blessed" benchmarking/CI packages under an umbrella Julia group on Github (maybe JuliaCI)?

I saw that Codespeed was used for a while, but that effort was abandoned due to the burden of maintaining the server through volunteer effort. I've also been told that Codespeed didn't integrate well with the Github-centric CI workflow that we've become accustomed to.

Resolving this issue

Taking into account the capabilities previously mentioned, I imagine that a single CI benchmark cycle for Base would go through these steps:

Collaborator comments on a commit with the appropriate trigger phrase
A comment-listening server (the meat of which I've implemented here) triggers ~~the buildbots~~ a CI server to build Julia at that commit
~~The buildbots call out to~~ BenchmarkTrackers is used to process an external package of performance tests (similar to Perftests, but written using BenchmarkTrackers)
BenchmarkTrackers reports the results via GitHub statuses to the commit. Any additional reporting can be done with the @nanosoldier bot ("nanosolider" is the hostname of the hardware at CSAIL, and @jakebolewski managed to grab the GitHub account name).

If we can deliver the workflow described above, I think that will be sufficient to declare this issue "resolved."

Next steps

The following still needs to get done:

Configure a webhook on the JuliaLang/julia repo for firing off events so that we can begin testing the necessary CI infrastructure
Get BenchmarkTrackers.jl to a release-ready state, which means more robust testing and further development on both itself and its dependencies, GitHub.jl and Benchmarks.jl
Start organizing/rewriting existing performance tests using BenchmarkTrackers.jl
Set up the @nanosoldier to do any additional status reporting we want (e.g. linking to a commit status page after somebody triggers benchmark execution - GitHub doesn't do a good job of displaying statuses for commits outside of PRs).

Regression Examples

Here are some examples from regression-prone areas that I think could be more easily moderated with an automated performance tracking system:

parallel (performance regression with @parallel #12794, Regression in task switching performance? #12223)
inlining/type inference (Broken inlining #13551, Inlining arithmetic can produce slowdowns #13350, Performance/type inference regression from #12327 #12476)
"real-world" use cases (Performance regression after "Adapt AbstractArray subtypes for indexing fallbacks" #11823, 5 x decrease in performance with new Julia build #7000, Performance regression from commit a2acf88 #6415)
individual functions (Regression in methodshow/print_to_string #11700, performance regression in println #10650, mapreduce performance regression #8100, performance regressions since 0.2 #6112)
codegen/vectorization (SIMD regression tests #13777, Load from GC frame preventing vectorization #13301, SLP vectorization not working for tuples #11899, Inefficient codegen for unimported constructors from other modules #11997)

The text was updated successfully, but these errors were encountered:

musm · 2015-11-06T16:48:57Z

Will this include tracking performance on all three major operating systems (mac, linux distro,windows)? To identify possible regressions affecting only one system?

jakebolewski · 2015-11-06T17:03:55Z

No, performance tracking will be limited to running on Ubuntu Linux (similar to Travis). We don't have the manpower / resources to do cross platform performance testing at least initially.

Part of the goal is to make all this testing infrastructure lightweight / modular enough that it can be run on a user's computer with minimal effort (another reason to use Julia and not something like CodeSpeed). This way volunteer's (or organizations) could plug gaps in systems we don't support through automated CI while using the same benchmarking stack.

xianyi · 2015-11-06T17:09:34Z

+1 I think I can use this infrastructure to track OpenBLAS performance, too.

jrevels · 2015-11-06T17:15:34Z

Will this include tracking performance on all three major operating systems (mac, linux distro,windows)?

In the future, if we had something like a periodic (e.g. weekly) benchmark pass separate from the "on-demand" CI cycle being discussing in this issue, we might consider doing a full OS sweep on occasion (not per-commit, though).

But as @jakebolewski pointed out, we're not really concerned with cross-platform tracking at the moment, especially given our current resource limitations.

StefanKarpinski · 2015-11-06T17:41:33Z

Let's get it working on Linux. Running it regularly on other OSes can be a later goal.

tkelman · 2015-11-06T18:39:26Z

We can turn the webhook on now if you know how it'll need to be configured yet.

jrevels · 2015-11-06T19:26:58Z

I'm not sure yet what the payload URL is going to be, but I'll keep you posted with the details once we figure it out.

staticfloat · 2015-11-08T07:39:17Z

Hey guys, sorry to be MIA for the last week or two. @jrevels asked me in private a little while ago to write up my plan about performance testing that I am 30% through enacting, and so I am going to data-dump it here so that everyone can see it, critique it, and help shape it moving forward to make something equally usable by all.

I'm personally not so concerned with the testing methodology, statistical significance etc... of our benchmarking. We have much more qualified minds to duke that out, what I'm interested in is the infrastructure; how do we make this easy to setup, easy to maintain, and easy to use. Here's my wishlist/design doc for performance eval of Base; this is completely separate from package performance tracking which is of lower priority IMO.

Runs automatically on whitelisted branches, but can be manually run by contributors

Right now, 90% of what I do with Julia revolves around creative abuse of buildbot. I have a system setup where my Perftests.jl package gets run on every commit of master that makes it through the test suite. That package dumps results to .csv files, a sample of the last month of those runs is available as a tarball here, for the time being (Warning: ~700MB large because it contains the data from every single sample taken during a Benchmarks.jl run). It's not unthinkable that we could have a [perftest] tag in a git commit message (that was committed by a contributor) to trigger a perf test, or some other kind of github integration. (I don't know enough about Github integration to know what the best use case here is; if we want to run/rerun a perftest after the fact, can we make a button to do so? Is the way to do it to run it manually via buildbot, and then update a github status somewhere?)

Is independent of Base and runs on every version that we care about

Right now, that includes 0.4 and 0.5, but could possibly include 0.3. Obviously, there will be tests that we don't want to run on older versions, or even tests that we will drop as APIs change. But having our test infrastructure independent of Base (unlike the old test/perf/ directory) will make it a lot cleaner to maintain, and especially easier to fix problems in our test infrastructure without messing around with Base.

This is why I made the Perftests.jl package, but real life caught up to me faster than I thought, and so that repository, while functional, is missing some Sparse tests that the old test/perf/ directory had. Other than that, it's what I would consider "functional".

Is stored in a location easy for anyone and anything to access

I like making pretty visualizations. But I am nothing compared to what the rest of the Julia community is capable of, and I'd really like to make getting at and visualizing our data as easy as possible. To me, that means storing the data in something robust, public-facing, and easily queried. For our use cases, I think InfluxDB is a reasonably good choice, as I don't think reinventing the database wheel is a good use of our time, and it provides nice, standard ways of getting at the data.

In my nanosoldier/Perftest.jl world, my next step would be to write a set of Julia scripts that parse the .csv files generated by Perftests.jl, distill them down into the metrics that we would want to keep around forever (which is a much smaller set of data than what is generated in the .csv files right now) and upload them to an InfluxDB instance. I've got the uploading part done, I just haven't made a way to parse the .csv files and get the relevant bits out.

That server, being designed for timeseries and publicly available, would likely function better than anything we would cobble together ourselves, and would open the path to writing our own visualization software (a la codespeed) to even using something that someone else has already written (Kibana, Grafana, etc...).

That's been my plan, and I'm partway toward it, but there are some holes, and I'm not married to my ideas, so if others have alternative plans I'd be happy to hear them and see how we can most efficiently move from where we are today, to where we want to be. I am under no illusions that I will be able to put a significant amount of work toward any proposal, so it's best if the discussion that comes out of this behemoth of a github post is centered around what others want to do, rather than what I want to do. Either that, or we just have patience until I can get around to this.

tkelman · 2015-11-08T08:06:31Z

Sounds about right. I'd prefer a tagged comment listener hook (@nanosoldier perftest foo) rather than having to put specific things in the commit message.

hayd · 2015-11-08T08:16:05Z

Another option might be a "run perftests" github label. Edit: ah, but you wouldn't be able to specify foo.

jrevels · 2015-11-09T15:35:43Z

Thanks for the write-up, @staticfloat. I've definitely been keeping in mind the things we've discussed when working on BenchmarkTrackers. I'd love it if you could check out the package when you have the time.

It's not unthinkable that we could have a [perftest] tag in a git commit message (that was committed by a contributor) to trigger a perf test, or some other kind of github integration.

I've been advocating the trigger-via-comment strategy because I think it will encourage more explicitly targeted benchmark cycles that will make better use of our hardware compared to a trigger-via-push strategy. One is still be able to trigger per-commit runs by commenting on the commit with the appropriate trigger phrase, and that way you don't have to clutter up your commit messages with benchmark-related jargon.

In my nanosoldier/Perftest.jl world, my next step would be to write a set of Julia scripts that parse the .csv files generated by Perftests.jl, distill them down into the metrics that we would want to keep around forever (which is a much smaller set of data than what is generated in the .csv files right now) and upload them to an InfluxDB instance. I've got the uploading part done, I just haven't made a way to parse the .csv files and get the relevant bits out.

The logging component that BenchmarkTrackers uses for history management is designed to be swappable so that we can support third-party databases in the future. It currently only supports JSON and JLD serialization/deserialization, but there's nothing stopping us from extending that once we get the basic CI cycle going.

StefanKarpinski · 2015-11-09T19:19:05Z

I've been advocating the trigger-via-comment strategy because I think it will encourage more explicitly targeted benchmark cycles that will make better use of our hardware compared to a trigger-via-push strategy. One is still be able to trigger per-commit runs by commenting on the commit with the appropriate trigger phrase, and that way you don't have to clutter up your commit messages with benchmark-related jargon.

Yes, please. The whole push-triggered model is so broken. Just because I pushed something doesn't mean I want to test it or benchmark it. And if I do, posting a comment is not exactly hard. I do think that we should complement comment-triggered CI and benchmarking with periodic tests on master and each release branch.

jakebolewski · 2015-11-11T15:56:12Z

I added a benchmark tag so we can tag performance related PR's that need objective benchmarks.

IainNZ · 2015-11-11T17:23:52Z

@jrevels I don't have the bandwidth to meaningfully contribute to this, but my BasePerfTests.jl package serves a similar purpose to @staticfloat's, in that it was a thought experiment for disconnecting performance tests from the Julia version, and what a culture of adding a performance regression test for performance issues would look like (analogous to adding regression tests for bugs)

jrevels · 2016-01-04T23:04:54Z

CI performance tracking is now enabled! There are still some rough edges to work out, and features that could be added, but I've been testing the system on my Julia fork for a couple of weeks now and it's been stable. Here's some info on how to use this new system.

The Benchmark Suite

The CI benchmark suite is located in the BaseBenchmarks.jl package. These benchmarks are written and tagged using BenchmarkTrackers.jl. Currently, I've only populated the suite with a couple of the array indexing benchmarks that are already in Base. Those benchmarks were enough to test the CI server, but we'll obviously want more variety as soon as possible. The suite now contains all the benchmarks currently in test/perf. Filling out the suite will be my priority for the next few weeks. As always, PRs are welcome!

Triggering Jobs

Benchmark jobs are submitted to MIT's hardware by commenting in pull requests or on commits. Only repository collaborators can submit jobs. To submit a job, post a comment containing the trigger phrase runbenchmarks(tag_predicate, vs = ref). The proper syntax for this trigger phrase can be demonstrated by a few examples (each quoted block is an example of a comment):

I want to run benchmarks tagged "array" on the current commit.

`runbenchmarks("array")`

If this comment is on a specific commit, benchmarks will run on that commit. If
it's in a PR, they will run on the head/merge commit. If it's on a diff, they will
run on the commit associated with the diff.

I want to run benchmarks tagged "array" on the current commit, and compare the
results with those of commit 858dee2b09d6a01cb5a2e4fb2444dd6bed469b7f.

`runbenchmarks("array", vs = "858dee2b09d6a01cb5a2e4fb2444dd6bed469b7f")`

I want to run benchmarks tagged "array", but not "simd" or "linalg", on the
current commit. I want to compare the results against those of the release-0.4
branch.

`runbenchmarks("array" && !("simd" || "linalg"), vs = "JuliaLang/julia:release-0.4")`

I could've compared against a fork by specifying a different repository (e.g.
replace "JuliaLang/julia:release-0.4" with "jrevels/julia:release-0.4").

I want to run benchmarks tagged "array", but not "simd" or "linalg", on the
current commit. I want to compare the results against a fork's commit.

`runbenchmarks("array" && !("simd" || "linalg"), vs = "jrevels/julia@c70ab26bb677c92f0d8e0ae41c3035217a4b111f")`

The allowable syntax for the tag predicate matches the syntax accepted by BenchmarkTrackers.@tagged.

Examining Results

The CI server communicates back to the GitHub UI by posting statuses to the commit on which a job was triggered (similarly to Travis). Here are the states a commit status might take:

Pending: The job is queued up. Another pending status is posted once the job begins running.
Failure: The job completed, and performance regressions were found.
Success: The job completed, and no performance regressions were found.
Error: The job could not be completed. User-facing reasons for this include build errors and malformed trigger phrase syntax.

Failure and success statuses will include a link back to a report stored in the BaseBenchmarkReports repository. The reports are formatted in markdown and look like this. That's from a job I ran on my fork, which compared the master branch against the release-0.4 branch (I haven't trawled through the regressions caught there yet).

Note that GitHub doesn't do a very good job of displaying commit statuses outside of PRs. If you want to check the statuses of a commit directly, I usually use GitHub.jl's statuses method, or you could go to the commit's status page via your browser at api.github.com/repos/JuliaLang/julia/statuses/:sha.

Rough Edges/Usage Tips

Only one job submission can be made per comment (e.g. only the first use of runbenchmarks(...) is picked up).
If your trigger phrase syntax is malformed, you may not get a status reply at all, since the CI server won't pick up the comment.
I'm still working on measuring and reducing the noise/variance on our hardware. The regression determination method might be improved (or at least more finely tuned) once we have more data.
Storing/reporting results in a GitHub repository is fine initially, but eventually we'll want a proper database for long-term storage (and probably a better front end).
Obviously, you shouldn't trigger jobs on commits that don't build successfully on Travis.
Jobs triggered on commits/branches that introduce changes which break BaseBenchmarks.jl will necessarily result in error (I'm going to try to keep that package as forward-compatible as possible).

Finally, I'd like for anybody who triggers a build in the next couple of days to CC me when you do so, just so that I can keep track of how everything is going server-side and handle any bugs that may arise.

P.S. @shashi @ViralBShah @amitmurthy and anybody else who uses Nanosoldier: nanosoldier5, nanosoldier6, nanosoldier7, and nanosoldier8 should now be reserved exclusively for CI performance testing.

tkelman · 2016-01-04T23:52:14Z

Sounds great. How easy will it be to associate reports in https://github.com/JuliaCI/BaseBenchmarkReports to the commit/pr they came from? Posting a nanosoldier response comment (maybe one per thread with edits for adding future runs?) might be easier to access than statuses, though noisier.

jrevels · 2016-01-05T00:06:01Z

The reports link back to the triggering comment for the associated job, and also provide links to the relevant commits for the job.

Going the other way, clicking on a status's "Details" link takes you to the report page (just like clicking on the "Details" link for a Travis status takes you to a Travis CI page). That only works in PRs, though.

I'm onboard for getting @nanosoldier to post automated replies on commit comments (that's the last checkbox in this issue's description). I'm going to be messing around with that in the near future.

timholy · 2016-01-05T00:14:50Z

Exciting stuff, @jrevels!

staticfloat · 2016-01-06T21:02:32Z

This is really cool @jrevels, so glad you've taken this up.

jrevels · 2016-01-10T09:19:25Z

I just updated the CI tracking service to incorporate some recent changes regarding report readability and regression detection. I've started tagging BenchmarkTrackers.jl such that the latest tagged version corresponds to the currently deployed version.

The update also incorporates the recent LAPACK and BLAS additions to BaseBenchmarks.jl (they're basically the same as the corresponding benchmarks in Base). Similar to BenchmarkTrackers.jl, I've started tagging the repo so that it's easy to see what versions are currently deployed.

I depluralized the existing tags (e.g. "arrays" --> "array"), as I'm going to try to consistently make them singular in the future. Additionally, one can now use the keyword ALL to run all benchmarks (e.g. runbenchmarks(ALL, vs = "JuliaLang/julia:master")).

jrevels · 2016-01-10T20:04:56Z

Responding here to discussion in #14623.

I think I would rather see packages add a benchmarks/ directory and have the benchmark infrastructure be able to pull those all in.

This is exactly the intent of BenchmarkTrackers. Recently, my main focus has been setting up infrastructure for Base, but the end-goal is to have package benchmarks be runnable as part of PackageEvaluator. If they want to get a head start on things, package authors can begin using BenchmarkTrackers to write benchmarks for their own package.

Yeah, depends on how much control you want over the benchmarks. If we go that route it feels that it should be included in Base somehow á la Pkg.benchmark().

After we use the existing infrastructure for a while, we could consider folding some unified version of the benchmarking stack (BaseBenchmarks.jl + Benchmarks.jl + BenchmarkTrackers.jl) into Base, and have a Base.Benchmarks module that formalizes this stuff across the board for the language. The two reasons why that would be useful:

Benchmark breakage as a result of language changes could be resolved within the PRs causing the breakage. Probably the biggest flaw with the current system is that CI tracking won't work on a PR that breaks BaseBenchmarks.jl.
Automatically creating a benchmarks directory in new packages that could be run with Pkg.benchmark() might help establish performance tracking as a norm for Julia packages.

tkelman · 2016-01-10T20:16:18Z

That's a lot of code to bring into base, and the wrong direction w.r.t. #5155. I think we can add more automation and levels of testing that aren't exactly Base or its own tests or CI, but would run and flag breakages frequently.

hayd · 2016-01-10T20:19:53Z

So it would be Benchmarks.pkg("Foo") would look for benchmarks directory in package and run etc. ?

tkelman · 2016-01-29T20:33:59Z

PackageEvaluator now lives at https://github.com/JuliaCI/PackageEvaluator.jl, a little bit of refactoring would be needed there to accept commits to test against programmatically. I've been patching that manually for my own runs but shouldn't be too bad to make more flexible.

jrevels · 2016-01-29T21:42:48Z

It wouldn't be out of the question to separate the infrastructure from the benchmark-specific stuff in BenchmarkTrackers.jl, and put it in a "Nanosoldier.jl" package that could be used to handle multiple kinds of requests delivered via comment. That way all job submissions to @nanosoldier could easily share the same scheduler/job queue.

tkelman · 2016-01-29T21:45:37Z

That might mean less new semi-duplicated code to write. You apparently already had BenchmarkTrackers set up to be able to use the same nanosoldier node I've been using manually, right?

jrevels · 2016-01-29T21:48:08Z

For testing new versions of the package, yeah. The master/slave nodes it uses are easily configurable.

jrevels · 2016-01-29T21:52:55Z

I'll be doing a ForwardDiff.jl sprint next week, but after that I'd be down to work on this - I pretty much know how to do it on the CI side of things. The challenging part might be learning how PackageEvaluator works under the hood, but that doesn't seem like it will be overly difficult.

tkelman · 2016-01-29T21:54:24Z

I'll help on that side since I'll want to use this right away.

IainNZ · 2016-01-29T21:57:10Z

I'll also help in terms of providing advice drawn on any lessons I've learnt

hayd · 2016-02-07T20:05:32Z

@jrevels Does runbenchmarks against a branch (e.g. master) run against the merge-base (e.g. of master and the current commit) or the tip of master?

jrevels · 2016-02-07T20:12:49Z

If the job is triggered in a PR, benchmarks will run on that PR's merge commit (i.e. the result of the head commit of the PR merged into the PR's base). If there's a merge conflict, and the merge commit doesn't exist, then the head commit of the PR is used instead.

Comparison builds (specified by the vs keyword argument) are always exactly what you specify; either the commit of the given SHA, or the head commit of the given branch.

tkelman · 2016-03-19T00:47:30Z

We really need to be running this against master on a regular schedule and saving the results somewhere visible. Only getting a comparison when you specifically request one is not a very reliable way to track regressions.

jrevels · 2016-03-21T19:44:08Z

It's definitely been the plan for the on-demand benchmarking service to be supplemented with data taken at regularly scheduled intervals.

Armed with the data we have from running this system for a while, I've been busy rewriting the execution process to deliver more reliable results, and that work is close to completion (I'm at the fine-tuning and doc-writing phase of development).

After switching over to this new backend, the next step in the benchmarking saga will be to end our hacky usage of GitHub as the public interface to the data and set up an actual database instance, as @staticfloat originally suggested. We can then set up a cron job that benchmarks against master every other day or so and dumps the results to the database.

tkelman · 2016-04-29T22:55:51Z

Ref #16128, I'm reopening until this runs on an automated schedule.

jrevels · 2016-05-05T14:28:39Z

An update: @nanosoldier will be down for a day or two while I reconfigure our cluster hardware.

When it comes back up, the CI benchmarking service will utilize the new BenchmarkTools.jl + Nanosoldier.jl stack I've been working on for the past couple of months. The BenchmarkTools package is a replacement for the Benchmarks + BenchmarkTrackers stack, while the Nanosoldier package provides an abstract job submission framework that we can use to add features to our CI bot (e.g. we can build the "run pkgeval by commenting" feature on top of this).

A practical note for collaborators: Moving forward, you'll have to explicitly at-mention @nanosoldier before the trigger phrase when submitting a job. For example, instead of your comment containing this:

`runbenchmarks(tag_predicate, vs = "ref")`

...you'll need this:

@nanosoldier `runbenchmarks(tag_predicate, vs = "ref")`

More @nanosoldier documentation can be found in the Nanosoldier.jl repo.

tkelman · 2016-06-02T16:38:38Z

What needs to be done to get this running nightly and putting up a report somewhere people can see it?

jrevels · 2016-06-02T18:24:15Z

The easiest thing to do would just be to set up a cron job that causes @nanosoldier to submit CI jobs to itself on a daily basis. My work during the week has to be devoted to paper-writing at the moment, but I can try to set something up this weekend.

jrevels · 2016-06-10T04:13:47Z

Starting today, @nanosoldier will automatically execute benchmarks on master on a daily basis. The generated report compares the current day's results with the previous day's results. All the raw data (formatted as JLD) is compressed and uploaded along with the report, so you can easily clone the report repository and use BenchmarkTools to compare any day's results with any other day's results.

jrevels · 2016-06-11T14:21:44Z

The first daily comparison against a previous day's build has executed sucessfully, so I'm going to consider this issue resolved.

There is definitely still work to be done here - switching over to a real database instead of abusing git, adding more benchmarks to BaseBenchmarks.jl, and making a site that visualizes the benchmark data in a more discoverable way are things that I'd love to see happen eventually. Any subsequent issues - errors, improvements, etc. - can simply be raised in the appropriate project repositories in JuliaCI. As before, we can still use this thread for PSAs to the wider community when user-facing changes are made.

KristofferC · 2016-06-11T18:17:14Z

Does it make sense to run against previous release as well. I'm thinking else, regressions could be introduced bit by bit were each part is small enough to dissappear in the noise.

tkelman · 2016-06-11T18:31:33Z

That does make sense to me, though if we're collecting absolute numbers and expect the hardware to remain consistent, probably no need to run the exact same benchmarks against the exact same release version of Julia every day? Maybe re-run release absolute numbers a little less often, once or a handful of times per week?

jrevels · 2016-06-11T18:51:03Z

Let's continue this discussion in JuliaCI/Nanosoldier.jl#5.

KristofferC mentioned this issue Nov 7, 2015

Try to activate BenchmarksTrackers.jl KristofferC/NearestNeighbors.jl#3

Closed

tkelman mentioned this issue Nov 11, 2015

Speed up BitArray packing #13946

Merged

jiahao added the potential benchmark Could make a good benchmark in BaseBenchmarks label Nov 11, 2015

This was referenced Dec 2, 2015

Speed and accuracy improvements to java benchmark #14229

Closed

Tag GitHub v2.0.0 JuliaLang/METADATA.jl#4142

Merged

Expose a basic API for accessing benchmark results johnmyleswhite/Benchmarks.jl#26

Closed

jrevels mentioned this issue Jan 10, 2016

Upgrade to LLVM 3.7.1 and switch over CI #14623

Merged

3 tasks

jrevels mentioned this issue Jan 29, 2016

Revive performance regression testing? #11709

Closed

tkelman reopened this Apr 29, 2016

jrevels mentioned this issue Jun 6, 2016

add support for daily benchmark jobs JuliaCI/Nanosoldier.jl#3

Merged

2 tasks

jrevels closed this as completed Jun 11, 2016

jrevels mentioned this issue Jun 11, 2016

Reports we should generate for periodic benchmark jobs JuliaCI/Nanosoldier.jl#5

Open

jrevels mentioned this issue Jul 7, 2016

Add a function to generate nice reports JuliaCI/BenchmarkTools.jl#1

Open

ahwillia mentioned this issue Jul 7, 2016

Seemlessly comparing benchmark results across versions JuliaCI/BenchmarkTools.jl#16

Closed

tkelman mentioned this issue Jul 18, 2017

Add fast path for merge with Dicts #22737

Merged

jrevels mentioned this issue Jul 24, 2017

stop abusing git for timing data storage JuliaCI/Nanosoldier.jl#36

Closed

KristofferC removed the potential benchmark Could make a good benchmark in BaseBenchmarks label Oct 8, 2018

CI Performance Tracking for v0.5 #13893

CI Performance Tracking for v0.5 #13893

Comments

jrevels commented Nov 6, 2015

Desired features

Existing work

Resolving this issue

Next steps

Regression Examples

musm commented Nov 6, 2015

jakebolewski commented Nov 6, 2015

xianyi commented Nov 6, 2015

jrevels commented Nov 6, 2015

StefanKarpinski commented Nov 6, 2015

tkelman commented Nov 6, 2015

jrevels commented Nov 6, 2015

staticfloat commented Nov 8, 2015

tkelman commented Nov 8, 2015

hayd commented Nov 8, 2015

jrevels commented Nov 9, 2015

StefanKarpinski commented Nov 9, 2015

jakebolewski commented Nov 11, 2015

IainNZ commented Nov 11, 2015

jrevels commented Jan 4, 2016

The Benchmark Suite

Triggering Jobs

Examining Results

Rough Edges/Usage Tips

tkelman commented Jan 4, 2016

jrevels commented Jan 5, 2016

timholy commented Jan 5, 2016

staticfloat commented Jan 6, 2016

jrevels commented Jan 10, 2016

jrevels commented Jan 10, 2016

tkelman commented Jan 10, 2016

hayd commented Jan 10, 2016

tkelman commented Jan 29, 2016

jrevels commented Jan 29, 2016

tkelman commented Jan 29, 2016

jrevels commented Jan 29, 2016

jrevels commented Jan 29, 2016

tkelman commented Jan 29, 2016

IainNZ commented Jan 29, 2016

hayd commented Feb 7, 2016

jrevels commented Feb 7, 2016

tkelman commented Mar 19, 2016

jrevels commented Mar 21, 2016

tkelman commented Apr 29, 2016

jrevels commented May 5, 2016

tkelman commented Jun 2, 2016

jrevels commented Jun 2, 2016

jrevels commented Jun 10, 2016 • edited Loading

jrevels commented Jun 11, 2016

KristofferC commented Jun 11, 2016

tkelman commented Jun 11, 2016

jrevels commented Jun 11, 2016

jrevels commented Jun 10, 2016 •

edited

Loading