Benchmarking harness for Julia #9456
Comments
+1 for this
FWIW, without a concrete action, I'm going to recommend that this issue be closed as non-implementable / not being an issue. Most functionality like this is probably best developed in a package anyway.
I prefer this being in a package, like the one above, where it can be actively developed.
I have a concrete suggestion that I was going to put either into a gist or into a package. Here is an example of what I currently have in mind; it would benchmark integer addition.
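A minimal sketch of the kind of call being described, assuming a hypothetical run_benchmark helper; this is illustrative only, not the commenter's original code (which is not shown above):

```julia
# Illustrative sketch only, not the original code from this comment: a hypothetical
# helper that times a kernel many times and reports a "time" plus a "variation".
using Printf

function run_benchmark(kernel; evals::Int = 10_000, samples::Int = 100)
    times = Vector{Float64}(undef, samples)
    for s in 1:samples
        t0 = time_ns()
        for _ in 1:evals
            kernel()
        end
        times[s] = (time_ns() - t0) / evals      # nanoseconds per kernel call
    end
    t = minimum(times)                           # "time": best observed per-call time
    v = (maximum(times) - t) / t                 # "variation": crude spread estimate
    @printf "time: %.2f ns    variation: %.1f%%\n" t 100v
    return t, v
end

# Benchmark integer addition; the Ref makes it a little harder for the compiler to
# optimize the addition away entirely (a real harness would need stronger guarantees).
acc = Ref(0)
run_benchmark(() -> (acc[] += 1))
```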
The output would report, for each kernel, a "time" and a "variation", where "time" is the time to run the kernel and "variation" is a measure of the timing inaccuracy. I'm aware of the usual pitfalls of benchmarking, i.e. ensuring that the benchmark is not optimized away, not being drowned by looping overhead or timing noise, etc. I'm also aware of (some of?) the issues around defining such a "variation" measure -- I have an idea for how I would do it, but the community may decide differently.
How would this define CSV output? I like how Benchmarks.jl let me commit performance logs using dataframes: https://github.com/twadleigh/Meshes.jl/blob/38fa5fd48d083fc832c9c121fd70e419d9115330/perf/data/binary_stl_load_topology.csv
Those aren't really the issues I was concerned about when I said "the pitfalls of benchmarking". Much of that has already been covered on the Julia mailing lists, so I won't repeat it here. However, it's significant to note that while the example benchmark you gave above does benchmark many things (for example: function calls, compiler optimizations, language frameworks), the one thing it does not do is benchmark integer addition. There is way too much processor variability during execution of those statements to consider it a benchmark of "integer addition". I would instead argue that such a measure cannot actually exist in the context of a CPU: memory fetches, function overhead, pipelining, and context switches are just a few of the issues with defining such a measurement. And even if you have the best optimized code in the world, if you are using the wrong algorithm (or it doesn't pass a test suite), you lose (the tragedy of premature optimization).
I feel benchmarks are only as valuable as the decisions/actions that people make based on them. We have a pretty extensive set of performance tests for Base already, but they don't get run or looked at very often lately, aside from the main numbers that are on the website. It would be incredibly valuable to bring back some continuous tracking like speed.julialang.org if it helps identify performance regressions or improvements more systematically. The new buildbot infrastructure has been working very well for the purposes of building binaries, but there might be too much variability due to the VM environment to get good performance data there.
I think there's definitely room for some cleanup in the perf scripts/makefiles. Simply having a script that would take two git commit/tree-ishes, automatically build and run the code speed suite on each, and then output the differences, in order of most significant, would be pretty straightforward and very useful. Heck, it could even plot them if a plotting package is installed. Seems like a good up-for-grabs/undergraduate project.
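A rough sketch of such a script, written in Julia. It assumes the perf suite can be run via `make -C test/perf` and that its output can be parsed into name/time pairs, both of which would need to be checked against the actual suite:

```julia
# Rough sketch only (not an existing tool): build and run the perf suite for two
# commits and print the benchmarks with the largest relative change first.
# Assumptions: the working directory is a Julia checkout, `make` and
# `make -C test/perf` work for both commits, and the perf output contains
# comma-separated lines ending in a time -- all of which would need checking.

function perf_results(commit::AbstractString)
    run(`git checkout $commit`)
    run(`make -j4`)                              # rebuild julia at this commit
    out = read(`make -C test/perf`, String)      # run the perf suite, capture output
    results = Dict{String,Float64}()
    for line in eachline(IOBuffer(out))
        parts = split(line, ',')
        length(parts) >= 2 || continue
        t = tryparse(Float64, parts[end])
        t === nothing || (results[parts[1]] = t) # name => time (placeholder parsing)
    end
    return results
end

function compare_commits(commit_a::AbstractString, commit_b::AbstractString)
    a, b = perf_results(commit_a), perf_results(commit_b)
    diffs = [(name, (b[name] - a[name]) / a[name]) for name in intersect(keys(a), keys(b))]
    sort!(diffs; by = x -> -abs(x[2]))           # most significant change first
    for (name, rel) in diffs
        println(rpad(name, 40), round(100rel; digits = 1), "%")
    end
end

# compare_commits("v0.3.3", "master")
```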
@vtjnash Above, I was describing what the interface to a benchmarking harness should look like, not how it should be implemented. Of course one needs an optimizing compiler to avoid spurious overhead, needs to perform many operations to mitigate timing overhead, needs to ensure inlining to avoid function call overhead, etc. But that isn't the point here -- I chose "integer addition" only because it makes for a small example. Replace it with "insert values into a dictionary", "multiply two matrices", or "task creation overhead" if you like; those can definitely be benchmarked. I've seen some positive responses here, as well as fairly concrete suggestions about making benchmark results more easily available, e.g. "Seems like a good up-for-grabs/undergraduate project." Could you re-open this issue so that this doesn't get forgotten?
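For concreteness, a minimal sketch of timing two of the kernels mentioned, using only Base's @elapsed rather than any proposed API; the helper mintime and the kernel functions are made up for this example:

```julia
# Illustrative only: time two of the kernels mentioned using nothing but Base's
# @elapsed. The helper `mintime` and the kernel functions are made up for this sketch.
using LinearAlgebra

function mintime(f; runs::Int = 20)
    best = Inf
    for _ in 1:runs
        best = min(best, @elapsed(f()))   # keep the best of several runs to reduce noise
    end
    return best
end

# "Insert values into a dictionary"
dict_kernel() = begin
    d = Dict{Int,Int}()
    for i in 1:10_000
        d[i] = i
    end
    d
end

# "Multiply two matrices" (const globals so the closure stays type-stable)
const A = rand(200, 200)
const B = rand(200, 200)
matmul_kernel() = A * B

dict_kernel(); matmul_kernel()            # warm up: compile before timing
println("dict insert: ", mintime(dict_kernel), " s")
println("matmul:      ", mintime(matmul_kernel), " s")
```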
I came across this paper on measurement bias, and thought the various individuals interested in benchmarking in this thread might find it a good read:

Also, one of the other publications by the same group offers some evidence that we should be doing statistical profiling with a non-uniform timeout:
Nice set of links, @vtjnash!
@vtjnash, yes, great link... I had to deal with that a lot on early Alpha chips (direct-mapped cache instead of an n-way associative cache), where link order could make a huge difference in performance. Until we figured out what it was, we were going crazy trying to get consistent benchmark results.
Julia as a language is not only about correctness, it is also about performance. Consequently, there should be a benchmarking harness in Base, equivalent to the testing harness Base.Test. This would allow several interesting things:

- The performance effect of code changes and annotations (e.g. @inbounds, @fastmath) is immediately clear (a small sketch of this follows below)
- Potentially dangerous optimizations (e.g. @inbounds in Base) can be vetted; they would only be allowed if they actually show a performance benefit

Since performance varies by architecture, it is probably necessary to set up a few dedicated testing machines where the benchmarks can be run regularly. These machines would need to keep a history of benchmark results for comparison. I've seen http://speed.julialang.org/ which looks interesting -- I wonder if it could be set up to look at potentially thousands of small benchmark results.
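By way of illustration (not from the original issue): a tiny sketch of timing the same reduction with and without @inbounds, so the effect of the annotation is directly visible. The function names are made up for this example.

```julia
# Illustration only: the measured difference may be small when the compiler can
# already prove the accesses are in bounds.
function sumvec(x)
    s = 0.0
    for i in 1:length(x)
        s += x[i]
    end
    return s
end

function sumvec_inbounds(x)
    s = 0.0
    @inbounds for i in 1:length(x)
        s += x[i]
    end
    return s
end

x = rand(10^7)
sumvec(x); sumvec_inbounds(x)            # warm up: compile both before timing
println("plain:     ", minimum((@elapsed sumvec(x)) for _ in 1:10), " s")
println("@inbounds: ", minimum((@elapsed sumvec_inbounds(x)) for _ in 1:10), " s")
```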
I imagine that many Julia packages will in the future contain @bench statements in addition to @test statements.
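As a rough illustration of what that might look like: no @bench macro exists in Base, so the toy version below (which just reports the minimum of repeated timings) only sketches the proposed usage alongside @test.

```julia
using Test

# Toy @bench for illustration: run the expression 100 times and report the minimum
# elapsed time. Nothing like this exists in Base; it only sketches the proposed usage.
macro bench(ex)
    quote
        let best = Inf
            for _ in 1:100
                t0 = time_ns()
                $(esc(ex))
                best = min(best, (time_ns() - t0) / 1e9)
            end
            println("bench: ", best, " s (min of 100 runs)")
            best
        end
    end
end

@testset "Dict insertion" begin
    d = Dict{Int,Int}()
    @test isempty(d)
    @bench for i in 1:10_000
        d[i] = i
    end
    @test length(d) == 10_000
end
```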