testing: consider calling ReadMemStats less during benchmarking #20875
Comments
Sorry about that. I suspected it might be the case that ReadMemStats would still be noticeably expensive after the rollback, but I also suspected you would file this very bug. I would've been happy rolling forward a few weeks ago, but at this point we're trying to lock down for a release.
How did you guess? :)
But of course; not a problem. Happy to fix in 1.10 instead.
Interesting. Do you happen to know how many times ReadMemStats is called versus how much time each call takes? After I optimized it I wasn't able to get it to take more than a few microseconds even under stress testing.
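As a point of reference, a minimal benchmark like the one below (not from this thread; it can live in any *_test.go file) measures the per-call cost of runtime.ReadMemStats directly:

```go
package membench

import (
	"runtime"
	"testing"
)

// BenchmarkReadMemStats reports the per-call cost of runtime.ReadMemStats
// on the current machine; the reported ns/op is the number under
// discussion in this issue.
func BenchmarkReadMemStats(b *testing.B) {
	var ms runtime.MemStats
	for i := 0; i < b.N; i++ {
		runtime.ReadMemStats(&ms)
	}
}
```

Per the comment above, the expected result is on the order of a few microseconds, which only starts to matter when the code under test itself runs in nanoseconds.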
ReadMemStats gets called a lot--every iteration, via StartTimer/StopTimer, e.g. at https://github.com/golang/go/blob/master/src/sort/sort_test.go#L640.
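That pattern looks roughly like the sketch below (a simplified stand-in for the linked sort benchmark, not a verbatim copy; rand.Perm substitutes for the real test's input setup):

```go
package membench

import (
	"math/rand"
	"sort"
	"testing"
)

// Simplified sketch of the per-iteration StopTimer/StartTimer pattern:
// the timer is paused while each iteration rebuilds its input, then
// resumed around the code being measured, so every iteration goes
// through StopTimer and StartTimer (and, per this issue, through
// ReadMemStats).
func BenchmarkSortInt1K(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		data := rand.Perm(1 << 10) // input setup, excluded from the timing
		b.StartTimer()
		sort.Ints(data)
	}
}
```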
Oh, geez, okay. :) If there's a way to safely avoid ReadMemStats, that's fine, but I also don't think we should over-optimize for this one weirdness in this one benchmark.
I think there is, more or less described above. And for better or for worse, calling StartTimer/StopTimer every iteration is not all that rare. :) Anyway, this is on my plate to try again for 1.10.
How about an opt-in approach: a new SkipMemStats method on testing.B?
I implemented the opt-in I suggested above and tried it out on the sort package tests. I found it made very little difference in the actual benchmark. This is on linux/amd64. The cpu profile of master on my system shows that ReadMemStats has negligible impact. This may just be an issue with the speed of ReadMemStats on some architectures. cc @josharian @aclements Benchmarks of sort when calling SkipMemStats compared to master:
@iand you should run each benchmark multiple times (see the -count flag).
@ALTree I did that too, but didn't post the results here since the profile showed that ReadMemStats was not affecting the run times on my system.
The approach I sketched (in the linked CLs) should not require any new API. I still think it is worth exploring, and it remains in my queue. I originally found that a CPU profile of package sort's benchmarks showed almost half of all execution time in ReadMemStats.
@iand, you report that on your system the CPU profile shows ReadMemStats having negligible impact.
That's an interesting mismatch. I will have to re-measure to see whether I can still reproduce.
I agree with Josh that calling StartTimer/StopTimer every iteration is not all that rare. And this is a big problem when both of these costs are small. For example, right now I was benchmarking a function that took just under a microsecond, and the pprof CPU profile showed that ReadMemStats took over 60% of the CPU, while my function took less than 3%. Removing Start/StopTimer from my benchmark is a possibility, but then I'm not really benchmarking my function anymore. The numbers could get better or worse by only changing the setup/cleanup code. The cost of starting and stopping the timer can also throw off the -benchtime estimates.
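To put a number on that fixed overhead, a standalone sketch like the one below (not from this thread) benchmarks nothing but the timer toggling itself:

```go
package membench

import "testing"

// The loop body does nothing but toggle the timer, so the reported
// ns/op approximates the fixed per-iteration cost that StopTimer and
// StartTimer (and the ReadMemStats calls behind them) add to any
// benchmark that uses the per-iteration pattern shown earlier.
func BenchmarkTimerToggleOnly(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		b.StartTimer()
	}
}
```

Comparing its ns/op against that of the function under test shows how much of a small benchmark's reported time is really timer overhead.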
@mvdan what OS are you testing on? Are you able to repeat the results across Linux/Win/Mac?
I am on
It can completely mess up the benchmark numbers for init functions that were too small. Moreover, it could make the -benchtime estimates be way off. For example, 'benchinit cmd/go' was taking over a minute to run the benchmark, instead of the expected ~1s. The benchtime estimate being off is likely the upstream issue golang/go#27217. The fact that StartTimer and StopTimer are expensive is being tracked in golang/go#20875.
Probably irrelevant to most benchmarks, but: I suspect the impact may vary widely because ReadMemStats needs to stop the world. So single-threaded benchmarks might not be affected, but in parallel runs you might see very weird impact, and a lot of that impact won't be directly attributable to ReadMemStats itself.
Stopping the world is typically < 50us even in parallel runs, since it's also critical to GC latency. Of course, if a benchmark is usually measured in nanoseconds, this is still really slow by comparison. :) I wonder if testing's ReadMemStats is less of a problem on tip now that it no longer blocks during a GC. Also, it may be that @mknyszek's unstable runtime metrics API could help here. With that API, the testing package could ask for just the couple metrics it actually wants. That would be cheaper to collect and maybe we could even do just those without stopping the world.
It should be a lot better, especially if the benchmark is allocating a lot, but as you say it's still going to be problematic for very small benchmarks. (See #19812 for details.)
Unfortunately the testing package measures the number of mallocs, which will require stopping the world for the foreseeable future. This is because we need to flush those stats out of each mcache. It's still probably worth reducing the number of calls if there's a way. (See #37112 for details about that API, though.)
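For reference, reading just the two counters the testing package cares about looks roughly like this with runtime/metrics (a sketch assuming Go 1.17 or later, where these metric names exist; it only illustrates the API, not anything the testing package currently does):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// Read only the cumulative heap-allocation counters rather than the
// full MemStats snapshot that runtime.ReadMemStats produces.
func main() {
	samples := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"},
		{Name: "/gc/heap/allocs:objects"},
	}
	metrics.Read(samples)
	fmt.Println("alloc bytes:  ", samples[0].Value.Uint64())
	fmt.Println("alloc objects:", samples[1].Value.Uint64())
}
```

Whether those counters can ever be read without the stop-the-world flush described above is the open question here.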
Change https://golang.org/cl/257647 mentions this issue.
I recently came across this just by happenstance. There's currently a monotonicity bug, though; the fix is up for 1.19.
I just tried replacing I'm not sure if that's a deal-breaker. It might be for
https://golang.org/cl/36791 reduced the number of times benchmarks call ReadMemStats. However, the implementation was incorrect (#20590, #20863), and it was rolled back (https://golang.org/cl/46612 and https://golang.org/cl/47350).
The rationale for the rollback was that ReadMemStats is now fast (https://golang.org/cl/34937). However, using tip as of June 30, 2017 (445652f), CPU profiling of package sort's benchmarks still shows almost half of all execution time in ReadMemStats. See sort-cpu.pdf.
So for 1.10, either ReadMemStats should be made cheaper still, or we should re-roll https://golang.org/cl/36791 with a better implementation; see the initial patchsets of https://golang.org/cl/46612 for improvements.
cc @bradfitz @aclements @meirf @ALTree