Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: benchmark is using low resolution time on Windows #31160

Closed
egonelbre opened this issue Mar 30, 2019 · 24 comments
Closed

testing: benchmark is using low resolution time on Windows #31160

egonelbre opened this issue Mar 30, 2019 · 24 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. Performance Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@egonelbre
Copy link
Contributor

egonelbre commented Mar 30, 2019

time.Now has a large resolution on Windows, so the benchmark results end up with a timing error.

As a demonstration here's code that measures QueryPerformanceCounter vs. time.Now: https://play.golang.org/p/FN4AtJ9b51m.

f:\Go\src\sandbox\benchmark>go version
go version devel +2bd767b102 Thu Mar 28 09:36:43 2019 +0000 windows/amd64

f:\Go\src\sandbox\benchmark>go run main.go
                             50%       90%       99%
time.Now    Z=9999935    1000000   1005100   1014600
QPC         Z=0              100       200       200

Which roughly says that my getting measurements at 1000000ns or 1ms granularity.

Fixing nanotime would be problematic (as seen in #8687), however benchmarks could use QueryPerformanceCounter instead.

@egonelbre egonelbre changed the title testing: benchmark is using time.Now on Windows for measuring time. testing: benchmark is using low resolution time on Windows for measuring time. Mar 30, 2019
@egonelbre
Copy link
Contributor Author

I could make an easy patch like I did in https://github.com/rakyll/hey/blob/master/requester/now_windows.go#L24. However since runtime already contains definitions for QPC was wondering whether it can be organized better.

@egonelbre
Copy link
Contributor Author

egonelbre commented Mar 30, 2019

As a more practical example:

func Benchmark(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fmt.Sprintf("hello world")
	}
}

As we vary benchtime it shows quite a significant timing error:

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32                1000                 0.00 ns/op
PASS
ok      sandbox/benchmark       0.261s

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 100000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32              100000                80.4 ns/op
PASS
ok      sandbox/benchmark       0.285s

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32             1000000                67.0 ns/op
PASS
ok      sandbox/benchmark       0.342s

@egonelbre egonelbre changed the title testing: benchmark is using low resolution time on Windows for measuring time. testing: benchmark is using low resolution time on Windows Mar 30, 2019
@networkimprov
Copy link

Related: #29714

@bcmills
Copy link
Contributor

bcmills commented Apr 12, 2019

CC @mpvl @josharian

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Testing An issue that has been verified to require only test changes, not just a test failure. labels Apr 12, 2019
@bcmills bcmills added this to the Unplanned milestone Apr 12, 2019
@egonelbre
Copy link
Contributor Author

Ping.

I could make a PR and discuss a specific fix, if that's requires less effort from core maintainers.

@josharian
Copy link
Contributor

I don't have much to say about this. A higher resolution benchmarking timer certainly seems desirable. What are the downsides?

cc @alexbrainman for windows

@egonelbre
Copy link
Contributor Author

Some duplicated code for windows and handling for other platforms to be compatible with that.

Specifically the qpc part

// useQPCTime controls whether time.now and nanotime use QueryPerformanceCounter.
. Unless there's a nice way to share that code with testing package.

@josharian
Copy link
Contributor

We could use //go:linkname to get access to it. Or an internal package, perhaps.

@egonelbre
Copy link
Contributor Author

After poking it a bit it seems using nanotimeQPC at the moment works only for Wine, since the implementation converts a division into multiplication that doesn't work in the general case.

Usually the way to convert QPC value is result = qpcCounter * 1e9 / qpcFrequency. Currently the implementation uses qpcMultiplier = 1e9 / qpcFrequency and then result = qpcCounter * qpcMultiplier, which only works correctly when frequency is a power of 10.

https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L447

Ideas so far:

  1. add QPC funcs into internal/syscall/windows and use them rather than try to modify runtime to fit the needs
  2. use division inside nanotimeQPC to avoid the error and accept the cost of division
  3. find a faster way to calculate division by qpcFrequency, if there exists one

1 seems the smallest and easiest change. 2 requires adjusting things inside runtime... also, I'm not sure exactly how big the cost of additional division would be in practice. 3 seems slightly overkill, and I don't have a really good idea at the moment.

Based on these thoughts, my instincts tell me to go with 1.

@alexbrainman
Copy link
Member

As a demonstration here's code that measures QueryPerformanceCounter vs. time.Now: https://play.golang.org/p/FN4AtJ9b51m.

I agree. I also see QueryPerformanceCounter performers better than time.Now on my Windows 10 computer. Each qpcDeltas[i] is about 600-700 (ns), while most timeDeltas[i] are 0.

Fixing nanotime would be problematic (as seen in #8687)

I don't see why we cannot change nanotime to use QueryPerformanceCounter. Remind me, please. I just changed runtime

$ git diff
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index d3e84fe..8f1373c 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -251,10 +251,10 @@ func loadOptionalSyscalls() {
                throw("WSAGetOverlappedResult not found")
        }

-       if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
+       //if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
                // running on Wine
                initWine(k32)
-       }
+       //}
 }

 //go:nosplit
$

and it seems to be working fine.

As we vary benchtime it shows quite a significant timing error:

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32                1000                 0.00 ns/op
PASS
ok      sandbox/benchmark       0.261s

I would not use -benchtime 1000x for this test on Windows. Just let your benchmark run for long enough (default of 1 second is good), and you should be good.

Ideas so far:

  1. add QPC funcs into internal/syscall/windows and use them rather than try to modify runtime to fit the needs
  2. use division inside nanotimeQPC to avoid the error and accept the cost of division
  3. find a faster way to calculate division by qpcFrequency, if there exists one

1 seems the smallest and easiest change. 2 requires adjusting things inside runtime... also, I'm not sure exactly how big the cost of additional division would be in practice. 3 seems slightly overkill, and I don't have a really good idea at the moment.

Based on these thoughts, my instincts tell me to go with 1.

I would (if I had free time) try 2 or 3 first. Maybe division is not expensive enough? Can we do division in every 100's nanotime call to adjust for the drift?

Making runtime.nanotime and time.Now more precise might be beneficial in other areas (not just for benchmarks).

I am also worried about creating too many functions that read time. They all use different approach to collect time. How do you know the time they return is all in sync? And different implementations might diverge.

Alex

@egonelbre
Copy link
Contributor Author

As far as I recall QPC is unreliable for measuring long periods of times.

Best reference mentioning issues I was able to find https://stackoverflow.com/a/5297163/192220. More in depth info on timing issues https://github.com/chromium/chromium/blob/08dbb44e81454b6d67c3b6f4989e7e58e88f4b0b/base/time/time_win.cc#L14. MSDN also only suggests it for measuring short periods of time https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps.

I would not use -benchtime 1000x for this test on Windows. Just let your benchmark run for long enough (default of 1 second is good), and you should be good.

Yeah, I wouldn't use that value either. This is to demonstrate the make the issue more obvious.

@alexbrainman
Copy link
Member

As far as I recall QPC is unreliable for measuring long periods of times.

Best reference mentioning issues I was able to find https://stackoverflow.com/a/5297163/192220. More in depth info on timing issues https://github.com/chromium/chromium/blob/08dbb44e81454b6d67c3b6f4989e7e58e88f4b0b/base/time/time_win.cc#L14. MSDN also only suggests it for measuring short periods of time https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps.

None of this looks authoritative enough for me. In fact, Microsoft suggests to use QPC for high resolution timers pretty much unconditionally

https://docs.microsoft.com/en-us/windows/desktop/winmsg/about-timers

https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps

Also, given #31528, we will, probably, change nanotime implementation anyway.

Maybe it is good time to address your issue and #31528 together? In particular, see my suggestion about QueryUnbiasedInterruptTimePrecise in #31528 (comment) ? In particular, if QueryUnbiasedInterruptTimePrecise is too slow / expensive to call from nanotime, maybe we could call it only from time.now, and call QueryUnbiasedInterruptTime from nanotime? Would that make Windows benchmarks resolution higher (I did not look at benchmarking code) ?

Thank you/

Alex

@egonelbre
Copy link
Contributor Author

@alexbrainman Yes, changing nanotime would fix this issue. The fix would definitely be better overall, but as I said, it is somewhat scary to me. Then again, maybe the unreliability situation has improved significantly.

@alexbrainman
Copy link
Member

Then again, maybe the unreliability situation has improved significantly

My feeling exactly. Things change. Do not forget QueryUnbiasedInterruptTimePrecise is only available on Windows 10. But I will take that.

Anyway, we have to come up with a solution for #31528, so lets wait for that before deciding what to do here.

Alex

@zx2c4
Copy link
Contributor

zx2c4 commented Aug 25, 2019

Please don't use QueryUnbiasedInterruptTimePrecise, per #31528 (comment)

@egonelbre
Copy link
Contributor Author

The #31528 was closed with a different solution.

I updated testing to use QueryPerformanceCounter in https://go-review.googlesource.com/c/go/+/227499, but I'm not sure whether this is the best way to go about it. Other ideas how to make benchmarking less noisy on quick runs would helpful.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/227499 mentions this issue: testing: use QueryPerformanceCounter for benchmarks

@alexbrainman
Copy link
Member

I updated testing to use QueryPerformanceCounter in https://go-review.googlesource.com/c/go/+/227499, but I'm not sure whether this is the best way to go about it. Other ideas how to make benchmarking less noisy on quick runs would helpful.

I spent some time investigating this. And, I agree, if we use QueryPerformanceCounter, we could improve our time.Duration precision from 1 millisecond to around few microsecond. Same with either QueryInterruptTimePrecise or QueryUnbiasedInterruptTimePrecise. But using syscall.Syscall to implement this code (similar to your CL 227499) is 10 times slower than current time.Now implementation. This is what I see

BenchmarkNow
BenchmarkNow-4                                  161259765                7.30 ns/op
BenchmarkQPC
BenchmarkQPC-4                                  11327251               106 ns/op
BenchmarkUnbiasedInterruptTime
BenchmarkUnbiasedInterruptTime-4                13963765                86.8 ns/op
BenchmarkUnbiasedInterruptPreciseTime
BenchmarkUnbiasedInterruptPreciseTime-4         11112705               107 ns/op

I don't think it is acceptable for time.Now. Maybe it is acceptable for testing package benchmark code. I am not sure.

Unrelated to this. Austin tried replacing our current time.Now implementation with asm version of QueryUnbiasedInterruptTime (see https://go-review.googlesource.com/c/go/+/211307 ) - and it is fast enough replacement. Maybe we could do something similar for *Precise version - if only for time.Now implementation, not runtime.nanotime implementation.

Alex

@egonelbre
Copy link
Contributor Author

Based on https://github.com/golang/go/wiki/MinimumRequirements#windows, the minimum required is Windows 2008R2. QueryInterruptTimePrecise is available starting from Windows 10 and Windows Server 2016 (https://docs.microsoft.com/en-us/windows/win32/api/realtimeapiset/nf-realtimeapiset-queryinterrupttimeprecise). The same goes for QueryInterruptTime.

With regards to using QueryInterruptTimePrecise as a time.Now source, I'm not sure what the effects will be.

I also agree that the performance of QPC (and similar) with syscall is not suitable for time.Now. With regards to benchmarking code, the performance is not ideal, however it is a significant improvement over a 1ms error in measurement.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/557315 mentions this issue: testing: use QueryPerformanceCounter on Windows

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 24, 2024
@alexbrainman
Copy link
Member

It appears the fix for this issue in CL 557315 broke all arm systems. I created #67077 to keep track of the breakage.

Let me know if I should revert the CL.

Thank you.

Alex

@egonelbre
Copy link
Contributor Author

I'm fine with reverting the CL to avoid blocking other work, but I don't think that's the underlying issue. If there's no immediate fix, then we can revert to get things building again. Anyways, let's continue the discussion in #67077.

@alexbrainman
Copy link
Member

I'm fine with reverting the CL to avoid blocking other work, but I don't think that's the underlying issue. If there's no immediate fix, then we can revert to get things building again.

I agree. Let's wait for others before we create a revert. We cannot submit the revert anyways without Googlers.

Anyways, let's continue the discussion in #67077.

SGTM.

Alex

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/582135 mentions this issue: testing: use time.Since in highPrecisionTime

gopherbot pushed a commit that referenced this issue Apr 27, 2024
time.Since has optimizations for measuring monotonic time.

For #31160.

Change-Id: I0529b9f69b9f008e3414b8e386b6faa64af4a008
Reviewed-on: https://go-review.googlesource.com/c/go/+/582135
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. Performance Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

8 participants