Suboptimal code and possible loss of precision in Stopwatch.GetElapsedTime(long, long) #109685

Open
MineCake147E opened this issue Nov 10, 2024 · 8 comments
Labels
area-System.Runtime tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

MineCake147E commented Nov 10, 2024

Description

Stopwatch.GetElapsedTime currently uses double-precision floating-point multiplication in order to convert the units of time.

private static readonly double s_tickFrequency = (double)TimeSpan.TicksPerSecond / Frequency;

public static TimeSpan GetElapsedTime(long startingTimestamp, long endingTimestamp) =>
    new TimeSpan((long)((endingTimestamp - startingTimestamp) * s_tickFrequency));

This can produce codegen like the following:

vzeroupper
sub rdx,rcx
vxorps    xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rdx
vmulsd    xmm0,xmm0,[7FFF0A2CABC0]
vfixupimmsd xmm0,xmm0,[7FFF0A2CABD0],0
vcmpgepd  k1,xmm0,[7FFF0A2CABE0]
vcvttsd2si rax,xmm0
vpbroadcastq xmm0,rax
vpblendmq xmm0{k1},xmm0,[7FFF0A2CABF0]
vmovq     rax,xmm0
ret

The chain of double-precision floating-point instructions needed for the conversion can be replaced by one of the following, depending on how Stopwatch.Frequency relates to TimeSpan.TicksPerSecond:

  • TimeSpan.TicksPerSecond is equal to Stopwatch.Frequency
    • The conversion can be omitted entirely, which is obviously faster than the current method.
  • TimeSpan.TicksPerSecond is an integer multiple of Stopwatch.Frequency
    • The conversion can be done with a single integer multiplication.
  • Stopwatch.Frequency is an integer multiple of TimeSpan.TicksPerSecond
    • The conversion can be done with integer division by a constant.
  • Neither value is an integer multiple of the other
    • The conversion can be done with some form of constant integer fraction multiplication.
      • This may involve several multiplication instructions, including Math.BigMul, so it might be slower than the current method in certain environments.
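
The four cases above can be sketched as follows. This is only an illustration, not the runtime's implementation: `IntegerTickConverter` and `ToTimeSpanTicks` are hypothetical names, the frequency is passed as a parameter (an assumption made so the cases can be exercised; in the runtime it is a static readonly value, so the JIT could fold the branches), and the general case uses an exact `Int128` division instead of the constant fraction multiplication proposed above, for clarity rather than speed. Overflow of the multiplication in the second case is not handled.

```csharp
using System;

// Hypothetical helper: converts a timestamp delta to TimeSpan ticks
// using integer arithmetic only. Requires .NET 7+ for Int128.
public static class IntegerTickConverter
{
    // frequency: timestamp units per second (Stopwatch.Frequency in practice).
    public static long ToTimeSpanTicks(long delta, long frequency)
    {
        const long ticksPerSecond = TimeSpan.TicksPerSecond; // 10,000,000

        if (frequency == ticksPerSecond)
            return delta;                                 // case 1: no conversion

        if (ticksPerSecond % frequency == 0)
            return delta * (ticksPerSecond / frequency);  // case 2: integer multiply

        if (frequency % ticksPerSecond == 0)
            return delta / (frequency / ticksPerSecond);  // case 3: integer divide

        // case 4: exact 128-bit product followed by a division,
        // truncating toward zero like the (long) cast in the current code.
        return (long)((Int128)delta * ticksPerSecond / frequency);
    }
}
```

For example, `IntegerTickConverter.ToTimeSpanTicks(5, 1_000_000)` takes the multiply path and `IntegerTickConverter.ToTimeSpanTicks(1000, 1_000_000_000)` takes the divide path.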

This could improve not only performance but also precision for long durations (if we're allowed to).
Because of the long-to-double conversion, if the absolute value of the tick difference is greater than $$2^{53}$$ (about 28.5 years at 100 ns per tick), its lower bits are rounded away unnecessarily. Even though this rounding doesn't matter for nearly all uses of this API (almost nobody wants to measure more than a decade with it anyway), the performance improvement in the trivial cases is worth having.
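
The rounding described above can be reproduced with a small sketch. `PrecisionDemo`, `RoundTrip`, and `YearsAt2Pow53` are hypothetical names introduced for illustration; the year figure assumes a 365.25-day year and a caller-supplied frequency.

```csharp
using System;

// Hypothetical demo of the long -> double -> long rounding above 2^53.
public static class PrecisionDemo
{
    // Round-trips a value through double, as the current conversion does
    // before multiplying by the tick frequency.
    public static long RoundTrip(long value) => (long)(double)value;

    // How many years 2^53 timestamp units span at the given frequency (units per second).
    public static double YearsAt2Pow53(long frequency) =>
        Math.Pow(2, 53) / frequency / (365.25 * 24 * 60 * 60);
}
```

With these definitions, `PrecisionDemo.RoundTrip((1L << 53) + 1)` returns `1L << 53` because the lowest bit does not fit in a double's 53-bit significand, while values up to `1L << 53` survive the round trip unchanged. `PrecisionDemo.YearsAt2Pow53(10_000_000)` evaluates to roughly 28.5.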

Configuration

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4317/23H2/2023Update/SunValley3)
Intel Xeon w5-2455X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  DefaultJob : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Regression?

Unknown

Data




| Method                  | Mean      | Error     | StdDev    | Ratio | RatioSD | Code Size |
|-------------------------|-----------|-----------|-----------|-------|---------|-----------|
| IntegerAddLatency       | 0.2528 ns | 0.0007 ns | 0.0006 ns | 1.00  | 0.00    | 81 B      |
| Current                 | 7.3734 ns | 0.0592 ns | 0.0554 ns | 29.17 | 0.22    | 1,263 B   |
| NoConversion            | 0.5060 ns | 0.0018 ns | 0.0014 ns | 2.00  | 0.01    | 279 B     |
| IntegerMultiply         | 1.2673 ns | 0.0050 ns | 0.0042 ns | 5.01  | 0.02    | 353 B     |
| IntegerConstantDivision | 2.1699 ns | 0.0054 ns | 0.0048 ns | 8.58  | 0.03    | 711 B     |
| IntegerFraction         | 1.9243 ns | 0.0052 ns | 0.0049 ns | 7.61  | 0.03    | 1,265 B   |

Benchmark Code

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

namespace BenchmarkPlayground
{
    [SimpleJob(runtimeMoniker: RuntimeMoniker.HostProcess)]
    [DisassemblyDiagnoser(maxDepth: int.MaxValue)]
    public class GetElapsedTimeBenchmarks
    {
        const int OperationsPerInvoke = 1 << 20;

        [GlobalSetup]
        public void Setup()
        {
            Span<long> k = [0, 0, 0];
            RandomNumberGenerator.Fill(MemoryMarshal.AsBytes(k));
            l0 = k[0];
            l1 = k[1] | 1;
            l2 = k[2];
        }

        long l0, l1 = 1, l2;

        [SkipLocalsInit]
        [Benchmark(Baseline = true, OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerAddLatency()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v0 += v1;
                v1 += v2;
            }
            return v0;
        }

        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long Current()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const double M = double.Pi;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                k += (long)((k - v0) * M);
                v0 += v1;
                v1 += v2;
            }
            return k;
        }

        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long NoConversion()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                k += k - v0;
                v0 += v1;
                v1 += v2;
            }
            return k;
        }

        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerMultiply()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const long M = 2611923443488327891;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                k += (k - v0) * M;
                v0 += v1;
                v1 += v2;
            }
            return k;
        }

        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerConstantDivision()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const long M = 445;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                k += (k - v0) / M;
                v0 += v1;
                v1 += v2;
            }
            return k;
        }

        [SkipLocalsInit]
        [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
        public long IntegerFraction()
        {
            var v0 = l0;
            var v1 = l1;
            var v2 = l2;
            var k = v2;
            const long M = 2611923443488327891;
            const long Y = 55478262137326323;
            for (int i = 0; i < OperationsPerInvoke; i += 16)
            {
                var diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                diff = k - v0;
                k += diff * M + Math.BigMul(diff, Y, out _);
                v0 += v1;
                v1 += v2;
            }
            return k;
        }
    }
}
Benchmark Disassembly

.NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerAddLatency()
       mov       rax,[rcx+8]
       mov       rdx,[rcx+10]
       mov       rcx,[rcx+18]
       xor       r8d,r8d
       nop
M00_L00:
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rax,rdx
       add       rdx,rcx
       add       r8d,10
       cmp       r8d,100000
       jl        short M00_L00
       ret
; Total bytes of code 81

.NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; BenchmarkPlayground.GetElapsedTimeBenchmarks.Current()
       mov       rax,[rcx+8]
       mov       rdx,[rcx+10]
       mov       rcx,[rcx+18]
       mov       r8,rcx
       xor       r10d,r10d
       vmovsd    xmm0,qword ptr [7FF7A8B0B140]
M00_L00:
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       vxorps    xmm1,xmm1,xmm1
       vcvtsi2sd xmm1,xmm1,r9
       vmulsd    xmm1,xmm1,xmm0
       vfixupimmsd xmm1,xmm1,[7FF7A8B0B150],0
       vcmpgepd  k1,xmm1,[7FF7A8B0B160]
       vcvttsd2si r9,xmm1
       vpbroadcastq xmm1,r9
       vpblendmq xmm1{k1},xmm1,[7FF7A8B0B170]
       vmovq     r9,xmm1
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       add       rdx,rcx
       add       r10d,10
       cmp       r10d,100000
       jl        near ptr M00_L00
       mov       rax,r8
       ret
; Total bytes of code 1263

.NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; BenchmarkPlayground.GetElapsedTimeBenchmarks.NoConversion()
       mov       rax,[rcx+8]
       mov       rdx,[rcx+10]
       mov       rcx,[rcx+18]
       mov       r8,rcx
       xor       r10d,r10d
M00_L00:
       mov       r9,r8
       sub       r9,rax
       add       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       add       rdx,rcx
       add       r10d,10
       cmp       r10d,100000
       jl        near ptr M00_L00
       mov       rax,r8
       ret
; Total bytes of code 279

.NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerMultiply()
       mov       rax,[rcx+8]
       mov       rdx,[rcx+10]
       mov       rcx,[rcx+18]
       mov       r8,rcx
       xor       r10d,r10d
M00_L00:
       mov       r9,r8
       sub       r9,rax
       mov       r11,243F6A8885A308D3
       imul      r9,r11
       add       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       mov       r9,r8
       sub       r9,rax
       imul      r9,r11
       add       r9,r8
       mov       r8,r9
       add       rax,rdx
       add       rdx,rcx
       add       r10d,10
       cmp       r10d,100000
       jl        near ptr M00_L00
       mov       rax,r8
       ret
; Total bytes of code 353

.NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerConstantDivision()
       mov       r8,[rcx+8]
       mov       r10,[rcx+10]
       mov       rcx,[rcx+18]
       mov       r9,rcx
       xor       r11d,r11d
M00_L00:
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       mov       rdx,r9
       sub       rdx,r8
       mov       rax,49A2CDF358049A2D
       imul      rdx
       mov       rax,rdx
       shr       rax,3F
       sar       rdx,7
       add       rax,rdx
       add       rax,r9
       mov       r9,rax
       add       r8,r10
       add       r10,rcx
       add       r11d,10
       cmp       r11d,100000
       jl        near ptr M00_L00
       mov       rax,r9
       ret
; Total bytes of code 711

.NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

; BenchmarkPlayground.GetElapsedTimeBenchmarks.IntegerFraction()
       push      rsi
       push      rbx
       sub       rsp,88
       mov       rax,[rcx+8]
       mov       r8,[rcx+10]
       mov       rcx,[rcx+18]
       mov       r10,rcx
       xor       r9d,r9d
M00_L00:
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+80]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       r10,rbx
       sar       rdx,3F
       mov       rbx,0C5192F7B738AF3
       and       rdx,rbx
       sub       r11,rdx
       add       r10,r11
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       lea       r11,[rsp+78]
       mulx      rbx,rsi,rbx
       mov       [r11],rsi
       mov       r11,243F6A8885A308D3
       imul      r11,rdx
       add       r11,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       rbx,rdx
       lea       r10,[r11+rbx]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+70]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+68]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+60]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+58]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+50]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+48]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+40]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+38]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+30]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+28]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+20]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+18]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+10]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       mov       rdx,r10
       sub       rdx,rax
       mov       r11,0C5192F7B738AF3
       lea       rbx,[rsp+8]
       mulx      r11,rsi,r11
       mov       [rbx],rsi
       mov       rbx,243F6A8885A308D3
       imul      rbx,rdx
       add       rbx,r10
       mov       r10,rdx
       sar       r10,3F
       mov       rdx,0C5192F7B738AF3
       and       rdx,r10
       sub       r11,rdx
       lea       r10,[rbx+r11]
       add       rax,r8
       add       r8,rcx
       add       r9d,10
       cmp       r9d,100000
       jl        near ptr M00_L00
       mov       rax,r10
       add       rsp,88
       pop       rbx
       pop       rsi
       ret
; Total bytes of code 1265

Analysis

@MineCake147E MineCake147E added the tenet-performance Performance related issue label Nov 10, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 10, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 10, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jkotas jkotas added area-System.Runtime and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Nov 10, 2024
@jkotas jkotas changed the title from "Suboptimal codegen and possible loss of precision in Stopwatch.GetElapsedTime(long, long)" to "Suboptimal code and possible loss of precision in Stopwatch.GetElapsedTime(long, long)" Nov 10, 2024

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@jkotas
Member

jkotas commented Nov 10, 2024

> the performance improvements for trivial cases are worth doing.

Your micro-benchmark numbers show an improvement of less than 1 nanosecond. I do not think an improvement that small is worth the added complexity for this API.

@AlgorithmsAreCool
Contributor

AlgorithmsAreCool commented Nov 10, 2024

I have used this API in tight measurement loops before. Although the gains are under 1 ns, they are significant in proportion to the baseline.

While I'm sure it varies between processors, isn't Stopwatch.Frequency == TimeSpan.TicksPerSecond a common case (on x86/x64, anyway)?

What about adding a conditional that checks whether they are equal and special-casing that scenario to use plain subtraction? The check should be eliminated by the JIT, because static readonly fields are promoted to constants. That is very little extra complexity for what could be used as a latency-sensitive API.

EDIT
It looks like the ARM generic timer is typically fixed at 1 GHz; perhaps that could also be special-cased as a common case?
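
The special case suggested above can be sketched as follows. This is only an illustration, not the runtime's implementation: `ElapsedTimeSketch` and its private `s_tickFrequency` field are hypothetical names, and the fallback branch simply repeats the floating-point conversion quoted in the issue description.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper type for illustration; not part of the runtime.
static class ElapsedTimeSketch
{
    // Same conversion factor as the one quoted in the issue description.
    private static readonly double s_tickFrequency =
        (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency;

    public static TimeSpan GetElapsedTime(long startingTimestamp, long endingTimestamp)
    {
        long delta = endingTimestamp - startingTimestamp;

        // Stopwatch.Frequency is a static readonly field, so a JIT that has
        // already initialized it may treat this comparison as a constant and
        // drop the untaken branch. An AOT compiler may have to keep the check.
        if (Stopwatch.Frequency == TimeSpan.TicksPerSecond)
        {
            // Timestamps are already in 100 ns ticks: no conversion needed.
            return new TimeSpan(delta);
        }

        // General case: fall back to the existing double-precision scaling.
        return new TimeSpan((long)(delta * s_tickFrequency));
    }
}
```

Under the same pattern, a 1 GHz timer would correspond to an integer division by 100 (1,000,000,000 / 10,000,000), which is the "integer multiple" case listed in the issue description.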

@jkotas
Member

jkotas commented Nov 10, 2024

> It should be eliminated by the JIT due to the static readonly promotion to const.

It would not be eliminated for AOT, so the proposed change would be an improvement for JIT and regression for AOT (in some cases at least).

@AlgorithmsAreCool
Contributor

> I would not be eliminated for AOT...

I should certainly hope not!

@MineCake147E
Contributor Author

I accidentally measured reciprocal throughput instead of latency.
I have updated the benchmark results, which now measure latency.
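
To illustrate the difference, here is a rough sketch of the two loop shapes. It is an assumption about the general structure, not the actual benchmark code: `LatencyLoopSketch`, `Independent`, and `Dependent` are hypothetical names. In the dependent loop each call consumes the previous result, which resembles the chained pattern visible in the disassembly above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical loops for illustration; not the benchmark used in this issue.
static class LatencyLoopSketch
{
    // Throughput-style: the calls do not depend on each other, so the CPU
    // can overlap them. This measures reciprocal throughput.
    public static long Independent(long start, long end, int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            sum += Stopwatch.GetElapsedTime(start, end + i).Ticks;
        }
        return sum;
    }

    // Latency-style: each call takes the previous result as input, so the
    // conversions form one serial dependency chain.
    public static long Dependent(long start, long end, int iterations)
    {
        long current = end;
        for (int i = 0; i < iterations; i++)
        {
            // Default unchecked arithmetic: overflow wraps instead of throwing.
            current += Stopwatch.GetElapsedTime(start, current).Ticks;
            start += 1;
        }
        return current;
    }
}
```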

@KalleOlaviNiemitalo

If you change Stopwatch.GetElapsedTime, please consider changing TimeProvider.GetElapsedTime as well.

return new TimeSpan((long)((endingTimestamp - startingTimestamp) * ((double)TimeSpan.TicksPerSecond / timestampFrequency)));
