Performance difference between 2 code patterns doing same thing #100493

Open
DeepakRajendrakumaran opened this issue Apr 1, 2024 · 1 comment
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
tenet-performance: Performance related issue

@DeepakRajendrakumaran
Contributor

Description

I ran into this while working on PR #99982; the relevant comment thread is here: #99982 (comment)

The following two code patterns are logically equivalent:

private static bool HasMatch2(Vector256<byte> vector)
{
    return ((vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero);
}

private static bool HasMatch3(Vector256<byte> vector)
{
    return !((vector & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero));
}

They appear to produce the same assembly: https://godbolt.org/z/1rzEcj8ar

The PR referenced above uses the HasMatch3 pattern. When I switch to the HasMatch2 pattern, performance degrades.
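
The claimed logical equivalence of the two patterns can be checked with a small standalone program. This is only a sketch, assuming .NET 7 or later (for the `Vector256<T>` operators); the class name `MatchPatterns` and the test vectors are made up for illustration and are not taken from the PR.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MatchPatterns
{
    // HasMatch2 pattern: operator != is true if any element differs from zero.
    public static bool HasMatch2(Vector256<byte> vector)
        => (vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero;

    // HasMatch3 pattern: negated whole-vector Equals.
    public static bool HasMatch3(Vector256<byte> vector)
        => !(vector & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero);

    public static void Main()
    {
        // Hypothetical inputs: all-ASCII bytes, then one byte with the high bit set.
        Vector256<byte> ascii = Vector256.Create((byte)'a');
        Vector256<byte> mixed = ascii.WithElement(5, (byte)0xC3);

        Console.WriteLine($"ascii: {HasMatch2(ascii)} {HasMatch3(ascii)}"); // ascii: False False
        Console.WriteLine($"mixed: {HasMatch2(mixed)} {HasMatch3(mixed)}"); // mixed: True True
    }
}
```

Agreement on results says nothing about codegen or speed; it only confirms that both methods answer the same question ("does any byte have its high bit set?").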

How to reproduce

  1. Check out PR #99982 if it has not been merged yet
  2. Create and compile the following benchmark (I tested on an ICX machine):
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;

namespace ProfilingDocs
{
    class Program
    {
        // 5120 bytes, all ASCII 'a'.
        private static byte[] _sourceBytes = Enumerable.Repeat((byte)'a', 5120).ToArray();

        static void Main()
        {
            var timer = new Stopwatch();
            timer.Start();

            for (int i = 0; i < 12_000_000; i++)
            {
                GetString();
            }

            timer.Stop();

            TimeSpan timeTaken = timer.Elapsed;
            string foo = "Time taken: " + timeTaken.ToString(@"m\:ss\.fff");
            Console.WriteLine(foo);
        }

        // NoInlining keeps the decode call out of the timing loop body.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static string GetString() => Encoding.UTF8.GetString(_sourceBytes);
    }
}
  3. Run this benchmark with a local build of the PR
  4. Change the following line in the PR (https://github.com/dotnet/runtime/pull/99982/files#diff-6b4906abc01dc4699f348f7c1df72e2f640f240aa31ea67cd47642221b2021f5R2204) to

((vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero);

  5. Recompile the repo and rerun the benchmark
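
For a quick comparison without rebuilding the runtime, the two patterns can also be timed side by side in a standalone program. This is a rough sketch under stated assumptions, not the PR's code: the scan loops, the `PatternTiming` class, and the iteration count are hypothetical, and because the two patterns reportedly compile to the same assembly in isolation, this harness may well not reproduce the regression seen inside `Encoding.UTF8.GetString`. It assumes .NET 7 or later.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class PatternTiming
{
    private static bool HasMatch2(Vector256<byte> v)
        => (v & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero;

    private static bool HasMatch3(Vector256<byte> v)
        => !(v & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero);

    // Hypothetical scan: offset of the first 32-byte block containing a
    // byte with the high bit set, or -1 if there is none.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ScanWithNotEquals(byte[] data)
    {
        for (int i = 0; i + Vector256<byte>.Count <= data.Length; i += Vector256<byte>.Count)
        {
            if (HasMatch2(Vector256.Create(data, i))) return i;
        }
        return -1;
    }

    // Same scan, using the negated-Equals pattern.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ScanWithEquals(byte[] data)
    {
        for (int i = 0; i + Vector256<byte>.Count <= data.Length; i += Vector256<byte>.Count)
        {
            if (HasMatch3(Vector256.Create(data, i))) return i;
        }
        return -1;
    }

    // Runs one scan variant repeatedly and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<byte[], int> scan, byte[] data, int iterations)
    {
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            scan(data);
        }
        return timer.Elapsed;
    }

    public static void Main()
    {
        byte[] source = Enumerable.Repeat((byte)'a', 5120).ToArray();

        // Warm-up so both methods are jitted before timing starts.
        Time(ScanWithNotEquals, source, 100_000);
        Time(ScanWithEquals, source, 100_000);

        Console.WriteLine("!=      : " + Time(ScanWithNotEquals, source, 2_000_000));
        Console.WriteLine("!Equals : " + Time(ScanWithEquals, source, 2_000_000));
    }
}
```

A harness such as BenchmarkDotNet would give more reliable numbers than a raw `Stopwatch` loop; the loop is used here only to mirror the style of the benchmark above.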

Data

image

@DeepakRajendrakumaran added the tenet-performance (Performance related issue) label Apr 1, 2024
@dotnet-issue-labeler (bot) added the needs-area-label (An area label is needed to ensure this gets routed to the appropriate area owners) label Apr 1, 2024
@dotnet-policy-service (bot) added the untriaged (New issue has not been triaged by the area owner) label Apr 1, 2024
@teo-tsirpanis added the area-System.Runtime.Intrinsics label and removed the needs-area-label label Apr 1, 2024
@EgorBo
Member

EgorBo commented Apr 1, 2024

Dup of #93174 (same issue)

@EgorBo added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) label and removed the area-System.Runtime.Intrinsics and untriaged labels Apr 1, 2024
@EgorBo added this to the Future milestone Apr 1, 2024