Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean operations generated IL #635

Closed
thinkbeforecoding opened this issue Sep 16, 2015 · 14 comments · Fixed by #5116
Closed

Boolean operations generated IL #635

thinkbeforecoding opened this issue Sep 16, 2015 · 14 comments · Fixed by #5116

Comments

@thinkbeforecoding
Copy link
Contributor

I was doing perf test today and noticed that F# didn't use intrinsic boolean operator with && and ||.
C# also makes some optimizations in release, but the F# version is a bit slower that C#.

The test case uses the following function:

let isAlphaNum (s:string) pos =
    let c = s.[pos]
    (c >= '0' && c<='9') || (c>='A' && c<='Z') || (c>='a' && c <= 'z')

when decompiled with dotPeek it looks like:

public static bool isAlphaNum(string s, int pos) {
    char ch = s[pos];
    if ((((int) ch < 48 ? 0 : ((int) ch <= 57 ? 1 : 0)) == 0 ? ((int) ch < 65 ? 0 : ((int) ch <= 90 ? 1 : 0)) : 1) != 0)
      return true;
    if ((int) ch >= 97)
      return (int) ch <= 122;
    return false; }

The following equivalent C# implementation which is highly similar:

        static bool IsAlphaNum(string s, int pos) {
            var c = s[pos];
            return c >= '0' && c <= '9' || c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z';  }

Results in release in:

   private static bool IsAlphaNum(string s, int pos)  {
      char ch = s[pos];
      if ((int) ch >= 48 && (int) ch <= 57 || (int) ch >= 65 && (int) ch <= 90)
        return true;
      if ((int) ch >= 97)
        return (int) ch <= 122;
      return false;    }

If the last lines look the same, F# uses a ternary construct with 1s and 0s instead of using && and || ...

A micro benchmark on 100.000.000 iterations doing the full path (always testing a '}') gives
~417ms for the F# version
~280ms for the C# version

Tests with other charts give an advantage for C#.

The question is: is there a good reason to use ?: == 0 == 1 instead of Boolean && and || native operators ?

@thinkbeforecoding
Copy link
Contributor Author

The il for the F# version:

.method public static bool  isAlphaNum(string s,
                                       int32 pos) cil managed
{
  .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = ( 01 00 02 00 00 00 01 00 00 00 01 00 00 00 00 00 ) 
  // Code size       71 (0x47)
  .maxstack  4
  .locals init ([0] char c)
  IL_0000:  nop
  IL_0001:  ldarg.0
  IL_0002:  ldarg.1
  IL_0003:  callvirt   instance char [mscorlib]System.String::get_Chars(int32)
  IL_0008:  stloc.0
  IL_0009:  ldloc.0
  IL_000a:  ldc.i4.s   48
  IL_000c:  blt.s      IL_0019
  IL_000e:  ldloc.0
  IL_000f:  ldc.i4.s   57
  IL_0011:  cgt
  IL_0013:  ldc.i4.0
  IL_0014:  ceq
  IL_0016:  nop
  IL_0017:  br.s       IL_001b
  IL_0019:  ldc.i4.0
  IL_001a:  nop
  IL_001b:  brfalse.s  IL_0021
  IL_001d:  ldc.i4.1
  IL_001e:  nop
  IL_001f:  br.s       IL_0033
  IL_0021:  ldloc.0
  IL_0022:  ldc.i4.s   65
  IL_0024:  blt.s      IL_0031
  IL_0026:  ldloc.0
  IL_0027:  ldc.i4.s   90
  IL_0029:  cgt
  IL_002b:  ldc.i4.0
  IL_002c:  ceq
  IL_002e:  nop
  IL_002f:  br.s       IL_0033
  IL_0031:  ldc.i4.0
  IL_0032:  nop
  IL_0033:  brfalse.s  IL_0037
  IL_0035:  ldc.i4.1
  IL_0036:  ret
  IL_0037:  ldloc.0
  IL_0038:  ldc.i4.s   97
  IL_003a:  blt.s      IL_0045
  IL_003c:  ldloc.0
  IL_003d:  ldc.i4.s   122
  IL_003f:  cgt
  IL_0041:  ldc.i4.0
  IL_0042:  ceq
  IL_0044:  ret
  IL_0045:  ldc.i4.0
  IL_0046:  ret
} // end of method Program::isAlphaNum

While the C# version is:

.method private hidebysig static bool  IsAlphaNum(string s,
                                                  int32 pos) cil managed
{
  // Code size       46 (0x2e)
  .maxstack  2
  .locals init ([0] char c)
  IL_0000:  ldarg.0
  IL_0001:  ldarg.1
  IL_0002:  callvirt   instance char [mscorlib]System.String::get_Chars(int32)
  IL_0007:  stloc.0
  IL_0008:  ldloc.0
  IL_0009:  ldc.i4.s   48
  IL_000b:  blt.s      IL_0012
  IL_000d:  ldloc.0
  IL_000e:  ldc.i4.s   57
  IL_0010:  ble.s      IL_002c
  IL_0012:  ldloc.0
  IL_0013:  ldc.i4.s   65
  IL_0015:  blt.s      IL_001c
  IL_0017:  ldloc.0
  IL_0018:  ldc.i4.s   90
  IL_001a:  ble.s      IL_002c
  IL_001c:  ldloc.0
  IL_001d:  ldc.i4.s   97
  IL_001f:  blt.s      IL_002a
  IL_0021:  ldloc.0
  IL_0022:  ldc.i4.s   122
  IL_0024:  cgt
  IL_0026:  ldc.i4.0
  IL_0027:  ceq
  IL_0029:  ret
  IL_002a:  ldc.i4.0
  IL_002b:  ret
  IL_002c:  ldc.i4.1
  IL_002d:  ret
} // end of method Program::IsAlphaNum

@thinkbeforecoding thinkbeforecoding changed the title A Boolean operations generated IL Sep 17, 2015
@thinkbeforecoding
Copy link
Contributor Author

Ok forget about native Boolean operators, they're just conditional jump in IL, and dotPeek is trying to represent it as Boolean operations..

That being said, the IL generated by F# is longer and slower than the one generated by C#...

Comparing the IL..

  • F# needs 4 stack locations.
  • It introduces an extra nop at the top.. for alignments ?
  • Things are similar then until the comparison to 57 ('9'), but then, C# simply conditionaly branches to the end, while F# does:
  IL_0011:  cgt
  IL_0013:  ldc.i4.0
  IL_0014:  ceq
  IL_0016:  nop
  IL_0017:  br.s       IL_001b

so instead on conditionally branching, it pushes the result on the stack, then compare it to 0 before branching to IL_001b where there is a conditional branch on false:

IL_001b:  brfalse.s  IL_0021

This looks quite convoluted compared to C# straight forward implementation. Any reason for this ?

@thinkbeforecoding
Copy link
Contributor Author

Ok I've seen where it comes from..
``let inline (>=) (x:int) (y:int) = not(# "clt" x y : bool #)`
and

             let (&&) x y = if x then y else false 
             let (||) x y = if x then true else y

And there seem to be no further optimization.

@thinkbeforecoding
Copy link
Contributor Author

I've seen that the if then else statement is converted to an Expr.Match internally through a MatchBuilder.
The Match is then converted to IL in ilxGen using sequence point.

Maybe it is possible to combine Matches together in an optimization pass to remove redundant code..
c >= '0' && c<= '9' || c=>'a' && c<= 'z'
is currently converted to :

let isDigit =
   if c >='0' then
      c<='9'
  else
      false
if isDigit = true then
    true
else
   If c >= 'a' then
      c <= 'z'
   else
     false

Or something close to it...

@TIHan
Copy link
Contributor

TIHan commented Sep 19, 2015

This is indeed interesting. @thinkbeforecoding you probably already seen this https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/Optimizer.fs#L2848 wondering if you have thoughts on it.

The nops are there for debugging, they don't get generated if you use the flag "--debug-" in compilation. For some reason, when compiling in release using Visual Studio, it will still generate debug information which in turn will generate the nops. You have to put in the "--debug-" flag to stop it. I'm wondering what happens to performance when you use that flag.

I'm taking a wild guess, I wish I knew what to do, but I don't. Maybe something can be done here for CmpThenBrOrContinue: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/IlxGen.fs#L4242

@thinkbeforecoding
Copy link
Contributor Author

True, the --debug- removes the nops.. but it doesn't change the result significantly..
Looking at Optimizer.fs and the conditional jump generation to see if something can be done..

@asik
Copy link
Contributor

asik commented Jan 12, 2016

If it can be of any help, here's commented optimized assembly from RyuJIT for C# and F# compared:

(c >= 15 && c <= 34) || (c >= 67 && c <= 70)
C#:
012B34FF  cmp         edx,0Fh  // if c < 15
012B3502  jl          012B3509 // goto 1
012B3504  cmp         edx,22h  // if c <= 34
012B3507  jle         012B351F // goto 2
012B3509  cmp         edx,43h  // label 1: if c < 67
012B350C  jl          012B351B // goto 3
012B350E  cmp         edx,46h  // . 
012B3511  setle       al       // ..
012B3514  movzx       eax,al   // ...
012B3517  mov         edi,eax  // result = (c <= 70)
012B3519  jmp         012B3524 // break out
012B351B  xor         edi,edi  // label 3: result = false
012B351D  jmp         012B3524 // break out
012B351F  mov         edi,1    // label 2: result = true

F#:
00CC04B6  cmp         edx,0Fh  // if c < 15
00CC04B9  jl          00CC04C6 // goto 1
00CC04BB  cmp         edx,22h  // .
00CC04BE  setle       al       // ..
00CC04C1  movzx       eax,al   // eax = (c <= 34)
00CC04C4  jmp         00CC04C8 // goto 2
00CC04C6  xor         eax,eax  // label 1: eax = false
00CC04C8  test        eax,eax  // label 2: if eax = false
00CC04CA  je          00CC04D3 // goto 3
00CC04CC  mov         eax,1    // eax = true
00CC04D1  jmp         00CC04E5 // goto 5
00CC04D3  cmp         edx,43h  // label 3: if c < 67
00CC04D6  jl          00CC04E3 // goto 4
00CC04D8  cmp         edx,46h  // .
00CC04DB  setle       al       // ..
00CC04DE  movzx       eax,al   // eax = (c <= 70)
00CC04E1  jmp         00CC04E5 // goto 5
00CC04E3  xor         eax,eax  // label 4: eax = false
00CC04E5  movzx       edi,al   // label 5: result = eax

So the JIT is closely following the IL, can't rely on it to optimize this case.

@dsyme
Copy link
Contributor

dsyme commented Jan 13, 2016

@asik - could you add instructions to DEVGUIDE.md about how to get that generated native code for a method? That's very useful!

@dsyme dsyme added Bug Area-Compiler Impact-Medium (Internal MS Team use only) Describes an issue with moderate impact on existing code. labels Jul 12, 2016
@dsyme
Copy link
Contributor

dsyme commented Jul 12, 2016

I've been taking a glance through issues and noticed this one, where we can definitely improve codegen here though finding the general rule needs some work. I'll mark it as a bug to aid with tracking even though it's really a perf improvement.

I think this is going to best boil down to an Optimizer.fs rewrite that merges the switching/choice logic of one decision tree into another in some situations.

The general situation seems to be close to this (these are rough notes). First use notation

switch Input Choices | Targets

for TAST decision trees made up only of TDSwitch and TDSuccess nodes and with no values bound in the transfer to the targets (that is the expression list is empty in the TDSuccess nodes).

Assume for now that the TDSwitch nodes are always switching over boolean values (that is, each TCase has a Test which is a Const boolean).

So the natural form of things like (E1 && E2) || E3 is something like this:

switch (switch E1 [ Const1 -> N1; Const2 -> N2 ] | Targets1) Choices2 | Targets2

The aim seems to be to rewrite this to

switch E1 [ Const1 -> switch Targets1[N1] Choices2; Const2 -> switch Targets1[N2] Choices2 ] | Targets2

Note that this duplicates the decision trees implied by Choices2, but decision trees are small in any case. And for && and || operations cases, subsequent rewrites would always boil one of the cases away, bringing us back to something of the same size.

Slightly more generally we have

switch (switch E1 Choices1 Targets1) Choices2 Targets2

which will be rewritten to

switch Input1 (Choices1/[switch Targets1[N] Choices2]) Targets2

The tricky thing is to determine the exact situation in which this rule applies and to formulate this as code.

(aside: is this more useful than just boolean switches? Is it useful for other inputs like option or enumeration or integer matching, e.g. match (match optionValue with Some x -> true | None -> b) with true -> ... | false -> ...? Note the above is all in the case where no variables get bound during pattern matching. Generalizing to cases like option matching would mean dealing with variables getting bound)

@thinkbeforecoding
Copy link
Contributor Author

Nice to see some move on this :)

@dsyme dsyme removed the discussion label Aug 3, 2016
@thinkbeforecoding
Copy link
Contributor Author

thinkbeforecoding commented Oct 19, 2016

I've been digging a bit more to understand where optimizations are done...

I traced the compiler optimization on the following exemple:

module Switch
let sw x = match x with
           | true -> true
           | false -> false

The generated IL is:

 IL_0000: ldarg.0      // x
 IL_0001: ret    

as expected.
But the reduction of the expression doesn't happen in OptimizeSwitch (Optimizer.fs) nor in mkAndSimplifyMatch (TastOps.fs)..
At the end of this pass, the TAST is like (simplified for clarity)

Lambda( ["x"],
   body: 
     Switch( "x",
        cases: [ Case(True, 0)]
        default: Some (1))
        targets: [ (0, Const True)
                       (1, Const False)] ))

Where it could probably be:
Lambda(["x"], body: Value "x")
so the optimization seem to be done in the next pass, the IL generation... Am I right ?
Would it be more interesting to match this kind of patterns earlier to proceed to better optimization ?

@dsyme dsyme added Tenet-Performance and removed Bug Impact-Medium (Internal MS Team use only) Describes an issue with moderate impact on existing code. labels Nov 14, 2016
@thinkbeforecoding
Copy link
Contributor Author

So I made a draft implementation of this...
and the following code:

let isAlphaNum c =
    (c >= '0' && c<='9') || (c>='A' && c<='Z') 

is now compiled like:

public static bool isAlphaNum(char c)
  {
    if ((int) c >= 48) //  >='0'
    {
      if ((int) c <= 57) // and <= '9' -> true it a digit 
        return true;
      if ((int) c >= 65) // it may be a letter if c >= 'A'
        return (int) c <= 90; // verify it is
      return false; // it was not >= 'A' so no.
    }
    if ((int) c >= 65) 
      return (int) c <= 90;
    return false;
  }

There is no comparison to constants anymore.

The problem is that the right part (c >='A' && c <= 'Z') is duplicated in the output...
While rewriting the AST, (A && B) || C , is changed if to ( if A then (if B then true else C) else C) .. and C is duplicated...
Is there a way to avoid the duplication in the IL ?

@dsyme
Copy link
Contributor

dsyme commented Mar 17, 2018

Can you share a link to your branch? Thanks

@thinkbeforecoding
Copy link
Contributor Author

Here is the branch:
https://github.com/thinkbeforecoding/visualfsharp/tree/booloptim

And here is the sample I use for tests:
https://github.com/thinkbeforecoding/booltest

it builds the fs sample using the fsc version under deployment and usual fsc version as well as a C# version of the same code.
It also runs a basic unit test on generated assembly to check we did not break boolean logic.
To run it, set a environment variable devfscpath to the directory where you compile fsc.

@cartermp cartermp added this to the Unknown milestone Aug 25, 2018
@cartermp cartermp modified the milestones: Unknown, 16.0 Sep 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants