Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better lowering of [start..finish] & [|start..finish|] #16577

Closed

Conversation

brianrourkeboll
Copy link
Contributor

@brianrourkeboll brianrourkeboll commented Jan 23, 2024

Description

This is an experimental take at better lowering of some simple computed collection expression forms; I was mostly just curious how hard it would be. They do seem to work, though, and they're straightforward enough that they probably don't need a lang suggestion/RFC... (They aren't visible in quotations, for example.)

New lowerings:

  • Computed lists:

    • For constant $start$ and $finish$ when $start > finish$:
      [5..1][].

      Generated IL
      module Test
      let test () = [5..1]
      .maxstack  8
      IL_0000:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32>::get_Empty()
      IL_0005:  ret
    • For constant $start$ and $finish$ when $start &lt; finish$:
      [1..5]List.init (5 - 1 + 1) ((+) 1)List.init 5 ((+) 1).

      Generated IL
      module Test
      let test () = [1..5]
      // Initializer lambda body.
      .maxstack  8
      IL_0000:  ldc.i4.1
      IL_0001:  ldarg.1
      IL_0002:  add
      IL_0003:  ret
      .maxstack  8
      IL_0000:  ldc.i4.5
      IL_0002:  ldsfld     class Test/test@1 Test/test@1::@_instance
      IL_0007:  tail.
      IL_0009:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Initialize<int32>(int32,
                                                                                                                                                                     class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,!!0>)
      IL_000e:  ret
    • For dynamic $start$ and $finish$:
      [start..finish]if finish < start then [] else List.init (finish - start + 1) ((+) start).

      Generated IL
      module Test
      let test start finish = [start..finish]
      // Initializer lambda body.
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  ldfld      int32 Test/test@1::start
      IL_0006:  ldarg.1
      IL_0007:  add
      IL_0008:  ret
      .maxstack  8
      IL_0000:  ldarg.1
      IL_0001:  ldarg.0
      IL_0002:  bge.s      IL_000a
      
      IL_0004:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32>::get_Empty()
      IL_0009:  ret
      
      IL_000a:  ldarg.1
      IL_000b:  ldarg.0
      IL_000c:  sub
      IL_000d:  ldc.i4.1
      IL_000e:  add
      IL_000f:  ldarg.0
      IL_0010:  newobj     instance void Test/test@1::.ctor(int32)
      IL_0015:  tail.
      IL_0017:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Initialize<int32>(int32,
                                                                                                                                                                     class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,!!0>)
      IL_001c:  ret
    • For dynamic $start$ and $finish$ where $start$ and/or $finish$ are not constants or already bound to a value (field, local variable, etc.):

      [start..finishExpr] →
          let finish = finishExpr in
          if finish < start then [] else List.init (finish - start + 1) ((+) start)
      
      [startExpr..finish] →
          let start = startExpr in
          if finish < start then [] else List.init (finish - start + 1) ((+) start)
      
      [startExpr..finishExpr] →
          let start = startExpr in
          let finish = finishExpr in
          if finish < start then [] else List.init (finish - start + 1) ((+) start)
      
      Generated IL
      module Test
      let a () = (Array.zeroCreate 10).Length
      let b () = (Array.zeroCreate 20).Length
      let test start finish = [a ()..b ()]
      // Initializer lambda body.
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  ldfld      int32 Test/test@1::start
      IL_0006:  ldarg.1
      IL_0007:  add
      IL_0008:  ret
      .maxstack  4
      .locals init (int32 V_0,
               int32 V_1)
      IL_0000:  ldc.i4.s   10
      IL_0002:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::ZeroCreate<object>(int32)
      IL_0007:  ldlen
      IL_0008:  conv.i4
      IL_0009:  stloc.0
      IL_000a:  ldc.i4.s   20
      IL_000c:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::ZeroCreate<object>(int32)
      IL_0011:  ldlen
      IL_0012:  conv.i4
      IL_0013:  stloc.1
      IL_0014:  ldloc.1
      IL_0015:  ldloc.0
      IL_0016:  bge.s      IL_001e
      
      IL_0018:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<int32>::get_Empty()
      IL_001d:  ret
      
      IL_001e:  ldloc.1
      IL_001f:  ldloc.0
      IL_0020:  sub
      IL_0021:  ldc.i4.1
      IL_0022:  add
      IL_0023:  ldloc.0
      IL_0024:  newobj     instance void Test/test@1::.ctor(int32)
      IL_0029:  tail.
      IL_002b:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Initialize<int32>(int32,
                                                                                                                                                                     class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,!!0>)
      IL_0030:  ret
  • Computed arrays:

    • For constant $start$ and $finish$ when $start &gt; finish$:
      [|5..1|][||].

      Generated IL
      module Test
      let test () = [|5..1|]
      .maxstack  8
      IL_0000:  call       !!0[] [runtime]System.Array::Empty<int32>()
      IL_0005:  ret
    • For constant $start$ and $finish$ when $start &lt; finish$:
      [|1..5|]Array.init (5 - 1 + 1) ((+) 1)Array.init 5 ((+) 1).

      Generated IL
      module Test
      let test () = [|1..5|]
      // Initializer lambda body.
      .maxstack  8
      IL_0000:  ldc.i4.1
      IL_0001:  ldarg.1
      IL_0002:  add
      IL_0003:  ret
      .maxstack  8
      IL_0000:  ldc.i4.5
      IL_0005:  ldsfld     class Test/test@1 Test/test@1::@_instance
      IL_000a:  tail.
      IL_000c:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::Initialize<int32>(int32,
                                                                                                          class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,!!0>)
      IL_0011:  ret
    • For dynamic $start$ and $finish$:
      [|start..finish|]if finish < start then [||] else Array.init (finish - start + 1) ((+) start).

      Generated IL
      module Test
      let test start finish = [|start..finish|]
      // Initializer lambda body.
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  ldfld      int32 Test/test@1::start
      IL_0006:  ldarg.1
      IL_0007:  add
      IL_0008:  ret
      .maxstack  8
      IL_0000:  ldarg.1
      IL_0001:  ldarg.0
      IL_0002:  bge.s      IL_000a
      
      IL_0004:  call       !!0[] [runtime]System.Array::Empty<int32>()
      IL_0009:  ret
      
      IL_000a:  ldarg.1
      IL_000b:  ldarg.0
      IL_000c:  sub
      IL_000d:  ldc.i4.1
      IL_000e:  add
      IL_000f:  ldarg.0
      IL_0010:  newobj     instance void Test/test@1::.ctor(int32)
      IL_0015:  tail.
      IL_0017:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::Initialize<int32>(int32,
                                                                                                          class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,!!0>)
      IL_001c:  ret
    • For dynamic $start$ and $finish$ where $start$ and/or $finish$ are not constants or already bound to a value (field, local variable, etc.):

      [|start..finishExpr|] →
          let finish = finishExpr in
          if finish < start then [||] else Array.init (finish - start + 1) ((+) start)
      
      [|startExpr..finish|] →
          let start = startExpr in
          if finish < start then [||] else Array.init (finish - start + 1) ((+) start)
      
      [|startExpr..finishExpr|] →
          let start = startExpr in
          let finish = finishExpr in
          if finish < start then [||] else Array.init (finish - start + 1) ((+) start)
      
      Generated IL
      module Test
      let a () = (Array.zeroCreate 10).Length
      let b () = (Array.zeroCreate 20).Length
      let test start finish = [|a ()..b ()|]
      // Initializer lambda body.
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  ldfld      int32 Test/test@1::start
      IL_0006:  ldarg.1
      IL_0007:  add
      IL_0008:  ret
      .maxstack  4
      .locals init (int32 V_0,
               int32 V_1)
      IL_0000:  ldc.i4.s   10
      IL_0002:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::ZeroCreate<object>(int32)
      IL_0007:  ldlen
      IL_0008:  conv.i4
      IL_0009:  stloc.0
      IL_000a:  ldc.i4.s   20
      IL_000c:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::ZeroCreate<object>(int32)
      IL_0011:  ldlen
      IL_0012:  conv.i4
      IL_0013:  stloc.1
      IL_0014:  ldloc.1
      IL_0015:  ldloc.0
      IL_0016:  bge.s      IL_001e
      
      IL_0018:  call       !!0[] [runtime]System.Array::Empty<int32>()
      IL_001d:  ret
      
      IL_001e:  ldloc.1
      IL_001f:  ldloc.0
      IL_0020:  sub
      IL_0021:  ldc.i4.1
      IL_0022:  add
      IL_0023:  ldloc.0
      IL_0024:  newobj     instance void Test/test@1::.ctor(int32)
      IL_0029:  tail.
      IL_002b:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.ArrayModule::Initialize<int32>(int32,
                                                                                                          class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<int32,!!0>)
      IL_0030:  ret
Original questions

Questions

  • Should there be a size limit for const range unrolling? Probably. E.g., should [|1..65_536|] be lowered to a blob? What about [|-2147483648..2147483647|]? Does C# have a limit for their ReadOnlySpan<byte> Xs => [1, 2, 3];, which similarly generates a blob? In C#'s case, though, a developer would need to manually enumerate a huge number of elements in source, since C# doesn't have a range shorthand like [-2147483648..2147483647] (just like the existing optimization in F# for array literals, since the range shorthand didn't affect that before).

    Basically, not having a size limit risks the compiler using up massive amounts of memory at build time (and generating potentially gigantic assemblies), while having one implies a hidden runtime performance cliff (although falling back to Array.init closes the gap pretty closely).

Additional potential optimizations

  • Inlined/non-capturing closures for initializer for the call to Array.init/List.init. (The inlining is not happening because LowerComputedListOrArrayExpr is called in IlxGen.fs, after most optimization.)
  • Don't call List.init/Array.init at all; just manually emit for/while-loops, since we don't need the non-negative check for length/count.
  • Handle all integral types (int64, byte, etc.).
  • Handle arbitrary steps (e.g., [start..step..finish]let count = (finish - start) / step + 1 in if count = 0 then [] else List.init count ((*) step >> (+) start)).
  • Handle simple [for … in … -> …] and [|for … in … -> …|]:
    • [for n in 1..5 -> f n][f 1; f 2; f 3; f 4; f 5].
    • [for n in start..finish -> f n]List.init (finish - start + 1) ((+) start >> f).
    • [for x in xs -> f x]List.map f xs (?) when xs is a list.
    • [|for n in 1..5 -> f n|][|f 1; f 2; f 3; f 4; f 5|].
    • [|for n in start..finish -> f n|]Array.init (finish - start + 1) ((+) start >> f).
    • [|for x in xs -> f x|]Array.map f xs (?) when xs is an array.

Benchmarks

The difference is much more dramatic for arrays than for lists, but the difference for lists can still be significant, especially for smaller ranges.

| Categories                                            | Mean            | Error         | StdDev         | Ratio | RatioSD | Gen0     | Gen1     | Gen2     | Allocated | Alloc Ratio |
|------------------------------------------------------ |----------------:|--------------:|---------------:|------:|--------:|---------:|---------:|---------:|----------:|------------:|
| Array,1,[|10..1|]                                     |      18.2607 ns |     0.2396 ns |      0.2241 ns |  1.00 |    0.00 |   0.0076 |        - |        - |      96 B |        1.00 |
| Array,1,[|10..1|]                                     |       0.4375 ns |     0.0036 ns |      0.0033 ns |  0.02 |    0.00 |        - |        - |        - |         - |        0.00 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| Array,2,[|1..10|]                                     |      58.3847 ns |     0.1979 ns |      0.1653 ns |  1.00 |    0.00 |   0.0261 |        - |        - |     328 B |        1.00 |
| Array,2,[|1..10|]                                     |      10.8323 ns |     0.1045 ns |      0.0977 ns |  0.19 |    0.00 |   0.0051 |        - |        - |      64 B |        0.20 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| Array,3,[|1..50|]                                     |     152.9031 ns |     1.3775 ns |      1.2885 ns |  1.00 |    0.00 |   0.0732 |        - |        - |     920 B |        1.00 |
| Array,3,[|1..50|]                                     |      33.8829 ns |     0.3367 ns |      0.2812 ns |  0.22 |    0.00 |   0.0178 |        - |        - |     224 B |        0.24 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| Array,4,[|1..256|]                                    |     430.2353 ns |     6.5800 ns |      6.1549 ns |  1.00 |    0.00 |   0.1817 |        - |        - |    2280 B |        1.00 |
| Array,4,[|1..256|]                                    |     115.8702 ns |     0.9050 ns |      0.8022 ns |  0.27 |    0.00 |   0.0834 |        - |        - |    1048 B |        0.46 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| Array,5,[|1..257|]                                    |     539.9688 ns |     3.1164 ns |      2.6023 ns |  1.00 |    0.00 |   0.4301 |   0.0029 |        - |    5408 B |        1.00 |
| Array,5,[|1..257|]                                    |     114.7077 ns |     0.3495 ns |      0.3098 ns |  0.21 |    0.00 |   0.0842 |        - |        - |    1056 B |        0.20 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| Array,6,[|start..finish|] (start = 1, finish = 65536) | 210,957.6139 ns | 2,526.6296 ns |  2,363.4109 ns |  1.00 |    0.00 | 124.7559 | 124.7559 | 124.7559 |  524754 B |        1.00 |
| Array,6,[|start..finish|] (start = 1, finish = 65536) |  93,992.5838 ns | 1,658.9811 ns |  1,551.8120 ns |  0.45 |    0.01 |  83.2520 |  83.2520 |  83.2520 |  262220 B |        0.50 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| List,1,[10..1]                                        |      16.2660 ns |     0.1699 ns |      0.1590 ns |  1.00 |    0.00 |   0.0076 |        - |        - |      96 B |        1.00 |
| List,1,[10..1]                                        |       1.0550 ns |     0.0050 ns |      0.0042 ns |  0.06 |    0.00 |        - |        - |        - |         - |        0.00 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| List,2,[1..10]                                        |      54.8521 ns |     0.3753 ns |      0.3327 ns |  1.00 |    0.00 |   0.0318 |        - |        - |     400 B |        1.00 |
| List,2,[1..10]                                        |      36.7022 ns |     0.3463 ns |      0.2892 ns |  0.67 |    0.01 |   0.0255 |        - |        - |     320 B |        0.80 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| List,3,[1..50]                                        |     208.8935 ns |     2.2519 ns |      2.1064 ns |  1.00 |    0.00 |   0.1338 |   0.0007 |        - |    1680 B |        1.00 |
| List,3,[1..50]                                        |     167.1170 ns |     1.5154 ns |      1.4175 ns |  0.80 |    0.01 |   0.1273 |   0.0007 |        - |    1600 B |        0.95 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| List,4,[1..100]                                       |     376.5863 ns |     2.1164 ns |      1.8761 ns |  1.00 |    0.00 |   0.2613 |   0.0029 |        - |    3280 B |        1.00 |
| List,4,[1..100]                                       |     322.1434 ns |     3.4085 ns |      3.1883 ns |  0.86 |    0.01 |   0.2546 |   0.0029 |        - |    3200 B |        0.98 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| List,5,[1..101]                                       |     389.5236 ns |     2.3078 ns |      1.8018 ns |  1.00 |    0.00 |   0.2637 |   0.0029 |        - |    3312 B |        1.00 |
| List,5,[1..101]                                       |     327.6451 ns |     2.8857 ns |      2.4096 ns |  0.84 |    0.00 |   0.2575 |   0.0029 |        - |    3232 B |        0.98 |
|                                                       |                 |               |                |       |         |          |          |          |           |             |
| List,6,[start..finish] (start = 1, finish = 65536)    | 447,357.7881 ns | 8,867.0740 ns | 13,804.9659 ns |  1.00 |    0.00 | 166.9922 | 154.7852 |        - | 2097232 B |        1.00 |
| List,6,[start..finish] (start = 1, finish = 65536)    | 436,104.9072 ns | 8,637.6954 ns | 19,319.4435 ns |  0.97 |    0.04 | 166.9922 | 154.7852 |        - | 2097176 B |        1.00 |

Checklist

  • Test cases added.
  • Performance benchmarks added in case of performance changes.
  • Release notes entry updated.

@brianrourkeboll brianrourkeboll requested a review from a team as a code owner January 23, 2024 20:35
Copy link
Contributor

github-actions bot commented Jan 23, 2024

❗ Release notes required


✅ Found changes and release notes in following paths:

Change path Release notes path Description
src/Compiler docs/release-notes/.FSharp.Compiler.Service/8.0.300.md

@T-Gro
Copy link
Member

T-Gro commented Jan 24, 2024

Could you please show the impact on the .dll size for the let constRangeInt32 () = [|1..65520|] construct ? Or with longs, to increase the size even more.
The only worry I would have is codegen increase in this situation. Not just for space purposes, but also for loading/startup time.

If we want to be really safe, this could be a numeric optional setting with a default.
Either number of elements, or size threshold.

@T-Gro
Copy link
Member

T-Gro commented Jan 24, 2024

Second question would be also with regards to codesize in case of inlining a function that produces [|1..largeNumber|].
Could this be added as a test please ?

@brianrourkeboll
Copy link
Contributor Author

@T-Gro

Could you please show the impact on the .dll size for the let constRangeInt32 () = [|1..65520|] construct ? Or with longs, to increase the size even more.
The only worry I would have is codegen increase in this situation. Not just for space purposes, but also for loading/startup time.

The added code size in the DLL should be the size of the blob, like:

.data cil I_000028A7 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0A 00 00 00)

for

So yes, it could be quite large if the constant range is large (especially if we added support for int64), and we should have some limit on that. The memory and CPU usage of the compiler also quickly become a problem if we don't have a limit (e.g., without a limit, for [|-2147483648..2147483647|] the compiler would try to generate 232 Expr.Consts…).

I wonder where the performance curves for

Array.init (finish - start + 1) ((+) start)

and

var arr = new int[bigNum];
System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray(arr, blobHandle);

intersect. If it's for a small enough $n$, that might be a good number to use.

Second question would be also with regards to codesize in case of inlining a function that produces [|1..largeNumber|].
Could this be added as a test please ?

Interesting. Given the behavior of the existing optimization for array literals, it looks like the code size would increase linearly with usage in that case as well:

open System

let inline f () = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]

let xs =
    ignore (f ())
    ignore (f ())
    ignore (f ())
    ignore (f ())
    ignore (f ())
Generated IL:

SharpLab

.assembly _
{
    .custom instance void [FSharp.Core]Microsoft.FSharp.Core.FSharpInterfaceDataVersionAttribute::.ctor(int32, int32, int32) = (
        01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
    )
    .hash algorithm 0x00008004 // SHA1
    .ver 0:0:0:0
}

.class private auto ansi '<Module>'
    extends [System.Runtime]System.Object
{
} // end of class <Module>

.class public auto ansi abstract sealed _
    extends [System.Runtime]System.Object
{
    .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
        01 00 07 00 00 00 00 00
    )
    // Fields
    .field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771579@ at I_00002B18
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .data cil I_00002B18 = bytearray (
        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
        09 00 00 00 0a 00 00 00
    )
    .field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771580@ at I_00002B40
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .data cil I_00002B40 = bytearray (
        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
        09 00 00 00 0a 00 00 00
    )
    .field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771581@ at I_00002B68
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .data cil I_00002B68 = bytearray (
        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
        09 00 00 00 0a 00 00 00
    )
    .field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771582@ at I_00002B90
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .data cil I_00002B90 = bytearray (
        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
        09 00 00 00 0a 00 00 00
    )
    .field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771583@ at I_00002BB8
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .data cil I_00002BB8 = bytearray (
        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
        09 00 00 00 0a 00 00 00
    )
    .field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771584@ at I_00002BE0
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .data cil I_00002BE0 = bytearray (
        01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
        05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
        09 00 00 00 0a 00 00 00
    )

    // Methods
    .method public static 
        int32[] f () cil managed 
    {
        // Method begins at RVA 0x2050
        // Code size 19 (0x13)
        .maxstack 8

        IL_0000: ldc.i4.s 10
        IL_0002: newarr [System.Runtime]System.Int32
        IL_0007: dup
        IL_0008: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771579@
        IL_000d: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
        IL_0012: ret
    } // end of method _::f

    .method public specialname static 
        class [FSharp.Core]Microsoft.FSharp.Core.Unit get_xs () cil managed 
    {
        // Method begins at RVA 0x2064
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldsfld class [FSharp.Core]Microsoft.FSharp.Core.Unit '<StartupCode$_>.$_'::xs@5
        IL_0005: ret
    } // end of method _::get_xs

    .method assembly specialname static 
        int32[] get__arg1@1 () cil managed 
    {
        // Method begins at RVA 0x206c
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::_arg1@1
        IL_0005: ret
    } // end of method _::get__arg1@1

    .method assembly specialname static 
        int32[] 'get__arg1@1-1' () cil managed 
    {
        // Method begins at RVA 0x2074
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-1'
        IL_0005: ret
    } // end of method _::'get__arg1@1-1'

    .method assembly specialname static 
        int32[] 'get__arg1@1-2' () cil managed 
    {
        // Method begins at RVA 0x207c
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-2'
        IL_0005: ret
    } // end of method _::'get__arg1@1-2'

    .method assembly specialname static 
        int32[] 'get__arg1@1-3' () cil managed 
    {
        // Method begins at RVA 0x2084
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-3'
        IL_0005: ret
    } // end of method _::'get__arg1@1-3'

    .method assembly specialname static 
        int32[] 'get__arg1@1-4' () cil managed 
    {
        // Method begins at RVA 0x208c
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-4'
        IL_0005: ret
    } // end of method _::'get__arg1@1-4'

    .method private specialname rtspecialname static 
        void .cctor () cil managed 
    {
        // Method begins at RVA 0x2094
        // Code size 13 (0xd)
        .maxstack 8

        IL_0000: ldc.i4.0
        IL_0001: stsfld int32 '<StartupCode$_>.$_'::init@
        IL_0006: ldsfld int32 '<StartupCode$_>.$_'::init@
        IL_000b: pop
        IL_000c: ret
    } // end of method _::.cctor

    // Properties
    .property class [FSharp.Core]Microsoft.FSharp.Core.Unit xs()
    {
        .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
            01 00 09 00 00 00 00 00
        )
        .get class [FSharp.Core]Microsoft.FSharp.Core.Unit _::get_xs()
    }
    .property int32[] _arg1@1()
    {
        .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
            01 00 09 00 00 00 00 00
        )
        .get int32[] _::get__arg1@1()
    }
    .property int32[] '_arg1@1-1'()
    {
        .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
            01 00 09 00 00 00 00 00
        )
        .get int32[] _::'get__arg1@1-1'()
    }
    .property int32[] '_arg1@1-2'()
    {
        .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
            01 00 09 00 00 00 00 00
        )
        .get int32[] _::'get__arg1@1-2'()
    }
    .property int32[] '_arg1@1-3'()
    {
        .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
            01 00 09 00 00 00 00 00
        )
        .get int32[] _::'get__arg1@1-3'()
    }
    .property int32[] '_arg1@1-4'()
    {
        .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
            01 00 09 00 00 00 00 00
        )
        .get int32[] _::'get__arg1@1-4'()
    }

} // end of class _

.class private auto ansi abstract sealed '<StartupCode$_>.$_'
    extends [System.Runtime]System.Object
{
    // Fields
    .field assembly static initonly class [FSharp.Core]Microsoft.FSharp.Core.Unit xs@5
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .field assembly static initonly int32[] _arg1@1
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .field assembly static initonly int32[] '_arg1@1-1'
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .field assembly static initonly int32[] '_arg1@1-2'
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .field assembly static initonly int32[] '_arg1@1-3'
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .field assembly static initonly int32[] '_arg1@1-4'
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .field assembly static int32 init@
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
        01 00 00 00 00 00 00 00
    )
    .custom instance void [System.Runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
        01 00 00 00
    )
    .custom instance void [System.Runtime]System.Diagnostics.DebuggerNonUserCodeAttribute::.ctor() = (
        01 00 00 00
    )

    // Methods
    .method private specialname rtspecialname static 
        void .cctor () cil managed 
    {
        // Method begins at RVA 0x20a4
        // Code size 122 (0x7a)
        .maxstack 5

        IL_0000: ldc.i4.s 10
        IL_0002: newarr [System.Runtime]System.Int32
        IL_0007: dup
        IL_0008: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771580@
        IL_000d: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
        IL_0012: stsfld int32[] '<StartupCode$_>.$_'::_arg1@1
        IL_0017: ldc.i4.s 10
        IL_0019: newarr [System.Runtime]System.Int32
        IL_001e: dup
        IL_001f: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771581@
        IL_0024: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
        IL_0029: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-1'
        IL_002e: ldc.i4.s 10
        IL_0030: newarr [System.Runtime]System.Int32
        IL_0035: dup
        IL_0036: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771582@
        IL_003b: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
        IL_0040: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-2'
        IL_0045: ldc.i4.s 10
        IL_0047: newarr [System.Runtime]System.Int32
        IL_004c: dup
        IL_004d: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771583@
        IL_0052: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
        IL_0057: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-3'
        IL_005c: ldc.i4.s 10
        IL_005e: newarr [System.Runtime]System.Int32
        IL_0063: dup
        IL_0064: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771584@
        IL_0069: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
        IL_006e: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-4'
        IL_0073: ldnull
        IL_0074: stsfld class [FSharp.Core]Microsoft.FSharp.Core.Unit '<StartupCode$_>.$_'::xs@5
        IL_0079: ret
    } // end of method $_::.cctor

} // end of class <StartupCode$_>.$_

.class private auto ansi abstract sealed beforefieldinit '<PrivateImplementationDetails$_>'
    extends [System.Runtime]System.Object
{
    // Nested Types
    .class nested assembly explicit ansi sealed beforefieldinit T771578_40Bytes@
        extends [System.Runtime]System.ValueType
    {
        .pack 0
        .size 40

    } // end of class T771578_40Bytes@


} // end of class <PrivateImplementationDetails$_>

It seems like the existing optimization could be improved so that it doesn't generate multiple copies of the blob, which would help both array literals and constant ranges.

If we want to be really safe, this could be a numeric optional setting with a default.

Hmm, I'm not sure if it would be worth making this configurable. If someone really wanted the compiler to make a dangerously big blob for them right now (and had a machine with enough RAM), they could already just hand-write or generate a giant array literal.

We could also just avoid this problem by skipping

[|1..5|][|1; 2; 3; 4; 5|]

and always doing

[|start..finish|]Array.init (finish - start + 1) ((+) start)

instead, which would still be a solid improvement.

@T-Gro
Copy link
Member

T-Gro commented Jan 24, 2024

@brianrourkeboll :

Thanks for diving into the detail.

A fair comparison might need more than just BDN if we should also measure time needed to load the .dll from disk (to factor in the increased size).

I know it is not a scientific conclusion, but I would prefer the threshold number to be set low enough to not be noticeable on the output size, e.g. 1kB limit.

RE: the inlining, would that be via compile time caching encountered (type,start,end) expressions and reusing their output BLOB ?

@brianrourkeboll
Copy link
Contributor Author

@TGro

A fair comparison might need more than just BDN if we should also measure time needed to load the .dll from disk (to factor in the increased size).

Yes, that's true.

I know it is not a scientific conclusion, but I would prefer the threshold number to be set low enough to not be noticeable on the output size, e.g. 1kB limit.

That seems reasonable. I'll set it low and see how that benchmarks against Array.init.

RE: the inlining, would that be via compile time caching encountered (type,start,end) expressions and reusing their output BLOB ?

Well it depends whether we want it to apply for the existing way array literals are handled as well (as in here):

if
elems.Length <= 5
|| not cenv.options.emitConstantArraysUsingStaticDataBlobs
|| (isEnumTy cenv.g elemTy)
then
GenNewArraySimple cenv cgbuf eenv (elems, elemTy, m) sequel
else
// Try to emit a constant byte-blob array
let elemsArray = Array.ofList elems

if
elemsArray
|> Array.forall (function
| Expr.Const(c, _, _) -> test c
| _ -> false)
then
let ilElemTy = GenType cenv m eenv.tyenv elemTy
GenConstArray cenv cgbuf eenv ilElemTy elemsArray (fun buf ->
function
| Expr.Const(c, _, _) -> write buf c
| _ -> failwith "unreachable")

Deduping constant ranges sounds simple enough, as you describe; to dedupe arbitrary array literals I guess you could use some kind of hash of the contents.

If we keep the const array size limit small enough in this PR, though, I think such deduping could probably be its own separate optimization.

@brianrourkeboll
Copy link
Contributor Author

brianrourkeboll commented Jan 25, 2024

I fixed the branching and updated the benchmarks in the description.

(Updated benchmarks repeated here)
| Method                     | Categories              | Mean            | Error         | StdDev         | Ratio | RatioSD | Gen0     | Gen1     | Gen2     | Allocated | Alloc Ratio |
|--------------------------- |------------------------ |----------------:|--------------:|---------------:|------:|--------:|---------:|---------:|---------:|----------:|------------:|
| Array_Old_10To1            | Array,[|10..1|]         |      23.6098 ns |     0.1240 ns |      0.1099 ns |  1.00 |    0.00 |   0.0076 |        - |        - |      96 B |        1.00 |
| Array_New_10To1            | Array,[|10..1|]         |       0.4375 ns |     0.0028 ns |      0.0026 ns |  0.02 |    0.00 |        - |        - |        - |         - |        0.00 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| Array_Old_1To10            | Array,[|1..10|]         |      59.2091 ns |     0.3471 ns |      0.3077 ns |  1.00 |    0.00 |   0.0261 |        - |        - |     328 B |        1.00 |
| Array_New_1To10            | Array,[|1..10|]         |       5.0770 ns |     0.0549 ns |      0.0429 ns |  0.09 |    0.00 |   0.0051 |        - |        - |      64 B |        0.20 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| Array_Old_1To50            | Array,[|1..50|]         |     164.8347 ns |     1.4401 ns |      1.2766 ns |  1.00 |    0.00 |   0.0732 |        - |        - |     920 B |        1.00 |
| Array_New_1To50            | Array,[|1..50|]         |       9.4800 ns |     0.0319 ns |      0.0283 ns |  0.06 |    0.00 |   0.0179 |        - |        - |     224 B |        0.24 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| Array_Old_1To256           | Array,[|1..256|]        |     430.2849 ns |     2.6394 ns |      2.3397 ns |  1.00 |    0.00 |   0.1817 |        - |        - |    2280 B |        1.00 |
| Array_New_1To256           | Array,[|1..256|]        |      41.7861 ns |     0.5105 ns |      0.4775 ns |  0.10 |    0.00 |   0.0835 |        - |        - |    1048 B |        0.46 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| Array_Old_1To257           | Array,[|1..257|]        |     564.4246 ns |     3.9136 ns |      3.4693 ns |  1.00 |    0.00 |   0.4301 |   0.0029 |        - |    5408 B |        1.00 |
| Array_New_1To257           | Array,[|1..257|]        |     115.5624 ns |     0.9958 ns |      0.8827 ns |  0.20 |    0.00 |   0.0842 |        - |        - |    1056 B |        0.20 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| Array_Old_Dynamic_1To65536 | Array,[|start..finish|] | 194,617.6697 ns | 1,341.7077 ns |  1,047.5171 ns |  1.00 |    0.00 | 124.7559 | 124.7559 | 124.7559 |  524754 B |        1.00 |
| Array_New_Dynamic_1To65536 | Array,[|start..finish|] |  82,301.7290 ns |   475.3916 ns |    421.4223 ns |  0.42 |    0.00 |  83.2520 |  83.2520 |  83.2520 |  262220 B |        0.50 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| List_Old_10To1             | List,[10..1]            |      16.3486 ns |     0.3685 ns |      0.4386 ns |  1.00 |    0.00 |   0.0076 |        - |        - |      96 B |        1.00 |
| List_New_10To1             | List,[10..1]            |       1.0736 ns |     0.0102 ns |      0.0080 ns |  0.07 |    0.00 |        - |        - |        - |         - |        0.00 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| List_Old_1To10             | List,[1..10]            |      53.4059 ns |     0.4100 ns |      0.3424 ns |  1.00 |    0.00 |   0.0318 |        - |        - |     400 B |        1.00 |
| List_New_1To10             | List,[1..10]            |      30.6254 ns |     0.2588 ns |      0.2161 ns |  0.57 |    0.00 |   0.0255 |        - |        - |     320 B |        0.80 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| List_Old_1To50             | List,[1..50]            |     210.8844 ns |     2.0126 ns |      1.6806 ns |  1.00 |    0.00 |   0.1338 |   0.0007 |        - |    1680 B |        1.00 |
| List_New_1To50             | List,[1..50]            |     147.7232 ns |     1.4591 ns |      1.3649 ns |  0.70 |    0.00 |   0.1273 |   0.0002 |        - |    1600 B |        0.95 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| List_Old_1To100            | List,[1..100]           |     387.9171 ns |     2.5653 ns |      2.2741 ns |  1.00 |    0.00 |   0.2613 |   0.0029 |        - |    3280 B |        1.00 |
| List_New_1To100            | List,[1..100]           |     304.6029 ns |     5.8287 ns |      5.1670 ns |  0.79 |    0.02 |   0.2546 |   0.0014 |        - |    3200 B |        0.98 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| List_Old_1To101            | List,[1..101]           |     412.2919 ns |     8.2508 ns |     20.3938 ns |  1.00 |    0.00 |   0.2637 |   0.0029 |        - |    3312 B |        1.00 |
| List_New_1To101            | List,[1..101]           |     334.5175 ns |     5.9638 ns |      9.6305 ns |  0.78 |    0.04 |   0.2575 |   0.0029 |        - |    3232 B |        0.98 |
|                            |                         |                 |               |                |       |         |          |          |          |           |             |
| List_Old_Dynamic_1To65536  | List,[start..finish]    | 473,499.7168 ns | 8,943.5261 ns | 10,299.3779 ns |  1.00 |    0.00 | 166.9922 | 154.2969 |        - | 2097232 B |        1.00 |
| List_New_Dynamic_1To65536  | List,[start..finish]    | 423,602.8193 ns | 8,419.1710 ns | 17,007.1407 ns |  0.89 |    0.03 | 166.9922 | 154.7852 |        - | 2097176 B |        1.00 |

You can see that there is a perf cliff at the byte size threshold for arrays (now 1,024 bytes, so 256 int32s), but the fallback to Array.init is still a significant improvement over the status quo.

Looks like the branching still needs fixing...

@brianrourkeboll brianrourkeboll force-pushed the collection-lowering branch 2 times, most recently from 9b26084 to 6b87cfd Compare January 25, 2024 22:44
@brianrourkeboll brianrourkeboll marked this pull request as draft January 26, 2024 01:45
@brianrourkeboll brianrourkeboll force-pushed the collection-lowering branch 2 times, most recently from dadcf90 to f176fd0 Compare January 26, 2024 03:04
* Use branchless `max` in call to `Array.init`/`List.init`.
  Getting the sequel to be appended to each branch correctly in all
  cases looked like a nontrivial undertaking.

* Lower the size thresholds for const ranges (temporarily?).
@T-Gro
Copy link
Member

T-Gro commented Jan 26, 2024

Thanks for the extensive summary @brianrourkeboll .
I think the new limits are sensible as they are implemented now, approving.

@brianrourkeboll
Copy link
Contributor Author

Well @T-Gro even though I was able to get the branching IL (for if start <= finish then Array.init …… else [||]) to be correct in some cases in 0136a96, there were other cases where the branching wasn't getting woven into the outer context correctly (e.g., depending on whether the result of each branch should be returned, assigned to a local variable or field, etc.) and was resulting in invalid IL.

It also appears that even with very low limits for constant array and list sizes (I tried 40 bytes for arrays and 10 elements for lists), the compiler consistently ran into OOM exceptions in CI:

error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown. [D:\a\_work\1\s\tests\FSharp.Core.UnitTests\FSharp.Core.UnitTests.fsproj::TargetFramework=net472]
error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown. [D:\a\_work\1\s\tests\FSharp.Core.UnitTests\FSharp.Core.UnitTests.fsproj::TargetFramework=net8.0]

There are a fair amount of list/array range initialization expressions with constant args in those tests, but I don't think I understand the exact cause of the OOM. I don't know whether the culprit was simply too many Expr.Consts, something downstream of that in the compiler, the resulting assembly size, or something else. In the latest commit, I commented out the emitting of constant arrays/lists altogether, and I no longer see OOM exceptions.

I also removed the if start <= finish then Array.init …… else [||] branching, so that it now unconditionally calls Array.init/List.init, but with a branchless version of max range 0 to ensure that a negative count/length is never passed in. I'm sure it's possible to emit correct code for the branching version for all contexts, but it didn't seem like it would be easy.

I think I'll run some new benchmarks with only the [||]/[] optimization for const empty ranges and the unconditional Array.init/List.init call for all other ranges, and then update the baselines. We can revisit the const array/list optimizations for smaller constant ranges or the branching for dynamic/bigger ranges later if we want.

@brianrourkeboll
Copy link
Contributor Author

@T-Gro: I think I ended up figuring everything out with the branching. I needed to manually bind the start and/or finish expressions to locals if they weren't already vals or consts.

@brianrourkeboll
Copy link
Contributor Author

@T-Gro sorry to spam your notifications, but I went ahead and extended this PR to work for all built-in integral types, as well as for [|start..step..finish|]/[start..step..finish], in brianrourkeboll#1. That could be merged into this, or it could be its own separate PR after this. Let me know which you prefer.

Otherwise, I believe this PR is ready.

@brianrourkeboll
Copy link
Contributor Author

...Actually, it would probably make more sense just to add something like the following to TypedTreeOps.fs that lowers integral ranges to fast while-loops:

val mkOptimizedRangeLoop :
    g         : TcGlobals ->
    rangeTy   : TType     ->
    overallTy : TType     ->
    start     : Expr      ->
    step      : Expr      ->
    finish    : Expr      ->
    body      : Expr      -> Expr

That could then be used to optimize:

  • standalone for … in start..finish do …, for … in start..step..finish do … — such loops are currently only optimized if the type is int32 and the step is 1 or −1
  • [start..finish]/[start..step..finish]/[|start..finish|]/[|start..step..finish|]
  • [for … in start..step..finish -> …], [|for … in start..step..finish -> …|]

I think I'll close this for now and open a new PR for the alternative approach when I have time.

@psfinaki
Copy link
Member

psfinaki commented Feb 1, 2024

I was just going to review this :) But yeah, the new approach also seems reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants