Better lowering of [start..finish] & [|start..finish|] #16577
❗ Release notes required
|
Could you please show the impact on the .dll size for the If we want to be really safe, this could be a numeric optional setting with a default. |
Second question would be also with regards to codesize in case of inlining a function that produces |
The added code size in the DLL should be the size of the blob, like: fsharp/tests/fsharp/Compiler/Language/ComputedCollectionLoweringTests.fs Lines 157 to 160 in 4c3da4a
for
So yes, it could be quite large if the constant range is large (especially if we added support for I wonder where the performance curves for `Array.init (finish - start + 1) ((+) start)` and `var arr = new int[bigNum]; System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray(arr, blobHandle);` intersect. If it's for a small enough
Interesting. Given the behavior of the existing optimization for array literals, it looks like the code size would increase linearly with usage in that case as well:
open System
let inline f () = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]
let xs =
    ignore (f ())
    ignore (f ())
    ignore (f ())
    ignore (f ())
    ignore (f ())
Generated IL:
.assembly _
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.FSharpInterfaceDataVersionAttribute::.ctor(int32, int32, int32) = (
01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
)
.hash algorithm 0x00008004 // SHA1
.ver 0:0:0:0
}
.class private auto ansi '<Module>'
extends [System.Runtime]System.Object
{
} // end of class <Module>
.class public auto ansi abstract sealed _
extends [System.Runtime]System.Object
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 07 00 00 00 00 00
)
// Fields
.field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771579@ at I_00002B18
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.data cil I_00002B18 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0a 00 00 00
)
.field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771580@ at I_00002B40
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.data cil I_00002B40 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0a 00 00 00
)
.field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771581@ at I_00002B68
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.data cil I_00002B68 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0a 00 00 00
)
.field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771582@ at I_00002B90
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.data cil I_00002B90 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0a 00 00 00
)
.field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771583@ at I_00002BB8
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.data cil I_00002BB8 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0a 00 00 00
)
.field assembly static valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ field771584@ at I_00002BE0
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.data cil I_00002BE0 = bytearray (
01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
09 00 00 00 0a 00 00 00
)
// Methods
.method public static
int32[] f () cil managed
{
// Method begins at RVA 0x2050
// Code size 19 (0x13)
.maxstack 8
IL_0000: ldc.i4.s 10
IL_0002: newarr [System.Runtime]System.Int32
IL_0007: dup
IL_0008: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771579@
IL_000d: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0012: ret
} // end of method _::f
.method public specialname static
class [FSharp.Core]Microsoft.FSharp.Core.Unit get_xs () cil managed
{
// Method begins at RVA 0x2064
// Code size 6 (0x6)
.maxstack 8
IL_0000: ldsfld class [FSharp.Core]Microsoft.FSharp.Core.Unit '<StartupCode$_>.$_'::xs@5
IL_0005: ret
} // end of method _::get_xs
.method assembly specialname static
int32[] get__arg1@1 () cil managed
{
// Method begins at RVA 0x206c
// Code size 6 (0x6)
.maxstack 8
IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::_arg1@1
IL_0005: ret
} // end of method _::get__arg1@1
.method assembly specialname static
int32[] 'get__arg1@1-1' () cil managed
{
// Method begins at RVA 0x2074
// Code size 6 (0x6)
.maxstack 8
IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-1'
IL_0005: ret
} // end of method _::'get__arg1@1-1'
.method assembly specialname static
int32[] 'get__arg1@1-2' () cil managed
{
// Method begins at RVA 0x207c
// Code size 6 (0x6)
.maxstack 8
IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-2'
IL_0005: ret
} // end of method _::'get__arg1@1-2'
.method assembly specialname static
int32[] 'get__arg1@1-3' () cil managed
{
// Method begins at RVA 0x2084
// Code size 6 (0x6)
.maxstack 8
IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-3'
IL_0005: ret
} // end of method _::'get__arg1@1-3'
.method assembly specialname static
int32[] 'get__arg1@1-4' () cil managed
{
// Method begins at RVA 0x208c
// Code size 6 (0x6)
.maxstack 8
IL_0000: ldsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-4'
IL_0005: ret
} // end of method _::'get__arg1@1-4'
.method private specialname rtspecialname static
void .cctor () cil managed
{
// Method begins at RVA 0x2094
// Code size 13 (0xd)
.maxstack 8
IL_0000: ldc.i4.0
IL_0001: stsfld int32 '<StartupCode$_>.$_'::init@
IL_0006: ldsfld int32 '<StartupCode$_>.$_'::init@
IL_000b: pop
IL_000c: ret
} // end of method _::.cctor
// Properties
.property class [FSharp.Core]Microsoft.FSharp.Core.Unit xs()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 09 00 00 00 00 00
)
.get class [FSharp.Core]Microsoft.FSharp.Core.Unit _::get_xs()
}
.property int32[] _arg1@1()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 09 00 00 00 00 00
)
.get int32[] _::get__arg1@1()
}
.property int32[] '_arg1@1-1'()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 09 00 00 00 00 00
)
.get int32[] _::'get__arg1@1-1'()
}
.property int32[] '_arg1@1-2'()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 09 00 00 00 00 00
)
.get int32[] _::'get__arg1@1-2'()
}
.property int32[] '_arg1@1-3'()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 09 00 00 00 00 00
)
.get int32[] _::'get__arg1@1-3'()
}
.property int32[] '_arg1@1-4'()
{
.custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationMappingAttribute::.ctor(valuetype [FSharp.Core]Microsoft.FSharp.Core.SourceConstructFlags) = (
01 00 09 00 00 00 00 00
)
.get int32[] _::'get__arg1@1-4'()
}
} // end of class _
.class private auto ansi abstract sealed '<StartupCode$_>.$_'
extends [System.Runtime]System.Object
{
// Fields
.field assembly static initonly class [FSharp.Core]Microsoft.FSharp.Core.Unit xs@5
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.field assembly static initonly int32[] _arg1@1
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.field assembly static initonly int32[] '_arg1@1-1'
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.field assembly static initonly int32[] '_arg1@1-2'
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.field assembly static initonly int32[] '_arg1@1-3'
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.field assembly static initonly int32[] '_arg1@1-4'
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.field assembly static int32 init@
.custom instance void [System.Runtime]System.Diagnostics.DebuggerBrowsableAttribute::.ctor(valuetype [System.Runtime]System.Diagnostics.DebuggerBrowsableState) = (
01 00 00 00 00 00 00 00
)
.custom instance void [System.Runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
01 00 00 00
)
.custom instance void [System.Runtime]System.Diagnostics.DebuggerNonUserCodeAttribute::.ctor() = (
01 00 00 00
)
// Methods
.method private specialname rtspecialname static
void .cctor () cil managed
{
// Method begins at RVA 0x20a4
// Code size 122 (0x7a)
.maxstack 5
IL_0000: ldc.i4.s 10
IL_0002: newarr [System.Runtime]System.Int32
IL_0007: dup
IL_0008: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771580@
IL_000d: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0012: stsfld int32[] '<StartupCode$_>.$_'::_arg1@1
IL_0017: ldc.i4.s 10
IL_0019: newarr [System.Runtime]System.Int32
IL_001e: dup
IL_001f: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771581@
IL_0024: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0029: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-1'
IL_002e: ldc.i4.s 10
IL_0030: newarr [System.Runtime]System.Int32
IL_0035: dup
IL_0036: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771582@
IL_003b: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0040: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-2'
IL_0045: ldc.i4.s 10
IL_0047: newarr [System.Runtime]System.Int32
IL_004c: dup
IL_004d: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771583@
IL_0052: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0057: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-3'
IL_005c: ldc.i4.s 10
IL_005e: newarr [System.Runtime]System.Int32
IL_0063: dup
IL_0064: ldtoken field valuetype '<PrivateImplementationDetails$_>'/T771578_40Bytes@ _::field771584@
IL_0069: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_006e: stsfld int32[] '<StartupCode$_>.$_'::'_arg1@1-4'
IL_0073: ldnull
IL_0074: stsfld class [FSharp.Core]Microsoft.FSharp.Core.Unit '<StartupCode$_>.$_'::xs@5
IL_0079: ret
} // end of method $_::.cctor
} // end of class <StartupCode$_>.$_
.class private auto ansi abstract sealed beforefieldinit '<PrivateImplementationDetails$_>'
extends [System.Runtime]System.Object
{
// Nested Types
.class nested assembly explicit ansi sealed beforefieldinit T771578_40Bytes@
extends [System.Runtime]System.ValueType
{
.pack 0
.size 40
} // end of class T771578_40Bytes@
} // end of class <PrivateImplementationDetails$_>
It seems like the existing optimization could be improved so that it doesn't generate multiple copies of the blob, which would help both array literals and constant ranges.
Hmm, I'm not sure if it would be worth making this configurable. If someone really wanted the compiler to make a dangerously big blob for them right now (and had a machine with enough RAM), they could already just hand-write or generate a giant array literal. We could also just avoid this problem by skipping
and always doing
instead, which would still be a solid improvement. |
Thanks for diving into the detail. A fair comparison might need more than just BDN, since we should also measure the time needed to load the .dll from disk (to factor in the increased size). I know it is not a scientific conclusion, but I would prefer the threshold to be set low enough not to be noticeable in the output size, e.g. a 1 kB limit. Re: the inlining, would that be done by caching the (type, start, end) expressions encountered at compile time and reusing their output blob? |
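To make the proposed limit concrete, here is a rough sketch of how the blob size of a constant `int32` range could be computed and checked against a byte threshold. The names `rangeLength`, `blobSizeInBytes`, and `fitsInBlob` are illustrative assumptions for this sketch, not the compiler's actual code.

```fsharp
/// Number of elements in the inclusive range start..finish (0 if empty).
/// Computed in int64 so that extreme int32 bounds cannot overflow.
let rangeLength (start: int) (finish: int) : int64 =
    if finish < start then 0L else int64 finish - int64 start + 1L

/// Size in bytes of the data blob a constant int32 range would need.
let blobSizeInBytes (start: int) (finish: int) : int64 =
    rangeLength start finish * int64 sizeof<int>

/// Hypothetical threshold check against a caller-supplied byte limit.
let fitsInBlob (maxBytes: int64) (start: int) (finish: int) : bool =
    blobSizeInBytes start finish <= maxBytes

printfn "%d" (blobSizeInBytes 1 256)      // 1024
printfn "%d" (blobSizeInBytes 1 65536)    // 262144
printfn "%d" (blobSizeInBytes System.Int32.MinValue System.Int32.MaxValue) // 17179869184
printfn "%b" (fitsInBlob 1024L 1 257)     // false
```

With a 1,024-byte limit, `[|1..256|]` would still fit in a blob, while the full `int32` range would need 16 GiB.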
Yes, that's true.
That seems reasonable. I'll set it low and see how that benchmarks against
Well it depends whether we want it to apply for the existing way array literals are handled as well (as in here): fsharp/src/Compiler/CodeGen/IlxGen.fs Lines 3575 to 3583 in 1b50168
fsharp/src/Compiler/CodeGen/IlxGen.fs Lines 3672 to 3683 in 1b50168
Deduping constant ranges sounds simple enough, as you describe; to dedupe arbitrary array literals I guess you could use some kind of hash of the contents. If we keep the const array size limit small enough in this PR, though, I think such deduping could probably be its own separate optimization. |
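As a rough illustration of the deduplication idea, constant ranges could be cached by a `(type, start, finish)` key so that repeated occurrences reuse one blob. `BlobId`, `BlobCache`, and `GetOrAddRangeBlob` are hypothetical names for this sketch and do not correspond to anything in IlxGen.fs.

```fsharp
open System.Collections.Generic

/// Hypothetical handle for an emitted data blob.
type BlobId = BlobId of int

/// Hypothetical compile-time cache keyed by (element type name, start, finish).
type BlobCache() =
    let blobs = Dictionary<string * int64 * int64, BlobId>()

    /// Number of distinct blobs "emitted" so far.
    member _.EmittedCount = blobs.Count

    /// Returns the existing blob for this range, or registers a new one.
    member _.GetOrAddRangeBlob(elemTy: string, start: int64, finish: int64) : BlobId =
        let key = (elemTy, start, finish)
        match blobs.TryGetValue key with
        | true, blob -> blob
        | false, _ ->
            let blob = BlobId blobs.Count
            blobs.[key] <- blob
            blob

let cache = BlobCache()
let a = cache.GetOrAddRangeBlob("int32", 1L, 10L)
let b = cache.GetOrAddRangeBlob("int32", 1L, 10L) // same key, reuses the blob
let c = cache.GetOrAddRangeBlob("int64", 1L, 10L) // different element type
printfn "%b %b %d" (a = b) (a = c) cache.EmittedCount // true false 2
```

Deduplicating arbitrary array literals would need a key derived from the contents instead, for example a hash of the blob bytes.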
I (Updated benchmarks repeated here)

| Method | Categories | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|--------------------------- |------------------------ |----------------:|--------------:|---------------:|------:|--------:|---------:|---------:|---------:|----------:|------------:|
| Array_Old_10To1 | Array,[|10..1|] | 23.6098 ns | 0.1240 ns | 0.1099 ns | 1.00 | 0.00 | 0.0076 | - | - | 96 B | 1.00 |
| Array_New_10To1 | Array,[|10..1|] | 0.4375 ns | 0.0028 ns | 0.0026 ns | 0.02 | 0.00 | - | - | - | - | 0.00 |
| | | | | | | | | | | | |
| Array_Old_1To10 | Array,[|1..10|] | 59.2091 ns | 0.3471 ns | 0.3077 ns | 1.00 | 0.00 | 0.0261 | - | - | 328 B | 1.00 |
| Array_New_1To10 | Array,[|1..10|] | 5.0770 ns | 0.0549 ns | 0.0429 ns | 0.09 | 0.00 | 0.0051 | - | - | 64 B | 0.20 |
| | | | | | | | | | | | |
| Array_Old_1To50 | Array,[|1..50|] | 164.8347 ns | 1.4401 ns | 1.2766 ns | 1.00 | 0.00 | 0.0732 | - | - | 920 B | 1.00 |
| Array_New_1To50 | Array,[|1..50|] | 9.4800 ns | 0.0319 ns | 0.0283 ns | 0.06 | 0.00 | 0.0179 | - | - | 224 B | 0.24 |
| | | | | | | | | | | | |
| Array_Old_1To256 | Array,[|1..256|] | 430.2849 ns | 2.6394 ns | 2.3397 ns | 1.00 | 0.00 | 0.1817 | - | - | 2280 B | 1.00 |
| Array_New_1To256 | Array,[|1..256|] | 41.7861 ns | 0.5105 ns | 0.4775 ns | 0.10 | 0.00 | 0.0835 | - | - | 1048 B | 0.46 |
| | | | | | | | | | | | |
| Array_Old_1To257 | Array,[|1..257|] | 564.4246 ns | 3.9136 ns | 3.4693 ns | 1.00 | 0.00 | 0.4301 | 0.0029 | - | 5408 B | 1.00 |
| Array_New_1To257 | Array,[|1..257|] | 115.5624 ns | 0.9958 ns | 0.8827 ns | 0.20 | 0.00 | 0.0842 | - | - | 1056 B | 0.20 |
| | | | | | | | | | | | |
| Array_Old_Dynamic_1To65536 | Array,[|start..finish|] | 194,617.6697 ns | 1,341.7077 ns | 1,047.5171 ns | 1.00 | 0.00 | 124.7559 | 124.7559 | 124.7559 | 524754 B | 1.00 |
| Array_New_Dynamic_1To65536 | Array,[|start..finish|] | 82,301.7290 ns | 475.3916 ns | 421.4223 ns | 0.42 | 0.00 | 83.2520 | 83.2520 | 83.2520 | 262220 B | 0.50 |
| | | | | | | | | | | | |
| List_Old_10To1 | List,[10..1] | 16.3486 ns | 0.3685 ns | 0.4386 ns | 1.00 | 0.00 | 0.0076 | - | - | 96 B | 1.00 |
| List_New_10To1 | List,[10..1] | 1.0736 ns | 0.0102 ns | 0.0080 ns | 0.07 | 0.00 | - | - | - | - | 0.00 |
| | | | | | | | | | | | |
| List_Old_1To10 | List,[1..10] | 53.4059 ns | 0.4100 ns | 0.3424 ns | 1.00 | 0.00 | 0.0318 | - | - | 400 B | 1.00 |
| List_New_1To10 | List,[1..10] | 30.6254 ns | 0.2588 ns | 0.2161 ns | 0.57 | 0.00 | 0.0255 | - | - | 320 B | 0.80 |
| | | | | | | | | | | | |
| List_Old_1To50 | List,[1..50] | 210.8844 ns | 2.0126 ns | 1.6806 ns | 1.00 | 0.00 | 0.1338 | 0.0007 | - | 1680 B | 1.00 |
| List_New_1To50 | List,[1..50] | 147.7232 ns | 1.4591 ns | 1.3649 ns | 0.70 | 0.00 | 0.1273 | 0.0002 | - | 1600 B | 0.95 |
| | | | | | | | | | | | |
| List_Old_1To100 | List,[1..100] | 387.9171 ns | 2.5653 ns | 2.2741 ns | 1.00 | 0.00 | 0.2613 | 0.0029 | - | 3280 B | 1.00 |
| List_New_1To100 | List,[1..100] | 304.6029 ns | 5.8287 ns | 5.1670 ns | 0.79 | 0.02 | 0.2546 | 0.0014 | - | 3200 B | 0.98 |
| | | | | | | | | | | | |
| List_Old_1To101 | List,[1..101] | 412.2919 ns | 8.2508 ns | 20.3938 ns | 1.00 | 0.00 | 0.2637 | 0.0029 | - | 3312 B | 1.00 |
| List_New_1To101 | List,[1..101] | 334.5175 ns | 5.9638 ns | 9.6305 ns | 0.78 | 0.04 | 0.2575 | 0.0029 | - | 3232 B | 0.98 |
| | | | | | | | | | | | |
| List_Old_Dynamic_1To65536 | List,[start..finish] | 473,499.7168 ns | 8,943.5261 ns | 10,299.3779 ns | 1.00 | 0.00 | 166.9922 | 154.2969 | - | 2097232 B | 1.00 |
| List_New_Dynamic_1To65536 | List,[start..finish] | 423,602.8193 ns | 8,419.1710 ns | 17,007.1407 ns | 0.89 | 0.03 | 166.9922 | 154.7852 | - | 2097176 B | 1.00 |

You can see that there is a perf cliff at the byte size threshold for arrays (now 1,024 bytes, so 256 int32s), but the fallback to Looks like the branching still needs fixing... |
9b26084 to 6b87cfd
dadcf90 to f176fd0
* Use branchless `max` in call to `Array.init`/`List.init`. Getting the sequel to be appended to each branch correctly in all cases looked like a nontrivial undertaking.
* Lower the size thresholds for const ranges (temporarily?).
f176fd0 to 283ad75
Thanks for the extensive summary @brianrourkeboll . |
Well @T-Gro even though I was able to get the branching IL (for It also appears that even with very low limits for constant array and list sizes (I tried 40 bytes for arrays and 10 elements for lists), the compiler consistently ran into OOM exceptions in CI:
There are a fair number of list/array range initialization expressions with constant args in those tests, but I don't think I understand the exact cause of the OOM. I don't know whether the culprit was simply too many I also removed the I think I'll run some new benchmarks with only the |
@T-Gro: I think I ended up figuring everything out with the branching. I needed to manually bind the |
@T-Gro sorry to spam your notifications, but I went ahead and extended this PR to work for all built-in integral types, as well as for Otherwise, I believe this PR is ready. |
...Actually, it would probably make more sense just to add something like the following to TypedTreeOps.fs that lowers integral ranges to fast while-loops:
val mkOptimizedRangeLoop :
    g : TcGlobals ->
    rangeTy : TType ->
    overallTy : TType ->
    start : Expr ->
    step : Expr ->
    finish : Expr ->
    body : Expr -> Expr
That could then be used to optimize:
I think I'll close this for now and open a new PR for the alternative approach when I have time. |
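For illustration, the source-level shape such a lowering would aim for might look like the following. This is only a sketch of the intended semantics for `int32` ranges with a step of 1, written as ordinary F# rather than as typed-tree construction; `rangeToArray` is a hypothetical name, and the count is computed in `int64` to sidestep overflow in `finish - start + 1`.

```fsharp
/// Builds the equivalent of [|start..finish|] with a plain while-loop
/// instead of a call to Array.init.
let rangeToArray (start: int) (finish: int) : int[] =
    if finish < start then
        [||]
    else
        // Count in int64 so that finish - start + 1 cannot overflow.
        let count = int64 finish - int64 start + 1L
        let arr = Array.zeroCreate<int> (int count)
        let mutable i = 0
        let mutable n = start
        while i < arr.Length do
            arr.[i] <- n
            n <- n + 1
            i <- i + 1
        arr

printfn "%A" (rangeToArray 1 5)                 // [|1; 2; 3; 4; 5|]
printfn "%A" (rangeToArray 5 1)                 // [||]
printfn "%b" (rangeToArray (-3) 3 = [|-3..3|])  // true
```

Because the loop writes directly into the array, no closure is allocated and no non-negative length check is needed beyond the initial comparison.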
I was just going to review this :) But yeah, the new approach also seems reasonable. |
Description
This is an experimental take at better lowering of some simple computed collection expression forms; I was mostly just curious how hard it would be. They do seem to work, though, and they're straightforward enough that they probably don't need a lang suggestion/RFC... (They aren't visible in quotations, for example.)
New lowerings:
Computed lists:
For constant $start$ and $finish$ when $start > finish$: `[5..1]` → `[]`.
For constant $start$ and $finish$ when $start \le finish$: `[1..5]` → `List.init (5 - 1 + 1) ((+) 1)` → `List.init 5 ((+) 1)`.
For dynamic $start$ and $finish$: `[start..finish]` → `if finish < start then [] else List.init (finish - start + 1) ((+) start)`.
For dynamic $start$ and $finish$ where $start$ and/or $finish$ are not constants or already bound to a value (field, local variable, etc.):
Generated IL
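One way to read the last list case: when `start` or `finish` is an arbitrary expression, it presumably has to be bound once before the comparison so that it is not evaluated twice. The sketch below shows the dynamic lowering as ordinary F# and checks that a side-effecting bound runs only once. The helper `lowerListRange`, the counter `calls`, and `nextBound` are hypothetical names used only for illustration.

```fsharp
/// Source-level equivalent of the dynamic list lowering.
let lowerListRange (start: int) (finish: int) : int list =
    if finish < start then [] else List.init (finish - start + 1) ((+) start)

// Illustrative side-effecting expression used as the upper bound.
let calls = ref 0
let nextBound () =
    calls.Value <- calls.Value + 1
    5

// Bind the bound once, then use the bound value in both the test and the init.
let xs =
    let finish = nextBound ()
    lowerListRange 1 finish

printfn "%A" xs              // [1; 2; 3; 4; 5]
printfn "%d" calls.Value     // 1
printfn "%b" (xs = [1..5])   // true
```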
Computed arrays:
For constant $start$ and $finish$ when $start > finish$: `[|5..1|]` → `[||]`.
For constant $start$ and $finish$ when $start \le finish$: `[|1..5|]` → `Array.init (5 - 1 + 1) ((+) 1)` → `Array.init 5 ((+) 1)`.
For dynamic $start$ and $finish$: `[|start..finish|]` → `if finish < start then [||] else Array.init (finish - start + 1) ((+) start)`.
For dynamic $start$ and $finish$ where $start$ and/or $finish$ are not constants or already bound to a value (field, local variable, etc.):
Generated IL
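As a quick sanity check of the array lowerings, the rewritten form can be compared against the built-in range expression for a handful of bounds. `lowerArrayRange` is a hypothetical helper written for this sketch. Note that a single-element range (`start = finish`) goes through the `Array.init` branch.

```fsharp
/// Source-level equivalent of the dynamic array lowering.
let lowerArrayRange (start: int) (finish: int) : int[] =
    if finish < start then [||] else Array.init (finish - start + 1) ((+) start)

// Compare against the built-in range expression for a few bounds,
// including empty (start > finish) and single-element (start = finish) ranges.
let arrayCases = [ (1, 5); (5, 1); (3, 3); (-2, 2); (0, 0) ]

let arrayResults =
    [ for (start, finish) in arrayCases ->
        lowerArrayRange start finish = [|start..finish|] ]

for ((start, finish), ok) in List.zip arrayCases arrayResults do
    printfn "[|%d..%d|]: %b" start finish ok
```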
Original questions
Questions
Should there be a size limit for const range unrolling? Probably. E.g., should `[|1..65_536|]` be lowered to a blob? What about `[|-2147483648..2147483647|]`? Does C# have a limit for their `ReadOnlySpan<byte> Xs => [1, 2, 3];`, which similarly generates a blob? In C#'s case, though, a developer would need to manually enumerate a huge number of elements in source, since C# doesn't have a range shorthand like `[-2147483648..2147483647]` (just like the existing optimization in F# for array literals, since the range shorthand didn't affect that before).
Basically, not having a size limit risks the compiler using up massive amounts of memory at build time (and generating potentially gigantic assemblies), while having one implies a hidden runtime performance cliff (although falling back to `Array.init` closes the gap pretty closely).
Additional potential optimizations
initializer for the call to `Array.init`/`List.init`. (The inlining is not happening because `LowerComputedListOrArrayExpr` is called in IlxGen.fs, after most optimization.)
`List.init`/`Array.init` at all; just manually emit for/while-loops, since we don't need the non-negative check for `length`/`count`.
`int64`, `byte`, etc.).
`[start..step..finish]` → `let count = (finish - start) / step + 1 in if count = 0 then [] else List.init count ((*) step >> (+) start)`).
`[for … in … -> …]` and `[|for … in … -> …|]`:
`[for n in 1..5 -> f n]` → `[f 1; f 2; f 3; f 4; f 5]`.
`[for n in start..finish -> f n]` → `List.init (finish - start + 1) ((+) start >> f)`.
`[for x in xs -> f x]` → `List.map f xs` (?) when `xs` is a list.
`[|for n in 1..5 -> f n|]` → `[|f 1; f 2; f 3; f 4; f 5|]`.
`[|for n in start..finish -> f n|]` → `Array.init (finish - start + 1) ((+) start >> f)`.
`[|for x in xs -> f x|]` → `Array.map f xs` (?) when `xs` is an array.
Benchmarks
The difference is much more dramatic for arrays than for lists, but the difference for lists can still be significant, especially for smaller ranges.
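Separately from the benchmarks, the stepped-range idea listed under the additional potential optimizations needs some care with empty ranges and negative steps, because integer division truncates toward zero (for example, `(0 - 1) / 2 + 1` is `1`, although `[1..2..0]` is empty). A minimal sketch that uses explicit sign checks instead of a `count = 0` test follows. `lowerSteppedRange` is a hypothetical helper, and the sketch assumes the arithmetic does not overflow.

```fsharp
/// Source-level sketch of a lowering for [start..step..finish].
let lowerSteppedRange (start: int) (step: int) (finish: int) : int list =
    if step = 0 then
        invalidArg "step" "The step of a range cannot be zero."
    elif (step > 0 && finish < start) || (step < 0 && finish > start) then
        []
    else
        // Here (finish - start) is zero or has the same sign as step,
        // so truncating division yields the number of whole steps.
        let count = (finish - start) / step + 1
        List.init count (fun i -> start + i * step)

// Compare against the built-in stepped range for a few cases,
// including empty ranges and a negative step.
let steppedCases = [ (1, 2, 10); (10, -3, 1); (1, 2, 0); (5, 1, 1); (0, 5, 0) ]

let steppedResults =
    [ for (start, step, finish) in steppedCases ->
        lowerSteppedRange start step finish = [start..step..finish] ]

printfn "%A" steppedResults
```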
Checklist