-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better integral range lowering: start..finish
, start..step..finish
#16650
Conversation
* Lower for-loops over integral ranges to fast while-loops for all built-in integral types. * Lower `[start..finish]`, `[start..step..finish]`, `[|start..finish|]`, `[|start..step..finish|]` to fast integral while-loop initializers.
❗ Release notes required
|
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Ah. I see what the problem is. I'll put this back in draft mode until it's fixed. |
tests/fsharp/Compiler/Language/ComputedCollectionRangeLoweringTests.Int32.fs
Outdated
Show resolved
Hide resolved
Good question, I think running the tests both ways is important to ensure that we don't inadvertently regress the reachability of the values. But, whole il file comparison makes the tests vulnerable to lots of other ilgen modifications. The reason I provided both was because, I wanted to be able to ensure that the visibility feature, which was about as risky as any I have attempted in terms of "possible breaking changes" produced reviewable changes and that those changes were explainable. Now that the realsig feature is in, perhaps less proving is necessary, and mere regression proofing is the way to go. Perhaps a smoke test with it on and off and the remainder with it off. We will have to remember to turn it to on when we enable realsig+ by default :-) |
(There were 3 failures, but I suspect all 3 of them were for flaky reasons - retrying to see) |
I have a few more improvements ready —
— but I guess I'll wait to open a separate PR for those to avoid making this one any bigger than it already is... |
@brianrourkeboll yeah feel free to open followups - these look promising! |
Description
This subsumes #16577 (and #13573).
Fixes #938.
Fixes #9548.
TL;DR
5 – 6× speedup (for integral types other than
int32
, andint32
whenstep
is not a constant1
or-1
)2.5 – 8× speedup
1.25 – 5× speedup
Before
The compiler currently only optimizes
for
-loops over integral ranges when:int
/int32
.1
or-1
.for
-loops over ranges for other integral types (int64
,uint32
, etc.), or when the step is not1
or-1
, use a relatively slow,IEnumerable<_>
-based implementation.List and array comprehension expressions initialized by an integral range also use the slow
IEnumerable<_>
-based implementation, and, in the case of arrays, incur additional unnecessary array allocations and copying.After
With this PR, the compiler now lowers
for
-loops for all built-in integral types — with any arbitrary step — down to fast integralwhile
-loops.This optimization is in turn applied to the lowering of computed list collection expressions where an integral range is used to initialize the list, as in
[start..finish]
and[start..step..finish]
; a similar technique is used for arrays in[|start..finish|]
and[|start..step..finish|]
, where the total size of the array is computed from thestart
,step
, andfinish
, at compile-time if possible. Such initialization expressions are now significantly faster and allocate significantly less, especially for arrays and for smaller lists.for
-loops over integral ranges of non-int32
-types, and/or with arbitrary steps, are about 5 – 6 times as fast and no longer allocate at allThese optimizations are not visible in quotations.
The following are the approximate high-level transformations applied to
for
-loops and list and array comprehensions over integral ranges.The basic idea is to precompute the iteration count like this:
start..finish
(start..step..finish
This lets us handle any potential overflows just once, instead of in each iteration of the loop. The loop then becomes simply:
If the range type is already 64-bit, or might be (for native ints), we have nothing to widen the count to — so we check whether computing the full count would overflow, and, if it would, we start at 0 and loop until we hit 0 again, indicating that we have wrapped around.
Additional optimizations are applied for various scenarios:
start
,step
, andfinish
are constants, we can compute the count at build-time.step
is constant, or if the integral type is unsigned, we can simplify the code we emit for computing the count at runtime: we potentially don't need to emit code to check whetherExamples:
for
-loops overint64
for
-loops overint16
for
-loops overuint32
int32
rangesuint64
ranges (SharpLab doesn't always import types from FSharp.Core correctly)for
-loopsfor loopVar in start..finish do …
→
for loopVar in start..step..finish do …
→
Lists
[start..finish]
→
[start..step..finish]
→
Arrays
[|start..finish|]
→
[|start..step..finish|]
→
Benchmarks
Source:
for
-loopsLists & arrays
Checklist
Design/implementation notes
In theory, these optimizations could subsume the existing optimization that applies only to
for
-loops overint32
ranges with constant1
or-1
steps. The performance should be comparable, if not identical (and is, in a quick spot-check). I have not replaced or removed the existing transformation in this PR. Note that, for compatibility's sake, if we ever did want to switch int ranges over to use this codegen, we'd need to do it only whenOptimizeForExpressionOptions = OptimizeAllForExpressions
, since the existing lowering toTOp.IntegerForLoop
is visible in quotations.Alternatively, the existing
TOp.IntegerForLoop
construct itself could have been extended to support arbitrary steps and integer types, although that may have compatibility implications since changes to or new usage ofTOp.IntegerForLoop
would be visible in quotations. Compare Optimize integer for loop code gen #13573.The emitted IL looks reasonably good to me, but additional optimizations could be made in certain cases:
The returns begin to diminish relatively quickly relative to the complexity they add to the compiler, though.
I'm emitting IL instructions for arithmetic, comparison, etc., rather than their library equivalents. That's mainly because the call to
LowerComputedCollections.LowerComputedListOrArrayExpr
is made in IlxGen.fs, which happens after type-checking and optimization, which means that calls to library-level operators won't work. If we extended theTOp.IntegerForLoop
construct instead, though, perhaps we could defer the lower-level codegen to IlxGen.fs.Rather than directly emitting loops for collection initialization, we could instead just emit code to compute the count, and then call
Array.init
/List.init
as in Better lowering of[start..finish]
&[|start..finish|]
#16577. This wouldn't really save on code size, though, since closure types would end up being emitted instead. If we really wanted to minimize code size while keeping the performance boost, we could add some kind of new library method for fast collection initialization from ranges and just emit calls to that. I'm not exactly sure where that would go, though, or how it would be exposed.