-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compiling causes OutOfMemoryException and IDE unresponsive/crash when pattern matching against record with many nested DU fields #4691
Comments
@cmeeren Must it be an (Is it reproducable with a simple |
Not reproducible with This all seems to fit with the SO answer I linked to, which commented that when pattern matching, the compiler sometimes produces a decision tree that grows exponentially with the number of cases. |
Type inference also exhibits similar symptoms, when you compose a large
expression you can cause a stack overflow (unless you add annotations )
…On Fri, 6 Apr 2018 at 12:18, Christer van der Meeren < ***@***.***> wrote:
Not reproducible with int option. Furthermore, if you match against Some i
instead of Some (Some i), it also works fine.
This all seems to fit with the SO answer I linked to, which commented that
when pattern matching, the compiler sometimes produces a decision tree that
grows exponentially with the number of cases.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#4691 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAj7yiPjNoF7239wqxi5l_tYki61Pfh4ks5tl08PgaJpZM4TJi8j>
.
|
What happens here, if you look at the compiled code, is that it gets compiled into nested if-statements for situations that are too difficult for the compiler to rewrite more efficiently (currently). First I ran some numbers with your code as example (but Vitaliy's code from the SO question probably has the same characteristic), scenario: just this code in a single .NET Framework project, compiling for Release Build, but I assume .NET Standard and .NET Core would give the same results: module Bloated =
type ManyFields =
{
Field1: int option option // 10kB
Field2: int option option
Field3: int option option
Field4: int option option // 12kB
Field5: int option option
Field6: int option option
Field7: int option option
Field8: int option option // 24kB
Field9: int option option // 36kB
Field10: int option option // 60kB
Field11: int option option // 106kB
Field12: int option option // 200kB
Field13: int option option // 387kB, 2s
Field14: int option option // 761kB, 3.5s
Field15: int option option // 1509kB, 7s
Field16: int option option // 3005kB, 16s
Field17: int option option // 6MB, 35s -- ~800MB
Field18: int option option // 12MB, 1m -- ~1200MB
Field19: int option option // probably 24MB, -- after ~1500MB BOOM! CRASH!
}
let getFirstField manyFields =
match manyFields with
| { Field1 = Some (Some i) } -> Some i
| { Field2 = Some (Some i) } -> Some i
| { Field3 = Some (Some i) } -> Some i
| { Field4 = Some (Some i) } -> Some i
| { Field5 = Some (Some i) } -> Some i
| { Field6 = Some (Some i) } -> Some i
| { Field7 = Some (Some i) } -> Some i
| { Field8 = Some (Some i) } -> Some i
| { Field9 = Some (Some i) } -> Some i
| { Field10 = Some (Some i) } -> Some i
| { Field11 = Some (Some i) } -> Some i
| { Field12 = Some (Some i) } -> Some i
| { Field13 = Some (Some i) } -> Some i
| { Field14 = Some (Some i) } -> Some i
| { Field15 = Some (Some i) } -> Some i
| { Field16 = Some (Some i) } -> Some i
| { Field17 = Some (Some i) } -> Some i
| { Field18 = Some (Some i) } -> Some i
| { Field19 = Some (Some i) } -> Some i
| _ -> None Since I have 48GB available, the OutOfMemoryException must be caused by some 32-bit or other limit. After each change, VS's memory also spiked, but then calmed down after a minute or so to a steady 1GB. The numbers clearly show a correlation between build time, size and memory usage, each growing exponentially. This suggests that the same algorithm is at fault here that @tpetricek analyzed in that SO question. If I take a look at the decompiled C# code for only 3 fields (otherwise it gets unreadable), it quickly becomes clear what's happening: public static FSharpOption<int> getFirstField(ManyFields manyFields)
{
FSharpOption<FSharpOption<int>> option;
int num;
FSharpOption<FSharpOption<int>> option2;
if (manyFields.Field1@ == null)
{
if (manyFields.Field2@ != null)
{
option = manyFields.Field2@;
if (option.get_Value() != null)
{
num = option.get_Value().get_Value();
goto Label_0050;
}
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
option2 = manyFields.Field3@;
if (option2.get_Value() == null)
{
goto Label_0081;
}
num = option2.get_Value().get_Value();
}
else
{
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
option = manyFields.Field3@;
if (option.get_Value() == null)
{
goto Label_0081;
}
num = option.get_Value().get_Value();
}
goto Label_007A;
}
option = manyFields.Field1@;
if (option.get_Value() != null)
{
return FSharpOption<int>.Some(option.get_Value().get_Value());
}
if (manyFields.Field2@ == null)
{
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
option2 = manyFields.Field3@;
if (option2.get_Value() == null)
{
goto Label_0081;
}
num = option2.get_Value().get_Value();
goto Label_007A;
}
option2 = manyFields.Field2@;
if (option2.get_Value() == null)
{
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
if (option3.get_Value() == null)
{
goto Label_0081;
}
num = option3.get_Value().get_Value();
goto Label_007A;
}
num = option2.get_Value().get_Value();
Label_0050:
return FSharpOption<int>.Some(num);
Label_007A:
return FSharpOption<int>.Some(num);
Label_0081:
return null;
} In other words, it looks like the compiler cannot ascertain that if a first match succeeds partially, that other parts need no tests anymore, hence the recursive repetition of all cases under each main if-branch. In Vitaliy's question there was something to say for the way the compiler treats this, since it was about interface matching, and any class could have multiple interfaces (which doesn't mean it cannot be optimized, but it is harder). But in this case, that shouldn't be necessary at all. I remembered this situation and I have some places where I have put in some exclamation marks "Do not simplify this code from nested patterns, it creates bloated IL!!!". Though it never occurred to me that the OOM is related. Furthermore, we sometimes have mysterious crashes of Visual Studio simply disappearing, or FCS crashing behind the scenes. It is very well possible that some of these issues are related to this. One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem. |
Yeah vs is 32bit so you will only really have 2GBish to play with
…On Fri, 6 Apr 2018 at 14:33, Abel Braaksma ***@***.***> wrote:
What happens here, if you look at the compiled code, is that it gets
compiled into nested if-statements for situations that are too difficult
for the compiler to rewrite more efficiently (currently).
First I ran some numbers with your code as example (but Vitaliy's code
from the SO question probably has the same characteristic), scenario: just
this code in a single .NET Framework project, but I assume .NET Standard
and .NET Core would give the same results:
module Bloated =
type ManyFields =
{
Field1: int option option // 10kB
Field2: int option option
Field3:
int option option
Field4: int option option // 12kB
Field5: int option option
Field6: int option option
Field7:
int option option
Field8: int option option // 24kB
Field9: int option option // 36kB
Field10: int option option // 60kB
Field11: int option option // 106kB
Field12: int option option // 200kB
Field13: int option option // 387kB, 2s
Field14: int option option // 761kB, 3.5s
Field15: int option option // 1509kB, 7s
Field16: int option option // 3005kB, 16s
Field17: int option option // 6MB, 35s -- ~800MB
Field18: int option option // 12MB, 1m -- ~1200MB
Field19: int option option // probably 24MB, -- after ~1500MB BOOM! CRASH!
}
let getFirstField manyFields =
match manyFields with
| { Field1 = Some (Some i) } -> Some i
| { Field2 = Some (Some i) } -> Some i
| { Field3 = Some (Some i) } -> Some i
| { Field4 = Some (Some i) } -> Some i
| { Field5 = Some (Some i) } -> Some i
| { Field6 = Some (Some i) } -> Some i
| { Field7 = Some (Some i) } -> Some i
| { Field8 = Some (Some i) } -> Some i
| { Field9 = Some (Some i) } -> Some i
| { Field10 = Some (Some i) } -> Some i
| { Field11 = Some (Some i) } -> Some i
| { Field12 = Some (Some i) } -> Some i
| { Field13 = Some (Some i) } -> Some i
| { Field14 = Some (Some i) } -> Some i
| { Field15 = Some (Some i) } -> Some i
| { Field16 = Some (Some i) } -> Some i
| { Field17 = Some (Some i) } -> Some i
| { Field18 = Some (Some i) } -> Some i
| { Field19 = Some (Some i) } ->
Some i
| _ -> None
Since I have 48GB available, the OutOfMemoryException must be caused by
some 32-bit or other limit. After each change, VS's memory also spiked, but
then calmed down after a minute or so to a steady 1GB.
The numbers clearly show a correlation between build time, size and memory
usage, each growing exponentially. This suggests that the same algorithm is
at fault here that @tpetricek <https://github.com/tpetricek> analyzed in
that SO question. If I take a look at the decompiled C# code for *only 3
fields* (otherwise it gets unreadable), it quickly becomes clear what's
happening:
public static FSharpOption<int> getFirstField(ManyFields manyFields)
{
FSharpOption<FSharpOption<int>> option;
int num;
FSharpOption<FSharpOption<int>> option2;
if (manyFields.Field1@ == null)
{
if (manyFields.Field2@ != null)
{
option = manyFields.Field2@;
if (option.get_Value() != null)
{
num = option.get_Value().get_Value();
goto Label_0050;
}
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
option2 = manyFields.Field3@;
if (option2.get_Value() == null)
{
goto Label_0081;
}
num = option2.get_Value().get_Value();
}
else
{
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
option = manyFields.Field3@;
if (option.get_Value() == null)
{
goto Label_0081;
}
num = option.get_Value().get_Value();
}
goto Label_007A;
}
option = manyFields.Field1@;
if (option.get_Value() != null)
{
return FSharpOption<int>.Some(option.get_Value().get_Value());
}
if (manyFields.Field2@ == null)
{
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
option2 = manyFields.Field3@;
if (option2.get_Value() == null)
{
goto Label_0081;
}
num = option2.get_Value().get_Value();
goto Label_007A;
}
option2 = manyFields.Field2@;
if (option2.get_Value() == null)
{
if (manyFields.Field3@ == null)
{
goto Label_0081;
}
FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
if (option3.get_Value() == null)
{
goto Label_0081;
}
num = option3.get_Value().get_Value();
goto Label_007A;
}
num = option2.get_Value().get_Value();
Label_0050:
return FSharpOption<int>.Some(num);
Label_007A:
return FSharpOption<int>.Some(num);
Label_0081:
return null;
}
In other words, it looks like the compiler cannot ascertain that if a
first match succeeds partially, that other parts need no tests anymore,
hence the recursive repetition of all cases under each main if-branch.
In Vitaliy's question there was something to say for the way the compiler
treats this, since it was about interface matching, and any class could
have multiple interfaces (which doesn't mean it cannot be optimized, but it
is harder). But in this case, that shouldn't be necessary at all.
I remembered this situation and I have some places where I have put in
some exclamation marks "Do not simplify this code from nested patterns, it
creates bloated IL!!!". Though it never occurred to me that the OOM is
related.
Furthermore, we sometimes have mysterious crashes of Visual Studio simply
disappearing, or FCS crashing behind the scenes. It is very well possible
that some of these issues are related to this.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4691 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAj7yvm8AI4lb1xu3BGwcowWuMUW9jXNks5tl26vgaJpZM4TJi8j>
.
|
It might also suggest that it simply hasn't been prioritized for whichever reason. (I wouldn't know, of course, but it's a possibility.) |
I can't speak towards this issue since it may be related to the initial design of the compiler, but there are all sorts of little weird things that can happen in compilers which lead to OOM or other bad behavior. For example, getting clever with generics is a good way to hose the C# compiler: class Class<A, B, C, D, E, F>
{
class Inner : Class<Inner, Inner, Inner, Inner, Inner, Inner>
{
Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner inner;
}
} (If this works just fine for you, just add some more generics) The C# compiler also solves the 3-SAT problem for overload resolution, so if you're clever with lambdas you may be in for a surprise as well. Haskell and other ML languages also have awful worst-case scenarios for type inference that you can run into from time to time. Swift can also stop compiling after a while on very trivial code and say the expression is too complex. |
@cartermp, thanks for mentioning the 3-SAT problem, reading up on that gives some hints on how this problem can be problematic. Though I don't think it is unsolvable, because DU matching does not necessarily need to be compiled this way. For instance, it could be compiled as a list of As an optimization to the |
@cmeeren Thanks for the report. There are a couple of ways that pattern matching can cause slow compilation, but this looks like one we should be able to address |
@dsyme very glad to hear it. Especially if @abelbraaksma is correct in that
|
I like to think I am ;). The code above can easily be tested, just check the compiled binaries (and watch the clock and memory usage by the compiler & VS). |
The majority of pattern matching code compiles well. I suspect the problem here is something combination of things, I suspect that implicit extra record fields may be being expanded and bound in the record patterns, even if they are wildcards, though that shouldn't ultimately cause the extra code. |
Issue #5212 suggests that exponential growth can happen really easy and diesn't require record fields. Looking at the IL I don't think this is caused by implicit record fields per se. It basically looks as if all permutations are being created by the compiler, while that isn't necessarily required (technically, it could compile to a list of |
The example code didn't cause SO on my machine but it still took ages to compile and made the machine unusable for some time. On the other hand, given all the notes in this discussion (e.g. that it needs this much cases or that it needs to be not options but option options) - I wonder how acute this is. Probably still worth looking at it at least to see if there are some glaring misses in the pattern matching impl. |
I think I've fixed the SO some time ago, but yes, it is slow, which is kinda expected (not desired though). |
I expecience OutOfMemoryException when compiling code that pattern matches many cases on a record with many fields that are nested DU (e.g.
int option option
as shown in the repro below, but I originally experienced it with other DU types). Furthermore, the IDE/Intellisense becomes unresponsive and may crash.Repro steps
Test.zip
Either try the test solution, or simply create a new .NET Standard library (others may produce the error too, but I haven't tested), paste the following code and compile/observe IDE behavior.
Expected behavior
The code compiles successfully.
Actual behavior
Compilation fails with the following output (the number of exceptions may vary):
Known workarounds
Related information
This SO answer by Tomas Petricek looks relevant. If it's the same underlying problem, then it has been around since at least 2012. About time it's fixed :)
Versions:
The text was updated successfully, but these errors were encountered: