-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserving operator methods necessary for System.Linq.Expressions #1821
Comments
Looking at the code the C# compiler generates for We could mark this API as APIs with optional parameters (that have expensive fallbacks for when the parameter wasn't provided) are not very trimming friendly - we should keep that in mind when introducing new APIs in the future. One aspect is situations like this (if this was a separate overload, we could just mark that one as unsafe without having to specialcase in linker). The other aspect is trimming in general - we'll never be able to get rid of the code that tries to find the right operator even if all callers provide a MethodInfo. |
We could even go full "nullable annotations" on this and introduce a:
Illink could potentially have an aggressive mode where it would assume method was not null and eliminate the branch that tries to handle the "method == null" case. |
I think the problem is for example this: Expression<Func<int, int, bool>> x = ((int i, int j) => i == j); This compiles to something like: Expression.Equal(
Expression.Parameter(typeof(int), "i"),
Expression.Parameter(typeof(int), "j")); There's no method in this case (it will default to Even if we somehow ask everybody to pass in the method, there would be no way to get to the intrinsic handling then, which I assume covers by far the most cases today. I would not be surprised if changing this would break The compatibility is another large part of this problem - even if the caller constructs the expression tree in a way which is trim-safe, it doesn't mean the The existing |
That makes things more complicated. It makes me wonder though whether we should at least consult how expensive it would be if:
If Roslyn has enough type information, this might not be an expensive feature to ask for, and the produced code would be trim safe without acrobatics. The line between "try to trim as much as possible" and "make sure a lot of code can work" is blurry, but the litmus test I use is whether a layperson would be able to understand why something was kept from our logs. "Expressions were used, therefore we kept all your operators" might be hard to follow. "An unknown method was accessed on this type, therefore we kept all methods on the type" follows more straightforward logic. |
I'm not sure this approach will be feasible, for the following reasons:
public struct TC1
{
public string Name;
public int data;
public TC1(string name, int data)
{
Name = name;
this.data = data;
}
public static TC1 operator &(TC1 t1, TC1 t2) { return new TC1("And", 01); }
public static TC1 operator |(TC1 t1, TC1 t2) { return new TC1("Or", 02); }
public static TC1 Meth1(TC1 t1, TC1 t2) { return new TC1(); }
public static bool operator true(TC1? a) { return true; }
public static bool operator true(TC1 a) { return true; }
public static bool operator false(TC1? a) { return false; }
public static bool operator false(TC1 a) { return false; }
}
TC1? tc1 = new TC1("lhs", 324589);
TC1? tc2 = new TC1("rhs", 324589);
Expression<Func<TC1?>> e = () => tc1 && tc2; That last line compiles into:
So the |
Yup, I know about the "no longer evolving" part, but if were to make an API addition that is reasonably easy to review (it just does a subset of an existing API and that's all), it might still be acceptable. It's not like we don't touch the assembly at all. Looks like there are a bunch of unique problems with the rewriting, so lets see if we can make the rewriting safe at all. If the answer to that ends up being no, then not being able to safely rewrite shouldn't preclude us from being able to safely construct expression trees. The op_True/op_False issue could be solved by intrinsics (we know what method was passed and these are on the same type). If the method is not known, this wouldn't have been safe either way. |
While I appreciate wanting to make this trim compatible for real, there's also a question of "Is it really worth it?". Just out of curiosity I tried to write a simple version of "mark all operators on all marked types" and ran it on a Blazor template app. It marks 243 operators (which are probably not marked for other reasons - not sure). The compressed size grows by 6KB. That is not ignorable difference in size, on the other hand we would only do this if If this is all it takes to make expression trees work with trimming - I don't think it's that bad. If the app wants to use expression trees, then the price is not crazy high. It will obviously grow with more types which have operators, but still. |
I'm more worried about things like conversion operators and not the ones we use in framework. Conversion operators are likely to new up types the app wouldn't ever need. Decision to root them whenever we see Linq Expressions might come back to bite us. Here's a bunch of conversion operators that new up things: https://github.com/dotnet/wpf/search?q=%22implicit+operator%22 |
We could also consider making it an option (default it either way, my vote would be "make it safe by default"). That way apps who (1) need to use System.Linq.Expressions and (2) have to trim every bit of possible code out, have a way to turn this off. Of course they would need to be responsible for ensuring their app still behaves correctly after turning the option off. And they would need to manually "mark" any operators that need to be preserved. |
We could make the marking of the operators a bit more clever. If class |
Well, as long as the hack is not more complicated than the Roslyn fix would be, it works for me (if the answer from the Roslyn team would be "we totally know the types of all operands to binary/unary operators" the fix that doesn't doesn't require hardcoding a bunch of Expressions knowledge into illink would probably be less code to maintain). |
@jaredpar - thoughts in this area? Would the Roslyn team be open to adding new APIs to System.Linq.Expressions and changing how the compiler emits Expressions in order to support trimming? |
Haven't totally processed the thread but ...
The problem with expression trees is that APIs which consume expression trees are not tolerant to changes in how the expression trees are emitted. Even tiny changes like adding in a silent conversion node, or when we do or don't cast to make interface calls, irrevocably breaks many of our expression tree consumers. We learned this lesson with great pain in Roslyn because, for lots of thorny reasons, the expression trees we emitted differed ever so slightly than the ones emitted by the native compiler. This subtle differences broke practically everyone that consumed expression trees. The most notable consumer was the Entity Framework team. Changing these in place is not really feasible. Even teams like EF Core, who would really like us to expand the set of C# operations that have expression trees, agree we can't change them in place. It requires some sort of API opt-in to acknowledge that the set of trees may be different. Else it would just lead to lots of compat breaks. |
For the place that we're discussing, this change wouldn't affect the shape of the tree - only the API used to construct it would be different, but the result would be the same. The problem right now is that binary and unary operators are constructed using an API with the shape described in the top post: public static BinaryExpression Equal(Expression left, Expression right, bool liftToNull, MethodInfo? method)
{
ExpressionUtils.RequiresCanRead(left, nameof(left));
ExpressionUtils.RequiresCanRead(right, nameof(right));
if (method == null)
{
// This path might be a problem for trimming since we might start looking for a method named
// op_Equality on a bunch of pretty much random types (from the perspective of the illinker)
return GetEqualityComparisonOperator(ExpressionType.Equal, "op_Equality", left, right, liftToNull);
}
// This path is trimming safe since MethodInfo was already provided and we won't look for it by name
return GetMethodBasedBinaryOperator(ExpressionType.Equal, left, right, method, liftToNull);
} Faced with an API like this, we only have a couple options when it comes to trimming:
The reason why the second option is problematic is because not all types have operators - for example primitive types don't. The private static BinaryExpression GetEqualityComparisonOperator(ExpressionType binaryType, string opName, Expression left, Expression right, bool liftToNull)
{
// known comparison - numeric types, bools, object, enums
if (left.Type == right.Type && (left.Type.IsNumeric() ||
left.Type == typeof(object) ||
left.Type.IsBool() ||
left.Type.GetNonNullableType().IsEnum))
{
if (left.Type.IsNullableType() && liftToNull)
{
return new SimpleBinaryExpression(binaryType, left, right, typeof(bool?));
}
else
{
return new LogicalBinaryExpression(binaryType, left, right);
}
}
// look for user defined operator
BinaryExpression? b = GetUserDefinedBinaryOperator(binaryType, opName, left, right, liftToNull);
if (b != null)
{
return b;
}
if (TypeUtils.HasBuiltInEqualityOperator(left.Type, right.Type) || IsNullComparison(left, right))
{
if (left.Type.IsNullableType() && liftToNull)
{
return new SimpleBinaryExpression(binaryType, left, right, typeof(bool?));
}
else
{
return new LogicalBinaryExpression(binaryType, left, right);
}
}
throw Error.BinaryOperatorNotDefined(binaryType, left.Type, right.Type);
} The problem is the |
Perhaps I'm stating obvious but a fix in Roslyn would require every library which uses ET with binary/unary operations to be recompiled with .NET6 csc version and the linker would still need to recognize that null method is used correctly for primitive types (it can be tricky for nullable) and warn for the rest. |
Yes, it would only be trimming safe if recompiled with the latest Roslyn targetting .NET 6. Illink doesn't have facilities to track types of expression trees so it doesn't know whether a null There are no ideal solutions to this. |
Reiterating the ask in my own words to make sure I'm understanding this correctly:
Does that sum up the ask correctly? |
Yes, that's correct. I think we just want to know whether it would be doable with relative ease. For an outsider it looks like it should be possible because Roslyn already appears to be aware of the rules (trying to construct a tree with binary/unary operators for which S.L.Expressions doesn't have built-in rules or there are no user defined operators is considered a source compilation failure - Roslyn doesn't leave it up to S.L.Expressions to produce an exception at runtime).
There are ways how to create trimming friendly MethodInfo (like through |
Believe this is relatively doable in the compiler. There are a few choke points in the expression rewriter that we could leverage to call the new methods. Probably the biggest issue is validating the assumption that whenever there is a user defined operator that we do indeed pass a Whenever making expression tree changes though, even if it's intended to be a no-op, we need to involve the EF Core team. They have the heaviest dependency on expression trees and can validate if the new structure does / doesn't work for them. Probably a small / medium feature in the compiler. I haven't looked at the actual expression tree code though so unsure how much it will cost to have trimmable friendly versions of all of the methods + add tests for them. |
@ajcvickers for EF Core context |
I think @jaredpar expressed the EF concerns pretty well; that is, generation of different expression trees will break existing code that parses expression trees. That being said, on the EF team we have been thinking about the evolution of expression trees, and indeed earlier in the week @roji shared some ideas about versioning of the generated expression trees to make adding new Expression types doable without breaking existing LINQ providers. This will allow the compiler to generate trees that it didn't before at these if the LINQ provider can handle those trees. This is by no means fully fleshed out--Shay is doing additional research and prototyping. When we have something more concrete the plan is to bring this to the compiler folks and get further feedback. The goal is ultimately for the EF team to put resources into improving expressions support at the lower levels, including our experience working with expression trees (both for dynamic code gen and LINQ). We believe that if we work together with the Roslyn team, then we may be able to un-mothball the Expressions library and evolve it. Lots of hand-waving there, but it seems like there may be paths forward here. /cc @dotnet/efteam |
In addition to what @ajcvickers wrote... Referring to @MichalStrehovsky's comment above:
That already reduces a lot of risk/work on the EF side. In fact, if I'm understanding the above correctly, unless EF Core wants to be 100% trimming friendly (tracked by dotnet/efcore#21894, probably not 100% doable for 6.0), then this change shouldn't have any impact on us - is that correct? We'd continue to be able to call Expression.Equal and similar without any behavioral changes, right? Probably worth including @smitpatel who's the expert on any interaction between user equality/cast operators and the way EF processes expression trees. |
Yes, my expectation is that Roslyn-generated code was already passing I just tried some "tricky" cases like when the operands are of type I think especially if we're considering evolving expressions further, it might be the best if illink doesn't try to go into the business of trying to analyze this and hardcoding how the various Expressions APIs operate (which is what we would be left with if we don't introduce a trim-safe API and try to mirror what |
There's also the case of non-Roslyn generated expression trees, i.e. trees built by invoking Expression.Equal and the like. These would also need to continue produce the same tree. |
Those shouldn't be impacted as @MichalStrehovsky isn't proposing to change existing behavior here. Instead he's proposing to add new overloads. But that also means the existing calls would not be IL trimmer friendly. It's the type of API that, if we had an analyzer that flagged trim unfriendly APIs, would flag pretty much all the existing expression tree binary APIs (based on my understanding of the issue).
I have the same expectation and a number of tests I ran, as well as inspecting the code, backed up that assumption. The lookup of the |
The case @roji mentioned above is equally important. Every library out there which produces expression tree in some way uses the default |
I agree with @smitpatel - requiring providers to move away from APIs such as the existing Expression.Equal would probably hamper triming-friendly adoption in general. |
The trimming story that we have assumes one might need to change their code to be trimming friendly. This would be one of such changes; we're in the process of making changes in the framework to make the base libraries trim friendly - changes like this are what we do. If we go and introduce the new API, we would put a warning on the old API that shows up if someone enables trimming and has a call to this, advising people to use the other API (and come up with a MethodInfo in a way that illink can analyze). They might choose not do it and live with the risks (the warnings can be suppressed, but then it's no longer our (.NET org) bug if it doesn't work after trimming) |
This will keep custom operators on marked types whenever System.Linq.Expressions is used, and the operator input types are marked. The behavior is enabled by default, and can be disabled by passing --disable-operator-discovery. Addresses dotnet#1821
* Preserve custom operators This will keep custom operators on marked types whenever System.Linq.Expressions is used, and the operator input types are marked. The behavior is enabled by default, and can be disabled by passing --disable-operator-discovery. Addresses #1821 * Fix behavior for operators on nullable types * Cleanup and PR feedback - Avoid processing pending operators Dictionary if Linq.Expressions is unused - Allocate this possibly-unused Dictionary lazily - Use readonly field for always-used HashSet - Rename markOperators -> seenLinqExpressions - Clean up ProcessCustomOperators call to make intent more clear - Add comments - Check MetadataType.Int32 instead of searching BCL for Int32 * Remove unnecessary parens * PR feedback - seenLinqExpressions -> _seenLinqExpressions - use List for pending operators instead of HashSet
Suppress ILLink warnings for operator methods now that dotnet/linker#1821 is resolved. Add TrimmingTests for Linq.Expressions operators. Fix dotnet#45623
* Resolve ILLink warnings in System.Linq.Expressions (Final) Suppress ILLink warnings for operator methods now that dotnet/linker#1821 is resolved. Add TrimmingTests for Linq.Expressions operators. Fix #45623
Closing since this is fixed by #2125 and dotnet/runtime#55856. |
* Preserve custom operators This will keep custom operators on marked types whenever System.Linq.Expressions is used, and the operator input types are marked. The behavior is enabled by default, and can be disabled by passing --disable-operator-discovery. Addresses dotnet/linker#1821 * Fix behavior for operators on nullable types * Cleanup and PR feedback - Avoid processing pending operators Dictionary if Linq.Expressions is unused - Allocate this possibly-unused Dictionary lazily - Use readonly field for always-used HashSet - Rename markOperators -> seenLinqExpressions - Clean up ProcessCustomOperators call to make intent more clear - Add comments - Check MetadataType.Int32 instead of searching BCL for Int32 * Remove unnecessary parens * PR feedback - seenLinqExpressions -> _seenLinqExpressions - use List for pending operators instead of HashSet Commit migrated from dotnet/linker@6b0da00
Background
There are currently 2 ILLink warnings left in System.Linq.Expressions that both boil down to the same pattern.
When you create a
Binary
orUnary
Expression, and don't specify aMethodInfo
,System.Linq.Expressions
will try to find an operator on the Type corresponding to the ExpressionType you specified.For example, if you call
Expression.Equal(left, right)
, will call the following codeGetEqualityComparisonOperator
will look for theop_Equality
operator on both theleft
andright
types. This cannot be expressed in the existing trimming annotations because we don't have aSystem.Type
parameter to annotate.left
andright
areSystem.Linq.Expressions.Expression
instances.This same pattern applies to pretty much all Binary and Unary expressions:
Proposal
We should add a new rule in the
mono/linker
that whenSystem.Linq.Expressions
is used in an app, it should preserve all user defined operators on Types that are being preserved. This will allow the above Expression code to keep working in a trimmed application.In the
System.Linq.Expressions
library, we can then suppress the remaining warnings saying "The trimmer won't trim operators on preserved Types when Expressions are involved. It is safe to try to find these operator methods."cc @vitek-karas @MichalStrehovsky @marek-safar
The text was updated successfully, but these errors were encountered: