Partial const evaluations! (targeted const recompilation at runtime) #16524
I think that the following example sums it all up:
Basically you want a super smart compiler that can understand what the rather complicated serializer code is doing and simplify it. The main problem is that nobody has figured out how to write such a super smart compiler. Any takers?
Your example makes even less sense, because we are about to get Roslyn post-compile transformations (see link), which allow you to do such things without hacking C# internals.
I don't intend to distract from or disagree with the above proposal. These are just a few of my more immediate thoughts on the subject.

I think that use cases 1 and 2 could be covered by source generators. For example, a specialized serializer could be defined as the following: `public partial class PersonSerializer : JsonSerializer<Person> { }`. The source generator would then interrogate the generic type argument and emit the serialization code specialized for `Person`.

As for the compiler hints, that's a little further reaching, and honestly I believe it belongs more to the runtime/JIT compiler than to Roslyn. There are already a number of attributes which can be used to influence the JIT compiler; I could see how that could be expanded with more attributes (or other mechanisms) that the runtime could use as hints to optimize the generated machine code for given IL.
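For illustration, a minimal sketch of what a generator might emit for the other half of that partial class (`JsonSerializer<Person>`, the `Serialize` override, and the `Person` shape are all assumed for the example, not a real library API):

```csharp
// Hypothetical generated half: reflection over Person happened inside the
// generator at compile time, so the emitted code is straight-line.
public partial class PersonSerializer : JsonSerializer<Person>
{
    public override string Serialize(Person value)
    {
        var sb = new System.Text.StringBuilder();
        sb.Append("{\"Name\":");
        sb.Append(value.Name is null ? "null" : $"\"{value.Name}\""); // escaping omitted
        sb.Append(",\"Age\":");
        sb.Append(value.Age);
        sb.Append('}');
        return sb.ToString();
    }
}
```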
@mikedn @Pzixel
Correct, CurrentCulture would have to be considered a parameter to the "system" (where system means the function being specialized + all the functions it is calling + all the functions they are calling + ...). But that does not mean I couldn't write code in a way that has no "hidden parameters". It is true that in this day and age, libraries are written in a way where they almost certainly use some kind of global state. But if library authors were aware of such a system, they could write their code in a different way.
I am not convinced that is true for all cases. And I argue that people can engineer their code in ways that make it possible, even trivial, to do this.
Exactly, that's why I'm such a big fan of code generators. It will most likely be too hard to implement this, but I'd still like to explore the idea.
Is there a different GitHub project for the JIT compiler / runtime?
"a number of attributes" ? |
https://github.com/dotnet/coreclr
Well, 1 is a number. 😁
That's practically impossible for a serializer; you have to use reflection, and that counts as "outside". Now, a compiler could treat reflection operations as intrinsics and then maybe, just maybe, the compiler could do something if the serializer code is simple enough. Trouble is, if you write a very simple serializer to please the compiler, then that simple serializer won't perform well when the compiler fails to optimize the code as intended. Wait a moment, didn't I just assume that the compiler can optimize the code? Yes, I did, but only when the object type is known at compile time. When the object type is not known (and that's a rather common situation) there's nothing that the compiler can do, and your simple but inherently inefficient serializer will run at runtime.
Exactly :)
Waaaait a minute, we have a huge misunderstanding here! I'm sorry! First, when a type is known at compile time, then I could just use code generators! So that case (type to serialize known at compile time) is irrelevant.

The whole point of this thing is that the compiler (or something similar, maybe working on IL instead of C# source code) runs at runtime. Now, at runtime you have some new Type, and the system starts specializing the serializer code for it. While it is doing that, it will also check each property access for known hints, and so on...

Sure, the code will most likely not be as pretty or small as in my ideal example, but it will definitely be smaller, because many checks can be folded away as constants.
That takes it completely outside of the realm of the compiler, as at runtime you're dealing strictly with IL and not C# (or any other specific language). Roslyn only exists to convert C# into IL; it doesn't have a function beyond that point.

That said, exactly when do you propose this happen at runtime? Trying to analyze the IL to determine all potential targets for serialization is probably no easier than doing so in C#. The serialization library is better off detecting when a new type is being serialized/deserialized and at that point emitting the specialized IL for handling that type, which is then cached and reused. You'd take that hit at first serialization but then you'd follow the optimized path. I'm pretty sure that describes most of the commonly used serialization libraries already.
You are completely right. But I don't think there's any better place to talk about this than here, right? People here know enough about the details of the language, the compiler, and the possible advantages and disadvantages.
On request, the programmer would make their own specialized delegates as needed. You explicitly tell the system "generate specialized code for me, here are the assumptions you can make, the parameters you can treat as constants, ..." and you get back a delegate that only works under the assumptions you have given. (Or the system would just return a delegate with the original signature, ignoring parameters that have assumptions made on them.)
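A sketch of what such an API shape could look like; everything here is hypothetical (none of these types exist in .NET), and the parameter split mirrors the `genericTypeArgs`/`methodArgs`/`hints` wording used in the original post:

```csharp
using System;

// Hypothetical assumption types the caller hands to the specializer.
public abstract class Hint { }

public sealed class NeverNullHint : Hint
{
    public string MemberPath { get; }
    public NeverNullHint(string memberPath) => MemberPath = memberPath;
}

public sealed class ConstantValueHint : Hint
{
    public string MemberPath { get; }
    public object Value { get; }
    public ConstantValueHint(string memberPath, object value)
        => (MemberPath, Value) = (memberPath, value);
}

public static class Specializer
{
    // Returns a delegate that is only valid while the stated assumptions hold;
    // the runtime would recompile the target with those values baked in.
    public static TDelegate Specialize<TDelegate>(
        TDelegate original,
        Type[] genericTypeArgs,   // type arguments to treat as known
        object[] methodArgs,      // argument values to treat as constants
        params Hint[] hints)      // extra assumptions, e.g. new NeverNullHint(".Name")
        where TDelegate : Delegate
        => throw new NotImplementedException("would require runtime/JIT support");
}
```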
If you're looking for the JIT to specialize code based on hints provided at runtime, then I think that the coreclr repo is the more appropriate place. What you're describing sounds like either something built into the JIT or something built on top of dynamic assemblies.
Doing such a thing at runtime is very expensive. It's also problematic because the number of specializations can grow out of control, so there have to be some limits. But such limits may imply that the same scenario ends up being specialized multiple times, and that only adds to the runtime cost. There are compilers that do this kind of stuff, but on a very small scale compared to what you suggest here. JavaScript compilers do this, but they have a comparatively very small problem: how to access object members efficiently. They don't optimize arbitrary amounts of code in the way that you suggest here.
@mikedn Maybe even on the order of seconds when the call tree gets big enough; I mean, just take a look at all the stuff that gets done in a good JsonSerializer...
But I am optimistic because the thing I suggest will only create a new specialized method when the programmer calls the `Specialize` function. For example, if I know my server application will need to serialize millions of objects to JSON, then I would want to let the system generate a specialized method for my use case from the existing code.
I am only vaguely aware of the optimizations in JS engines; do you have some example / link / source on that? I'd love to see how that is done.
The funny thing is that this happens already, but in a different way. Serializers (at least good ones) usually generate code at runtime that's specific to the type you're serializing. They know exactly what code needs to be generated and that puts them at an advantage compared to a compiler that would need to try to understand what the serializer code is doing and try to generate better code for a given type. Basically you have specialized code producing other specialized code compared to general purpose code attempting to produce specialized code.
I don't have much interest in JS optimizations, so I don't have any handy links. The basic idea is that they attempt to create types on the fly and generate code specialized for those types. Such specialized code includes type checks to ensure that it only runs on the types it was generated for; otherwise non-specialized code needs to be run instead, or more specialized code needs to be generated. In any case, the idea is that what JS engines do is very focused. They deal with random pieces of code but only look for and optimize certain aspects of the code.
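A minimal sketch of the "specialized code producing specialized code" pattern described above, using expression trees (null handling and string escaping are omitted; this is an illustration of the technique, not how any particular library implements it):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Linq.Expressions;

static class TinySerializer
{
    // One compiled delegate per runtime type: reflection runs once at first
    // use; every later call goes straight to the specialized code.
    static readonly ConcurrentDictionary<Type, Func<object, string>> Cache = new();

    public static string Serialize(object obj) =>
        Cache.GetOrAdd(obj.GetType(), Build)(obj);

    static Func<object, string> Build(Type type)
    {
        var objParam = Expression.Parameter(typeof(object), "obj");
        var typed = Expression.Convert(objParam, type);

        // Emit one "Name":value fragment per property; the property list is
        // baked into the delegate, so no reflection remains at call time.
        var pieces = type.GetProperties().Select(p =>
            (Expression)Expression.Call(
                typeof(string), nameof(string.Concat), null,
                Expression.Constant($"\"{p.Name}\":"),
                Expression.Call(Expression.Property(typed, p),
                                nameof(ToString), null)));

        var body = Expression.Call(
            typeof(string), nameof(string.Join), null,
            Expression.Constant(","),
            Expression.NewArrayInit(typeof(string), pieces));

        return Expression.Lambda<Func<object, string>>(body, objParam).Compile();
    }
}
```

The first call for a given type pays the reflection and compilation cost; every later call reuses the cached delegate.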
@asdfgasdfsafgsdfa here is some information about it
Why?
The discussions and ideas in code generators and `constexpr` (#16503 and #15079) always leave me with some bland aftertaste... I always had the feeling that there is a LOT more that can be done to drastically improve performance. At first I thought that code generators would be the silver bullet to save us from bad performance; but it quickly became apparent to me that we are not at the core of the problem yet.
Fortunately I have figured out some good examples.
What?
Most of the time as programmers we are perfectly aware of what code is time-critical and will be the bottleneck in our applications, but improving that can be hard, really hard actually.
I propose a combination of compiler and language features that have the potential to increase performance drastically.
By telling the compiler and runtime that some arguments to a function will remain constant for some time, the compiler and/or JIT compiler can recompile a method (and all the methods it calls in turn).
Example 1 - Serialization
Here we'll serialize some type from/to JSON.
Traditionally we'd use Newtonsoft.Json or some other library and then call `JsonConvert.SerializeObject(obj)`. This is bad: at every `SerializeObject` call the program executes all of that code again even though there is no reason to. Tons of `if` comparisons checking whether some options are set, null checks, ... all sorts of things that we as humans would instantly know to be always true or always false in a given context.
If there would be a way to "specialize" code it could improve performance a lot.
For example, if it were possible to do the following instead of the `JsonConvert` call above:
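A hypothetical sketch of such a call, using the `Specialize` API and hint types referenced in the paragraphs below (`Person`, `somePerson`, and the parameter names are assumed for the example):

```csharp
// Hypothetical: ask the runtime for a version of the serializer that is
// recompiled under the assumptions T = Person and .Name is never null.
var specializedSerializeMethod = Specializer.Specialize(
    (Func<Person, string>)JsonConvert.SerializeObject,
    genericTypeArgs: new[] { typeof(Person) },
    methodArgs:      Array.Empty<object>(),
    hints:           new Hint[] { new NeverNullHint(".Name") });

string json = specializedSerializeMethod(somePerson);
```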
Now why is it not possible for the code inside `specializedSerializeMethod` to be something like the sketch below?
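For illustration, something in this spirit (a hand-written sketch of the imagined output; a `Person` type with `Name`/`Age` members is assumed):

```csharp
// Imagined output: the general-purpose serializer collapsed down for Person.
// All option checks were folded away; the Name null check became if (false)
// and was removed entirely, per the NeverNullHint.
static string SerializePersonSpecialized(Person p)
{
    var sb = new System.Text.StringBuilder();
    sb.Append("{\"Name\":\"");
    sb.Append(p.Name);          // no null check needed here
    sb.Append("\",\"Age\":");
    sb.Append(p.Age);
    sb.Append('}');
    return sb.ToString();
}
```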
The `NeverNullHint` tells the `Specialize` method that the `Name` member will never be null, so `if (... != null)` checks can be safely replaced with `if (false)`, which can be completely optimized away. I imagine that 90% of the code inside `SerializeObject` could be removed just because the type is known beforehand (plus some hints).
The "targetted const" part from the title of this issue would be the genericTypeArgs, methodArgs and hints in the Specialize method.
Example 2 - Regex
More general than just serialization would be regex, where you have patterns.
System.Text.RegularExpressions.Regex already does something just like that, exactly for the reason that interpreted code is too slow.
You provide 'code' in the form of a regex string, then pass `RegexOptions.Compiled`, and it will use dynamic methods to generate a specialized version. The only difference is that the specialized version is not generated from the regex interpreter plus some "assumed to be constant" parameter.
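This part is real, existing .NET API; a short usage example:

```csharp
using System.Text.RegularExpressions;

// The pattern string is the "assumed constant" input: RegexOptions.Compiled
// tells the runtime to emit IL specialized for this one pattern instead of
// interpreting the pattern on every match.
var date = new Regex(@"^\d{4}-\d{2}-\d{2}$", RegexOptions.Compiled);
bool ok = date.IsMatch("2017-01-30");   // runs the generated, pattern-specific code
```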
Example 3 - Search and advanced pattern matching on data
Even more general than the regex in Example 2 would be matching all sorts of patterns in all sorts of data!
Just like Regex searches for patterns in text, there are tons of other search patterns that people use every day.
For example searching for binary patterns when patching software by using delta-patches.
Also when compressing/decompressing data!
The same is true for image recognition.
To give an example without going into too much detail:
In classical image recognition and feature detection algorithms you often have loops that iterate over thousands or millions of points (pixels) and try to match patterns there.
Now if you could just pre-compile a known pattern into specialized code, you'd get enormous performance benefits. Just like in regex.
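To make that concrete, a tiny hand-written sketch: the same byte-pattern check written once generically (the pattern is interpreted on every call) and once as the imagined "pre-compiled" form for one fixed pattern (here the PNG magic bytes):

```csharp
// General matcher: the pattern is data, re-examined on every call.
static bool MatchesAt(byte[] data, int i, byte[] pattern)
{
    if (i + pattern.Length > data.Length) return false;
    for (int j = 0; j < pattern.Length; j++)
        if (data[i + j] != pattern[j]) return false;
    return true;
}

// Specialized matcher: the hypothetical result of "pre-compiling" the fixed
// pattern { 0x89, 0x50, 0x4E, 0x47 } -- loop and pattern lookups folded away.
static bool MatchesPngMagicAt(byte[] data, int i) =>
    i + 4 <= data.Length &&
    data[i] == 0x89 && data[i + 1] == 0x50 &&
    data[i + 2] == 0x4E && data[i + 3] == 0x47;
```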
Example 4 - All sorts of interpreters!
The most general case I can think of at all, would be interpreting code, not just patterns (like regex).
Emulators for game consoles do it all the time (Emulators for the GameBoy, N64, Playstation1/2, Wii, and many many more).
They call it "dynamic recompilation".
But the same can be done for JavaScript and other classical programming languages. And surprise surprise, all major JS engines are doing just that: they generate code from known inputs. The input being a string in JavaScript syntax, and the output being code instead of data (a pointer to code that you can call).
How?
Just like generics generate new "specialized" code for different generic types, the .NET runtime could generate new optimized/specialized delegates when some parameters are known.
Or even when just parts of the parameters are known, for example `new ConstantValueHint(".Age", 123)`. Now, the API would be pretty difficult to design to allow for all sorts of hints.
When the created delegate is no longer in use, it would eventually be collected by the GC, which would then free the JITed code as well...
Disclaimer
The only thing I do know for a fact is that there are a number of situations (some of which I listed above) where recompiling code with assumptions would help performance enormously, especially in the pattern/image recognition part. And I know that people get performance gains of 1000% and more by "compiling" code in emulators, or in JavaScript. All things that require interpreters in some form can profit from this.