Optimizations for GDScript VM #70838
Conversation
If anyone's curious, I ported the benchmark code to PHP, Python and Ruby. Benchmark times were measured on an i7-6700K @ 4.4 GHz on Fedora 36.

PHP:

```php
<?php
const elems = 65536;

function benchmark() {
    $arr = [];
    for ($i = 0; $i < elems; $i++) {
        $arr[] = mt_rand(0, 2147483647) % elems;
    }
    $t = microtime(true);
    $acc = 0.0;
    foreach ($arr as $e) {
        $e2 = $arr[$e];
        $e3 = $arr[$e2];
        $acc += $e * $e2;
        $acc *= $e3 + $e;
        $acc = sqrt($acc);
    }
    echo "Time taken (PHP): " . (microtime(true) - $t) * 1000 . " msec.";
}

benchmark();
```

Python:

```python
import math
import random
from time import time

elems = 65536

def benchmark():
    arr = []
    for x in range(elems):
        arr.append(random.randint(0, 2147483647) % elems)
    t = time()
    acc = 0.0
    for e in arr:
        e2 = arr[e]
        e3 = arr[e2]
        acc += e * e2
        acc *= e3 + e
        acc = math.sqrt(acc)
    print("Time taken (Python):", (time() - t) * 1000, "msec.")

benchmark()
```

Ruby:

```ruby
ELEMS = 65536

def benchmark()
  arr = []
  for i in 1..ELEMS
    arr.append(rand(2147483647) % ELEMS)
  end
  t = Time.now
  acc = 0.0
  for e in arr
    e2 = arr[e]
    e3 = arr[e2]
    acc += e * e2
    acc *= e3 + e
    acc = Math.sqrt(acc)
  end
  puts("Time taken (Ruby): " + ((Time.now - t) * 1000).to_s + " msec.")
end

benchmark()
```
Given you are on different hardware, it probably makes sense to also include GDScript numbers on your 6700K.
Looks good to me, apart from the bit I commented about (which can be addressed later).
Some tests on an M1 Mac, using a release editor binary (average of 10 runs):
I'm new to Godot, so can someone please explain why the comparisons are made against Python, PHP and Ruby instead of C# and C++, since those are the languages someone might actually use in a Godot project?
C# and C++ are compiled ahead of time, so those aren't fair comparisons to make. It's not possible for an interpreted language to match an AOT-compiled language in raw performance. Comparing interpreted languages with JIT-compiled languages is already a stretch, as JIT suffers from restrictions that interpreted languages don't have (such as iOS and consoles heavily restricting JIT usage).
We've been doing a lot of transpiling of GDScript to C++ for our game logic in some performance-heavy areas, so I have a lot of experience here. We run a lot of Godot game servers on cloud infrastructure, so we pay for every CPU cycle used. For GDScript vs C++ we typically see close to two orders of magnitude of speedup (usually 40–100× faster), which is why it's not a fair comparison, but it also illustrates how much there is to gain from an AOT language plus an optimization pass that compiles to machine instructions. The typed GDScript example code runs in … and the native C++ version below executes in …:

```cpp
int elems = 65536;
std::vector<int> arr = {};
for (int i = 0; i < elems; i++) {
    arr.push_back(rand() % elems);
}
uint64_t t = mach_absolute_time();
float acc = 0.0;
for (int e = 0; e < elems; e++) {
    int e2 = arr[e];
    int e3 = arr[e2];
    acc += e * e2;
    acc *= e3 + e;
    acc = sqrt(acc);
}
printf("Time taken (typed): %.4f%s", (mach_absolute_time() - t) / 1000.0, "msec.");
```

Now, we typically don't see a 1000× improvement, but in this trivial example the native optimization passes a C++ compiler can do (vectorization, inlining, loop unrolling, et al.) really seem to play a factor. In production code with typical game logic we usually see about a 40–100× performance improvement.
* Removed instruction argument count and instruction prefetching. This is now done on the fly, reducing jumps.
* OPCODE_DISPATCH now goes directly to the next instruction, like in Godot 3.x.

I have nothing I can use to test performance, so if anyone wants to lend a hand and compare with master (both on debug and release), it would be very welcome.
@jordo We have been discussing this here - godotengine/godot-proposals#6031 - and the general consensus seems to be that users would be happy with an optional transpiler to C, but this is obviously Godot 4.1+ material.
I ran our game with nearly 30k lines of statically typed code and a thousand assets, and compared against beta 10. I didn't expect much: though we have a lot of code, we're not bound by GDScript or the CPU, but by the renderer and the GPU. I rarely see any scripts with significant performance issues on the monitors, and I fix them if I do.
Core i9-12900H
Testing the PR build (7211e04), I do see a lot more renderer errors and visual artifacts (probably due to other PRs), but I did not get any GDScript errors.
Why the big difference between the 1st and 2nd run?
Note that the way binaries are compiled has a big impact on performance comparisons. Official beta builds are a lot more optimized than CI builds, so to do an accurate comparison against this PR's CI build, I would suggest downloading a CI build from the …
Ran my project with ~8K lines of statically typed GDScript in both debug and release. I don't have any benchmark numbers at the moment, but the procedural world generation felt faster, and much faster on macOS. No errors anywhere in the project.
With Godot 4 beta 10 I noticed that string concatenation seems to be 700 times slower compared to Python.

GDScript:

```gdscript
func string() -> int:
	var x : String = ""
	for i in range(0, 300000):
		x += " "
	return x.length()
```

Python 3.10 (0.023s):

```python
def string():
    x = ""
    for i in range(0, 300000):
        x += " "
    return len(x)
```
@LinuxUserGD Most likely Python is optimized for string processing (because that's a common use case for it), and its string class keeps track of all the additions to later merge them when you read from it. Godot is not designed with string processing in mind, hence this is slow. We have a StringBuilder class we use on the C++ side that could be exposed to script if there is real demand, but probably there isn't.
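For comparison, the usual script-side workaround is the same idea as a StringBuilder: accumulate parts in a list and merge them once at the end instead of repeatedly concatenating. A minimal sketch in Python (function names are illustrative, not from the thread):

```python
def build_naive(n):
    # Repeated += on a str. CPython can sometimes extend the buffer in
    # place when there is a single reference, but in general each step
    # may reallocate and copy the whole string.
    s = ""
    for _ in range(n):
        s += " "
    return s

def build_joined(n):
    # StringBuilder-style: collect the pieces, then do one final merge.
    parts = []
    for _ in range(n):
        parts.append(" ")
    return "".join(parts)
```

Both produce identical output; for large `n` the join version performs a single final allocation, which is exactly what a StringBuilder class amortizes.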
Will this get backported to Godot 3?
@holzmb GDScript has been completely reworked for 4.x.
@holzmb
What is not there is the addressing mode optimization, but that won't happen because it's too much work.
Thanks!
Performance seems to improve significantly (more than 40% faster on release with typed code vs master). Here is a quick benchmark I made:
Master:
Debug:
Time taken (untyped): 33.708msec.
Time taken (typed): 20.464msec.
Release:
Time taken (untyped): 8.257msec.
Time taken (typed): 5.761msec.
This PR:
Debug:
Time taken (untyped): 29.234msec.
Time taken (typed): 16.027msec.
Release:
Time taken (untyped): 6.41msec.
Time taken (typed): 3.311msec.
Test source code:
I don't have much else I can use to test performance, so if anyone wants to lend a hand and compare with master (both on debug and release) or test in your production project, it would be very welcome.