Optimizations for GDScript VM #70838

reduz · 2023-01-02T14:32:35Z

* Removed the instruction argument count and instruction prefetching. Both are now handled on the fly, which reduces jumps.
* Removed the address mode switch; addressing is now simple array indexing.
* OPCODE_DISPATCH now goes directly to the next instruction, like in Godot 3.x.
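To make the three changes above more concrete, here is a minimal Python sketch of a toy register VM. It assumes a made-up opcode set and an address encoding that packs a mode in the high bits and an index in the low bits; every name in it (`addr`, `run`, `OP_ADD`, and so on) is hypothetical and is not Godot's bytecode. Python has no computed goto, so dispatch here is a plain list lookup. The point is the operand handling: each handler knows its own operand count, and resolving an operand is an index into a table of arrays rather than a switch over the address mode.

```python
import operator

# Hypothetical toy VM: opcodes and the address encoding are assumptions
# made for this sketch, not Godot's actual bytecode format.
ADDR_BITS = 24
ADDR_MASK = (1 << ADDR_BITS) - 1
STACK, CONSTANT, MEMBER = 0, 1, 2  # address modes, stored in the high bits

OP_ADD, OP_MUL, OP_END = 0, 1, 2
BINARY_OPS = [operator.add, operator.mul]  # indexed by opcode


def addr(mode: int, index: int) -> int:
    """Pack an address mode and an index into a single operand."""
    return (mode << ADDR_BITS) | index


def run(code: list, stack: list, constants: list, members: list) -> None:
    # The mode bits index straight into this tuple of backing arrays,
    # so resolving an operand is two lookups instead of a switch.
    tables = (stack, constants, members)

    def slot(operand: int):
        return tables[operand >> ADDR_BITS], operand & ADDR_MASK

    ip = 0
    while code[ip] != OP_END:
        op = code[ip]
        # Each handler knows its own operand count (two sources, one
        # destination), so the code stream stores no argument count.
        lhs, li = slot(code[ip + 1])
        rhs, ri = slot(code[ip + 2])
        dst, di = slot(code[ip + 3])
        dst[di] = BINARY_OPS[op](lhs[li], rhs[ri])
        ip += 4  # move straight on to the next instruction


# stack[1] = stack[0] * constants[0]; members[0] = stack[1] + constants[0]
stack, constants, members = [3, 0], [4], [0]
code = [
    OP_MUL, addr(STACK, 0), addr(CONSTANT, 0), addr(STACK, 1),
    OP_ADD, addr(STACK, 1), addr(CONSTANT, 0), addr(MEMBER, 0),
    OP_END,
]
run(code, stack, constants, members)
print(stack[1], members[0])  # 12 16
```

Keeping all operand sources behind one table means adding a new address mode only grows the tuple; the per-operand cost stays a shift, a mask and two indexings.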

Performance seems to improve significantly (more than 40% faster in release builds on typed code vs. master). Here is a quick benchmark I made:

Time taken (msec)	Master	This PR
Debug, untyped	33.708	29.234
Debug, typed	20.464	16.027
Release, untyped	8.257	6.41
Release, typed	5.761	3.311
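As a quick check on the "more than 40%" figure, this small Python snippet derives the relative change for each configuration from the numbers above. The claim matches the typed release case, where the time drops by about 42.5% (roughly a 1.74x speedup). The other configurations take about 13% to 22% less time. `RESULTS` and `compare` are illustrative names only.

```python
# Timings in msec, copied from the benchmark above: (master, this PR).
RESULTS = {
    ("debug", "untyped"): (33.708, 29.234),
    ("debug", "typed"): (20.464, 16.027),
    ("release", "untyped"): (8.257, 6.41),
    ("release", "typed"): (5.761, 3.311),
}


def compare(before: float, after: float) -> tuple:
    """Return (percentage of time saved, speedup factor)."""
    saved = (before - after) / before * 100.0
    return saved, before / after


for (build, kind), (master, pr) in RESULTS.items():
    saved, speedup = compare(master, pr)
    print(f"{build:<8}{kind:<8}{saved:5.1f}% less time, {speedup:.2f}x faster")
```

"Percent less time" and "speedup factor" are easy to mix up: a 42.5% time reduction is a 1.74x speedup, not 1.425x.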

Test source code:

extends Node

const elems = 65536

func _ready():
	benchmark()

func benchmark():
	# Untyped: an Array of random indices into itself.
	var arr = []
	for x in range(elems):
		arr.append(randi() % elems)

	var t = Time.get_ticks_usec()

	var acc = 0.0

	# Each element is used as an index, so every iteration does two dependent lookups.
	for e in arr:
		var e2 = arr[e]
		var e3 = arr[e2]
		acc += e * e2
		acc *= e3 + e
		acc = sqrt(acc)

	print("Time taken (untyped): ", (Time.get_ticks_usec() - t) / 1000.0, "msec.")

	# Typed: the same workload with a PackedInt32Array and static types.
	var arr2 := PackedInt32Array()
	for x in range(elems):
		arr2.append(randi() % elems)

	t = Time.get_ticks_usec()

	var acc2: float = 0.0

	for e in arr2:
		var e2: int = arr2[e]
		var e3: int = arr2[e2]
		acc2 += e * e2
		acc2 *= e3 + e
		acc2 = sqrt(acc2)

	print("Time taken (typed): ", (Time.get_ticks_usec() - t) / 1000.0, "msec.")

I don't have much else I can use to test performance, so if anyone wants to lend a hand and compare with master (both in debug and release) or test this in their production project, it would be very welcome.
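Single-run timings are noisy, so when comparing master with this PR it helps to repeat each measurement and look at the minimum and mean rather than one number. Below is a minimal Python sketch of such a repeat-and-summarize harness, applied to a Python port of the benchmark loop. `time_runs`, `kernel` and `ARR` are illustrative names, not part of Godot or of any tool mentioned in this thread.

```python
import math
import random
import statistics
import time
from typing import Callable, Dict

ELEMS = 65536
ARR = [random.randrange(ELEMS) for _ in range(ELEMS)]


def kernel() -> float:
    """Python port of the benchmark loop: each value is used as an index."""
    acc = 0.0
    for e in ARR:
        e2 = ARR[e]
        e3 = ARR[e2]
        acc += e * e2
        acc *= e3 + e
        acc = math.sqrt(acc)
    return acc


def time_runs(fn: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Call fn `runs` times; return min, mean and stdev in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
    }


if __name__ == "__main__":
    stats = time_runs(kernel, runs=10)
    print(", ".join(f"{k}: {v:.3f} msec" for k, v in stats.items()))
```

The minimum is usually the most stable figure to compare between builds, since noise from the OS and other processes only ever adds time.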

Calinou · 2023-01-02T17:27:02Z

If anyone's curious, I ported the benchmark code to PHP, Python and Ruby. Benchmark times were measured on an i7-6700K @ 4.4 GHz on Fedora 36.

PHP

Time taken (PHP 8.1.13 + XDebug): 16.055 msec
Time taken (PHP 8.1.13): 3.538 msec
Time taken (PHP 8.1.13 + JIT): 1.605 msec¹

Code

<?php

const elems = 65536;

function benchmark() {
    $arr = [];
    for ($i = 0; $i < elems; $i++) {
        $arr[] = mt_rand(0, 2147483647) % elems;
    }

    $t = microtime(true);

    $acc = 0.0;

    foreach ($arr as $e) {
        $e2 = $arr[$e];
        $e3 = $arr[$e2];
        $acc += $e * $e2;
        $acc *= $e3 + $e;
        $acc = sqrt($acc);
    }

    echo "Time taken (PHP): " . (microtime(true) - $t) * 1000 . " msec.";
}

benchmark();

Python

Time taken (Python 3.9.0): 12.492 msec
~~Time taken (Pyston 2.3.4): 8.459 msec¹~~
Time taken (Pyston 2.3.5): 6.686 msec¹

Code

import math
import random
from time import time

elems = 65536

def benchmark():
    arr = []
    for x in range(elems):
        arr.append(random.randint(0, 2147483647) % elems)

    t = time()

    acc = 0.0

    for e in arr:
        e2 = arr[e]
        e3 = arr[e2]
        acc += e * e2
        acc *= e3 + e
        acc = math.sqrt(acc)

    print("Time taken (Python):", (time() - t) * 1000, "msec.")

benchmark()

Ruby

Time taken (Ruby 3.2.0): 7.357 msec
Time taken (Ruby 3.2.0 + --jit): 4.711 msec¹

Code

ELEMS = 65536

def benchmark()
    arr = []
    for i in 1..ELEMS
        arr.append(rand(2147483647) % ELEMS)
    end

    t = Time.now

    acc = 0.0

    for e in arr
        e2 = arr[e]
        e3 = arr[e2]
        acc += e * e2
        acc *= e3 + e
        acc = Math.sqrt(acc)
    end


    puts("Time taken (Ruby): " + ((Time.now - t) * 1000).to_s + " msec.")
end

benchmark()

¹ A JIT compiler is not comparable to interpreters in terms of performance. This figure is included for reference only.


Byteron · 2023-01-02T20:06:35Z

> If anyone's curious, I ported the benchmark code to PHP, Python and Ruby. Benchmark times were measured on an i7-6700K @ 4.4 GHz on Fedora 36.


Given you are on different hardware, it probably makes sense to also include GDScript numbers from your 6700K.

vnen

Looks good to me, apart from the bit I commented about (which can be addressed later).

bruvzg · 2023-01-02T20:35:49Z

Some tests on an M1 Mac, using the release editor binary (average of 10 runs):

Build	Time (msec)
Godot (master, untyped)	13.9504
Godot (master, typed)	8.9020
Godot (this pr, untyped)	11.2082
Godot (this pr, typed)	6.0292
Python 3.10.9	9.1007
Ruby 2.6.10p210	8.6488

barathbheeman · 2023-01-02T21:18:13Z

I'm new to Godot, so can someone please explain why the comparisons are made against Python, PHP and Ruby instead of C# and C++, since these are the languages someone might actually use in a Godot project?

Calinou · 2023-01-02T21:30:45Z

> I'm new to Godot, so can someone please explain why the comparisons are made against Python, PHP and Ruby instead of C# and C++, since these are the languages someone might actually use in a Godot project?

C# and C++ are compiled to native code rather than interpreted, so these aren't fair comparisons to make. An interpreted language can't realistically match a natively compiled language in terms of raw performance. Comparing interpreted languages with JIT-compiled languages is a stretch already, as JIT suffers from restrictions that interpreted languages don't have (such as iOS and consoles heavily restricting JIT usage).

jordo · 2023-01-02T22:36:24Z

> I'm new to Godot, so can someone please explain why the comparisons are made against Python, PHP and Ruby instead of C# and C++, since these are the languages someone might actually use in a Godot project?

We've been transpiling a lot of GDScript to C++ for our game logic in some performance-heavy areas, so I have a lot of experience here. We run a lot of Godot game servers on cloud infrastructure, so we pay for every CPU cycle used.

Going from GDScript to C++, we typically see close to two orders of magnitude of speedup (usually 40-100x). That's why it's not a fair comparison, but it also illustrates how much there is to gain with an AOT language plus an optimization pass that compiles to machine instructions.

The typed GDScript example code runs in 16.033 msec on my 2019 x86 MacBook Pro.

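The native C++ version below reported 0.021000 msec, which would make it close to 1000x faster in this example. A figure that small most likely means the optimizer removed most of the loop, though: every iteration feeds acc through sqrt() into the next one, so 65536 iterations can't realistically complete in 21 µs.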

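#include <chrono>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
	const int elems = 65536;

	std::vector<int> arr;
	arr.reserve(elems);
	for (int i = 0; i < elems; i++) {
		arr.push_back(rand() % elems);
	}

	auto t = std::chrono::steady_clock::now();

	// double matches the precision of GDScript's float.
	double acc = 0.0;

	// Iterate over the values (like `for e in arr2`), using each one as an index.
	for (int e : arr) {
		int e2 = arr[e];
		int e3 = arr[e2];
		acc += static_cast<double>(e) * e2; // avoids int overflow in e * e2
		acc *= e3 + e;
		acc = std::sqrt(acc);
	}

	double ms = std::chrono::duration<double, std::milli>(std::chrono::steady_clock::now() - t).count();
	// Printing acc keeps the compiler from discarding the loop as dead code.
	printf("Time taken (typed): %.4f msec. (acc = %f)\n", ms, acc);
	return 0;
}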

We don't typically see a 1000x improvement, but in this trivial example the optimization passes a C++ compiler can do (vectorization, inlining, loop unrolling, et al.) seem to really play a role. In production code with typical game logic, we usually see about a 40-100x performance improvement.


reduz · 2023-01-02T22:47:45Z

@jordo We have been discussing this in godotengine/godot-proposals#6031, and the general consensus seems to be that users would be happy with an optional transpiler to C, but this is obviously Godot 4.1+ material.

TokisanGames · 2023-01-03T14:37:48Z

I ran our game, which has nearly 30k lines of statically typed code and a thousand assets, and compared it with beta 10. I didn't expect much: though we have a lot of code, we're not bound by GDScript or the CPU, but by the renderer and the GPU. I rarely see any scripts with significant performance issues on the monitors, and I fix them if I do.

Core i9-12900H
RTX 3070 mobile
Win 11/64

Testing the PR build (7211e04), I do get a lot more renderer errors and visual artifacts (probably due to other PRs). I did not get any GDScript errors.

What	Beta 10	This PR
Loading the project and a big scene from the manager	24.7s	38s (1st), 17.6s (2nd)
Loading the game to the title screen	14.1s	27.4s (1st), 13.6s (2nd)
Loading the first level	8.4s	16.6s (1st), 8.5s (2nd)
Playing, running around one area	100-145 fps	100-145 fps


Zireael07 · 2023-01-03T14:50:08Z

Why the big difference between the 1st and 2nd runs?

Norrox · 2023-01-03T15:12:40Z

> I'm new to Godot, so can someone please explain why the comparisons are made against Python, PHP and Ruby instead of C# and C++, since these are the languages someone might actually use in a Godot project?

What Calinou said, and we can already use C# with Godot, so why compare the two? :D

akien-mga · 2023-01-03T15:12:41Z

Note that the way binaries are compiled has a big impact on performance comparisons. Official beta builds are a lot more optimized than CI builds, so to do an accurate comparison using the PR's CI build, I would suggest downloading a CI build from the master branch around the time that PR was made (e.g. https://github.com/godotengine/godot/actions/runs/3818020117).

cridenour · 2023-01-04T03:28:55Z

I ran my project, which has ~8K lines of statically typed GDScript, in both debug and release. I don't have any benchmark numbers at the moment, but the procedural world generation felt faster, and much faster on macOS.

No errors anywhere in the project.

LinuxUserGD · 2023-01-04T12:34:25Z

With Godot 4 beta 10, I noticed that string concatenation seems to be about 700 times slower than in Python.
GDScript (16.63s):

func string() -> int:
	var x : String = ""
	for i in range(0, 300000):
		x += " "
	return x.length()

Python 3.10 (0.023s):

def string():
    x = ""
    for i in range(0, 300000):
        x += " "
    return len(x)

reduz · 2023-01-04T14:14:48Z

@LinuxUserGD Most likely Python is optimized for string processing, because that's a common use case for it; CPython, for example, can resize a string in place on x += " " when nothing else references it, instead of copying it on every append. Godot is not designed with string processing in mind, hence this is slow. We have a StringBuilder class we use on the C++ side that could be exposed to script if there is real demand, but there probably isn't.
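The StringBuilder approach mentioned above amounts to collecting the appended pieces and merging them once, when the result is read, so each append no longer copies the whole string. Here is a minimal Python sketch of that idea; the `StringBuilder` class and `build_spaces` helper are hypothetical and do not mirror Godot's C++ class or its API. In Python itself, the idiomatic equivalent is appending to a list and calling `"".join()`.

```python
class StringBuilder:
    """Collects string pieces and concatenates them only when read."""

    def __init__(self) -> None:
        self._parts: list = []
        self._length = 0

    def append(self, s: str) -> "StringBuilder":
        # Amortized O(1): store a reference instead of copying the whole string.
        self._parts.append(s)
        self._length += len(s)
        return self

    def __len__(self) -> int:
        return self._length

    def as_string(self) -> str:
        # Merge all pieces in one pass, then keep the merged result
        # so repeated reads don't redo the work.
        merged = "".join(self._parts)
        self._parts = [merged]
        return merged


def build_spaces(n: int) -> str:
    """Same workload as the concatenation benchmark, using the builder."""
    sb = StringBuilder()
    for _ in range(n):
        sb.append(" ")
    return sb.as_string()
```

With this pattern, the 300000 appends from the benchmark cost linear time overall, whereas copying the full string on every += is quadratic.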

holzmb · 2023-01-04T15:31:06Z

Will this get backported to Godot 3?

AThousandShips · 2023-01-04T15:32:57Z

@holzmb GDScript has been completely reworked for 4.x

reduz · 2023-01-04T20:29:52Z

@holzmb
This optimization is already in Godot 3.x:

> OPCODE_DISPATCH now goes directly to the next instruction, like in Godot 3.x.

What is not there is the addressing mode optimization, but that won't happen because it's too much work.

akien-mga · 2023-01-05T12:08:07Z

Thanks!

reduz requested a review from a team as a code owner January 2, 2023 14:32

reduz force-pushed the gdscript-vm-optimization branch 2 times, most recently from 96c403f to 1cf2f75 January 2, 2023 14:42

Chaosus added enhancement topic:gdscript performance labels Jan 2, 2023

Chaosus added this to the 4.0 milestone Jan 2, 2023

reduz force-pushed the gdscript-vm-optimization branch from 1cf2f75 to 3258dac January 2, 2023 15:21

vnen reviewed Jan 2, 2023

modules/gdscript/gdscript_byte_codegen.h (outdated)

vnen approved these changes Jan 2, 2023

reduz force-pushed the gdscript-vm-optimization branch from 3258dac to 7211e04 January 2, 2023 22:44

LinuxUserGD added a commit to LinuxUserGD/gdscript-transpiler-bin that referenced this pull request Jan 3, 2023

add benchmark from godotengine/godot#70838

5c8c283

akien-mga approved these changes Jan 5, 2023

akien-mga merged commit fc4a734 into godotengine:master Jan 5, 2023

donte5405 mentioned this pull request Aug 21, 2024

Improve the performance of the GDScript VM godotengine/godot-proposals#6031 (open)

Faless mentioned this pull request Sep 24, 2024

[Web] Use -Oz instead of -Os when optimizing for size #97407 (open)
