
[Web] Use -Oz instead of -Os when optimizing for size #97407

Open: wants to merge 1 commit into master


@Faless (Collaborator) commented Sep 24, 2024

LLVM has an -Oz optimization flag, which it defines as:

> Like -Os (and thus -O2), but reduces code size further.

I've tested this on release builds, and the result is a ~25% smaller uncompressed size and a ~10% smaller compressed size.

I found this out while evaluating the impact of LTO builds (#96851). With -Os, LTO actually produces larger builds, while with -Oz it gives a small size reduction (I'll add all the test results in #96851).

For now, these are the results:

-Oz, no LTO:

26M	lto-extract/none-oz/godot.wasm
7,0M	lto-extract/none-oz/godot.web.template_release.wasm32.nothreads.zip

-Oz, LTO:

24M	full-oz/godot.wasm
7,0M	full-oz/godot.web.template_release.wasm32.nothreads.zip

-Os, no LTO:

34M	none/godot.wasm
7,7M	none/godot.web.template_release.wasm32.nothreads.zip

-Os, LTO:

35M	full/godot.wasm
7,9M	full/godot.web.template_release.wasm32.nothreads.zip
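As a quick sanity check of the ~25% / ~10% figures, the relative reductions can be recomputed from the `du` output above. This is a minimal sketch. The sizes are the rounded values that `du -h` printed, in MiB with the decimal comma written as a point, so the percentages are only approximate.

```python
# Rounded sizes (MiB) as printed by `du -h` in the listing above.
SIZES = {
    ("Oz", "no LTO"): {"wasm": 26.0, "zip": 7.0},
    ("Oz", "LTO"): {"wasm": 24.0, "zip": 7.0},
    ("Os", "no LTO"): {"wasm": 34.0, "zip": 7.7},
    ("Os", "LTO"): {"wasm": 35.0, "zip": 7.9},
}


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


def compare(lto: str) -> dict:
    """Size reduction of -Oz relative to -Os for one LTO setting."""
    os_sizes = SIZES[("Os", lto)]
    oz_sizes = SIZES[("Oz", lto)]
    return {
        kind: round(reduction_percent(os_sizes[kind], oz_sizes[kind]), 1)
        for kind in ("wasm", "zip")
    }


for lto in ("no LTO", "LTO"):
    print(lto, compare(lto))
```

Without LTO this gives about 23.5% for the raw Wasm and 9.1% for the template zip, in line with the ~25% and ~10% quoted in the description. With LTO the gap widens to about 31.4% and 11.4%, because -Os with LTO is the largest build. Since `du -h` rounds, treat the last digit loosely.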

@Faless added this to the 4.4 milestone Sep 24, 2024
@Faless requested a review from a team as a code owner September 24, 2024 12:22
@akien-mga changed the title [Web] Use Oz instead of Os when optimizing for size to [Web] Use -Oz instead of -Os when optimizing for size Sep 24, 2024

@akien-mga (Member) left a comment

That's a pretty significant gain! Looks great.

Any hints of tradeoffs on performance, and build times?

@Faless (Collaborator, Author) commented Sep 24, 2024

> Any hints of tradeoffs on performance, and build times?

Build times were almost the same between the two (no noticeable difference).

I haven't run any performance tests yet, so we might want to look into that, maybe using something from the benchmarks repository?

The Emscripten docs say:

> Like -Os, but reduces code size even further, and may take longer to run. This can affect both Wasm and JavaScript.

@Calinou (Member) commented Sep 24, 2024

> maybe using something from the benchmark repository?

https://github.com/godotengine/godot-benchmarks hasn't been tested as a web export, but it should work there. Some of the rendering benchmarks won't make sense to run, though, as only the Compatibility rendering method can be used.

I would also try running the 3D Platformer and Truck Town projects in very small windows with Print FPS enabled and V-Sync disabled (use Chromium's --disable-gpu-vsync --disable-frame-rate-limit CLI arguments), just to check that no additional CPU bottlenecks were introduced in those projects.
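For reference, here is a small Python helper that builds that Chromium command line. The two switches are the ones named above. The executable name (`chromium`) and any URL you pass are assumptions, so adjust them for your system. Print FPS still has to be enabled in the exported project itself.

```python
import shlex
import subprocess

# Chromium switches suggested above to uncap the frame rate.
UNCAPPED_FPS_SWITCHES = ["--disable-gpu-vsync", "--disable-frame-rate-limit"]


def chromium_command(url: str, binary: str = "chromium") -> list[str]:
    """Build the argv for launching Chromium with V-Sync and the FPS limit off.

    `binary` is an assumption: depending on the platform it may be
    `chromium-browser`, `google-chrome`, or a full path.
    """
    return [binary, *UNCAPPED_FPS_SWITCHES, url]


def as_shell_line(url: str, binary: str = "chromium") -> str:
    """Same command, quoted for pasting into a POSIX shell."""
    return shlex.join(chromium_command(url, binary))


def launch(url: str, binary: str = "chromium") -> subprocess.Popen:
    """Start the browser without waiting for it to exit."""
    return subprocess.Popen(chromium_command(url, binary))
```

For example, `as_shell_line("http://localhost:8000/")` returns `chromium --disable-gpu-vsync --disable-frame-rate-limit http://localhost:8000/`, where the URL stands in for wherever the export is being served.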

@Faless (Collaborator, Author) commented Sep 24, 2024

> (use Chromium's --disable-gpu-vsync --disable-frame-rate-limit CLI arguments),

That doesn't seem to work; I'm still capped at 60 FPS...

Never mind, I didn't copy the string correctly :/

@Faless (Collaborator, Author) commented Sep 24, 2024

I've tested both Truck Town and the 3D Platformer. In both cases the FPS difference is very small.

On the 3D Platformer there's almost no difference: I get around 240 FPS with both builds.

Likewise, in Truck Town there is maybe a 1-2 FPS difference (out of 300), but it might as well be due to random fluctuations (my GPU fan starts spinning like hell when I remove the limit ;-) ).

@Faless (Collaborator, Author) commented Sep 24, 2024

I ran another test using the script in #70838. These are the results ("* 100" means I made the element count a hundred times larger):

Oz
Time taken (untyped): 20.295msec.
Time taken (typed): 9.356msec.

Os
Time taken (untyped): 19.48msec.
Time taken (typed): 9.319msec.

Oz * 100
Time taken (untyped): 2517.42msec.
Time taken (typed): 1642.606msec.

Os * 100
Time taken (untyped): 2550.795msec.
Time taken (typed): 1579.065msec.

Note: the timings varied a bit between runs. I tried to pick values close to the median, but we might want to be a bit more rigorous.
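Here is one way to be more rigorous, sketched under the assumption that each build's script is run several times and the printed timings are collected. The idea is to compare medians, and to treat a difference as meaningful only if it exceeds the run-to-run spread. The significance rule here is a deliberately crude heuristic, not a statistical test.

```python
from statistics import median


def summarize(samples_ms: list[float]) -> tuple[float, float]:
    """Return (median, spread) of repeated timings, where spread = max - min."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return median(samples_ms), max(samples_ms) - min(samples_ms)


def compare_builds(oz_ms: list[float], os_ms: list[float]) -> dict:
    """Compare -Oz against -Os using the medians of repeated runs.

    The difference is flagged as significant only when it is larger than
    the worst run-to-run spread of either build.
    """
    oz_median, oz_spread = summarize(oz_ms)
    os_median, os_spread = summarize(os_ms)
    diff = oz_median - os_median
    return {
        "oz_median": oz_median,
        "os_median": os_median,
        # Positive means -Oz is slower than -Os.
        "slowdown_percent": diff / os_median * 100.0,
        "significant": abs(diff) > max(oz_spread, os_spread),
    }
```

Feeding, say, five to ten runs per build into `compare_builds` gives a median-based comparison instead of a single hand-picked value. Small gaps like the untyped "* 100" numbers above would then only count if they clear the observed noise.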

@jordo (Contributor) commented Sep 25, 2024

I compiled our web game with -O3, -Os, and -Oz and profiled it in Chrome. The general observation is that -Oz is about 15-20% slower than -Os, and -Os is about 5-10% slower than -O3. Three specific examples are below, but the numbers were similar in every section I looked at:

[Screenshot 2024-09-24 at 9:17:06 PM (image not available)]
[Screenshot 2024-09-24 at 9:27:29 PM (image not available)]
[Screenshot 2024-09-24 at 9:18:18 PM (image not available)]

@jordo (Contributor) commented Sep 25, 2024

I'm also currently working on a minimum viable build of the engine to deploy to the web. Compiling with -Os vs. -Oz, I get these numbers (raw Wasm, gzip, and Brotli):

-Os:
[Screenshot 2024-09-24 at 9:31:35 PM (image not available)]

-Oz:
[Screenshot 2024-09-24 at 9:33:39 PM (image not available)]
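To reproduce this kind of raw/compressed comparison on your own build, here is a minimal sketch using Python's standard `gzip` module. Brotli is not in the standard library, so the sketch assumes the optional third-party `brotli` package and skips that column if it isn't installed. The compression levels (gzip level 9, Brotli's default quality) are assumptions and may differ from what a given web server applies.

```python
import gzip
from pathlib import Path

try:
    import brotli  # optional third-party package (pip install brotli)
except ImportError:
    brotli = None


def sizes_of(data: bytes) -> dict[str, int]:
    """Return raw, gzip, and (if available) Brotli sizes in bytes."""
    sizes = {
        "raw": len(data),
        # Level 9 is gzip's maximum; servers often use a lower level.
        "gzip": len(gzip.compress(data, compresslevel=9)),
    }
    if brotli is not None:
        # brotli.compress defaults to its maximum quality (11).
        sizes["brotli"] = len(brotli.compress(data))
    return sizes


def compressed_sizes(path: str) -> dict[str, int]:
    """Read a file (e.g. an exported .wasm) and measure it."""
    return sizes_of(Path(path).read_bytes())


def mib(n_bytes: int) -> float:
    """Convert a byte count to MiB, rounded to two decimals."""
    return round(n_bytes / (1024 * 1024), 2)
```

For example, `{k: mib(v) for k, v in compressed_sizes("path/to/godot.wasm").items()}` gives the sizes in MiB, where the path is a placeholder for your exported Wasm file.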

@akien-mga (Member) commented

Thanks for doing some performance tests! For the record, was this with or without LTO?

A 15% performance hit is pretty significant :/ but the size gain is also really worth having for the Web specifically.

I wonder if we should add a new size_extra option (better name suggestions welcome) for -Oz, so we still offer both size (-Os) and -Oz. We could default to -Oz but still let users build their templates with -Os if they prefer performance over smaller size. Or the other way around: defaulting to more performance might be a safer bet overall, and users can choose to optimize size further if they can take the performance hit.
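To illustrate the idea, here is a hypothetical sketch of how an optimize build option could map to Emscripten optimization levels, with an extra size_extra value. Only size (-Os) and the proposed size_extra (-Oz) come from this discussion. The other entries and the function name are assumptions, and this is not the actual code of Godot's web build script.

```python
# Hypothetical mapping from an `optimize` build option to Emscripten flags.
# Only "size" (-Os) and the proposed "size_extra" (-Oz) come from this
# discussion; "none" and "speed" are illustrative assumptions.
OPTIMIZE_LEVELS = {
    "none": "-O0",
    "speed": "-O3",
    "size": "-Os",
    "size_extra": "-Oz",
}


def optimization_flags(optimize: str) -> dict[str, list[str]]:
    """Return compile and link flags for the given optimize value.

    The same -O level is passed at link time too, because emcc decides how
    aggressively to run its post-link optimizations from the link flags.
    """
    try:
        level = OPTIMIZE_LEVELS[optimize]
    except KeyError:
        choices = ", ".join(sorted(OPTIMIZE_LEVELS))
        raise ValueError(
            f"unknown optimize value {optimize!r}; expected one of: {choices}"
        ) from None
    return {"CCFLAGS": [level], "LINKFLAGS": [level]}
```

In an SCons script, the result could be applied with something like `env.Append(**optimization_flags(env["optimize"]))`. Which value becomes the default is exactly the open question above.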

@Faless (Collaborator, Author) commented Sep 25, 2024

> I wonder if we should add a new size_extra (better name suggestions welcome) for -Oz

I think that's actually a good idea.

> We could default to -Oz but still let users choose to build their templates with -Os if they prefer more performance over smaller size - or the other one, defaulting to more performance might be a safer bet overall, and users can choose to optimize size further if they can take the performance hit.

I think we need to decide whether we care more about the ~10% size reduction or the ~15% performance hit on CPU-bound operations.

We should probably also run the tests above with Godot 4 instead of Godot 3, but I expect more or less the same results.

@Faless (Collaborator, Author) commented Sep 25, 2024

So, I've run some more tests using the Chromium profiler, with Godot 4, LTO, and debug symbols.

Please note that builds with debug symbols may be less optimized than regular builds:

em++: warning: running limited binaryen optimizations because DWARF info requested (or indirectly required) [-Wlimited-postlink-optimizations]

Test script (noise generation + 1.5 MiB WebP decode + 1.5 MiB WebP encode):

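extends Node2D

# Assumes the scene has a TextureRect child and that res://image.webp exists.
func _ready() -> void:
	# 1. Noise: generate a 10240-pixel-wide simplex noise texture.
	var tex := NoiseTexture2D.new()
	tex.width = 10240
	var noise := FastNoiseLite.new()
	noise.noise_type = FastNoiseLite.TYPE_SIMPLEX
	tex.noise = noise
	# NoiseTexture2D generates asynchronously and emits `changed` when done.
	await tex.changed
	$TextureRect.texture = tex

	# Yield a frame so each step shows up separately in the profiler.
	await get_tree().process_frame

	# 2. WebP decode: load the ~1.5 MiB test image.
	var image := Image.load_from_file("res://image.webp")

	await get_tree().process_frame

	# 3. WebP encode: re-encode the image into a buffer (lossless by default).
	var buf := image.save_webp_to_buffer()

	await get_tree().process_frame

	print("done")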

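-Oz:

[Profiler screenshots: webp-save-oz, webp-load-oz, noise-oz (images not available)]

-Os:

[Profiler screenshots: webp-save-os, webp-load-os, noise-os (images not available)]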

Comparison (-Oz vs. -Os):

Noise: 743.33 vs 741.64 (almost no difference, considering run variability)
WebP decode: 546.46 vs 444.29 (a large, consistent 20-25% difference)
WebP encode: 1.92 vs 1.89 (almost no difference, considering run variability)

WebP decoding seems heavily affected, while encoding and noise generation seem to have the same performance.

@roughbits01 commented Sep 26, 2024

> I'm also currently working on a minimum viable build of the engine to deploy to web... Compiling with Os vs Oz I am able to get these numbers: raw wasm, gzip, and brotli:
>
> Os: [Screenshot 2024-09-24 at 9:31:35 PM]
>
> Oz: [Screenshot 2024-09-24 at 9:33:39 PM]

Can you please share what steps you took to get to ~10 MB uncompressed? I disabled 3D, disabled almost all modules, and used optimize = "size", but I get 24 MB.

@CodeLazier commented Nov 11, 2024

> Can you share what steps you took to get to ~10 MB uncompressed? I disabled 3D, disabled almost all modules, and used optimize = "size", but I got 24 MB.

It could possibly be Godot 3.x.


7 participants