Unexpectedly low performance of temporal antialiasing due to high GPU cost #61905

Open
Calinou opened this issue Jun 10, 2022 · 7 comments

@Calinou
Member

Calinou commented Jun 10, 2022

Godot version

4.0.alpha (b9375ea)

System information

Fedora 36, GeForce GTX 1080 (NVIDIA 510.68.02)

Issue description

In Godot, enabling TAA on a GTX 1080 with a 2560×1440 viewport costs more than 1.2 ms of GPU time per frame, even in an empty scene with just a Camera3D. The performance impact seems fairly constant regardless of the scene's contents:

Empty scene

[Screenshots: TAA disabled vs. TAA enabled]

1 BoxMesh

[Screenshots: TAA disabled vs. TAA enabled]

1 BoxMesh + 1 DirectionalLight3D with shadows + Default Environment

[Screenshots: TAA disabled vs. TAA enabled]

According to the View Frame Time panel, nearly all of the rendering cost is on the GPU. CPU frame time barely changes when TAA is enabled, at least in simple scenes. Therefore, motion vector generation isn't the bottleneck. The actual TAA shader is more likely to be a bottleneck.

Replacing the main() function's contents with just return; in the taa_resolve.glsl shader results in a black image, but still increases GPU time by 0.7 mspf compared to TAA disabled.

RenderDoc confirms that TAA is indeed taking around 1.2 ms of GPU time. (I don't have access to actual Vulkan profiling tools on this GPU, as it's too old for Vulkan profiling in Nsight.)

At lower resolutions, the performance impact of TAA is much less noticeable:

575×310

[Screenshots: TAA disabled vs. TAA enabled]

905×550

[Screenshots: TAA disabled vs. TAA enabled]
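To put the tested resolutions in perspective, here is a small sketch that compares their pixel counts against 2560×1440. It assumes, purely for illustration, that a full-screen pass costs roughly the same per pixel at every resolution; the thread does not confirm that TAA scales this way. The names `RESOLUTIONS`, `pixel_count` and `relative_pixel_load` are made up for this example.

```python
# Viewport sizes mentioned in this issue.
RESOLUTIONS = {
    "575x310": (575, 310),
    "905x550": (905, 550),
    "2560x1440": (2560, 1440),
}


def pixel_count(width: int, height: int) -> int:
    """Number of pixels a full-screen pass has to touch."""
    return width * height


def relative_pixel_load(reference=(2560, 1440)) -> dict:
    """Pixel count of each resolution as a fraction of the reference."""
    ref = pixel_count(*reference)
    return {name: pixel_count(*size) / ref for name, size in RESOLUTIONS.items()}


load = relative_pixel_load()
for name, fraction in load.items():
    # ASSUMPTION: cost scales linearly with pixel count (not measured here).
    print(f"{name}: {fraction:.1%} of the 2560x1440 pixel count")
```

The two small viewports cover about 4.8% and 13.5% of the pixels of the 2560×1440 viewport, which would be consistent with the TAA cost being hard to notice there if the cost is dominated by per-pixel work.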

This doesn't compare favorably to the TAA implementation in other open source rendering engines. For example, in Tesseract, in a semi-complex scene with many objects and lights, the frame time difference between TAA disabled and enabled is only ~0.2 ms ((1.0/333 - 1.0/357) * 1000):

[Screenshots: TAA disabled in Tesseract vs. TAA enabled in Tesseract]
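The ~0.2 ms figure above comes from converting the two FPS readings (357 with TAA disabled, 333 with TAA enabled) to frame times and subtracting them. A minimal sketch of that arithmetic follows; the helper names are made up for this example. Note that an FPS-derived delta covers the whole frame (CPU and GPU), so it is only a rough proxy for GPU cost.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frames-per-second reading to milliseconds per frame."""
    return 1000.0 / fps


def taa_cost_ms(fps_disabled: float, fps_enabled: float) -> float:
    """Extra frame time (in ms) observed when TAA is turned on."""
    return frame_time_ms(fps_enabled) - frame_time_ms(fps_disabled)


# FPS readings quoted for Tesseract in this issue.
tesseract_cost = taa_cost_ms(fps_disabled=357, fps_enabled=333)
print(f"{tesseract_cost:.2f} ms")  # prints "0.20 ms"
```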

There are many technical differences between Godot 4 and Tesseract's rendering engines:

  • Tesseract uses traditional deferred rendering, not clustered forward rendering. It has a non-PBR renderer and favors performance over flexibility.
  • Tesseract uses a seemingly simpler (yet effective) form of TAA called TQAA (temporal quincunx antialiasing).
  • Tesseract uses OpenGL 4.3 on supported hardware (with 3.3 as a minimum), not Vulkan. This is less efficient than Godot on paper, even though OpenGL can be pretty efficient if used well.

Still, I feel TAA should not be this expensive on the GPU in Godot.

Steps to reproduce

  • Enable Perspective > View Frame Time in the top-left corner of the 3D viewport.
  • Enable Use Taa in the Project Settings.

Minimal reproduction project

test_taa_performance.zip

@Zireael07
Contributor

> Tesseract uses a seemingly simpler (yet effective) form of TAA called TQAA (temporal quincunx antialiasing).

Interesting. Sounds like something Godot should try, especially for GLES3 backend imho

@Calinou
Member Author

Calinou commented Jun 10, 2022

> Interesting. Sounds like something Godot should try, especially for GLES3 backend imho

The OpenGL backend will probably never get TAA, as it's intended for old/low-end hardware where TAA is too expensive. A TAA implementation also adds a lot of complexity to a renderer, and I think the OpenGL renderer is best kept simple so we can focus on making it stable.

I think Godot using a forward renderer will penalize it for TAA (compared to a deferred renderer), but there are probably some optimizations we can figure out.

@mrjustaguy
Contributor

To my knowledge, forward vs. deferred rendering should have no impact on the cost of TAA.

@Calinou
Member Author

Calinou commented Jul 6, 2022

JFonS said that the high TAA cost is expected due to how it works currently. It's a separate pass that requires a full-screen copy, which is expensive in itself. I suppose avoiding this copy could halve the GPU cost of TAA, if not more.

To make TAA cheaper, it should avoid performing this copy. For instance, this can be done by moving TAA to the tonemapping shader, but doing so will break FXAA and glow (so they won't be usable at the same time as TAA). Another solution needs to be found – let us know if you can think of one 🙂
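As a rough sanity check on the full-screen copy mentioned above, the sketch below estimates a lower bound on its cost from memory traffic alone. Everything here is an assumption rather than something stated in this thread: the buffer is taken to be RGBA16F (8 bytes per pixel), the copy is taken to read and write each pixel exactly once, and 320 GB/s is the spec-sheet peak memory bandwidth of a GTX 1080. The function name `copy_time_ms` is made up for this example.

```python
WIDTH, HEIGHT = 2560, 1440

# ASSUMPTION: RGBA16F color buffer, 4 channels x 2 bytes each.
BYTES_PER_PIXEL = 8

# ASSUMPTION: spec-sheet peak memory bandwidth of a GTX 1080, in bytes/s.
PEAK_BANDWIDTH = 320e9


def copy_time_ms(width: int, height: int, bytes_per_pixel: int,
                 bandwidth: float) -> float:
    """Lower-bound time for a copy that reads and writes every pixel once."""
    bytes_moved = 2 * width * height * bytes_per_pixel  # one read + one write
    return bytes_moved / bandwidth * 1000.0


estimate = copy_time_ms(WIDTH, HEIGHT, BYTES_PER_PIXEL, PEAK_BANDWIDTH)
print(f"{estimate:.2f} ms")  # prints "0.18 ms"
```

Under these assumptions the bound is about 0.18 ms at 2560×1440. A real copy does not reach peak bandwidth and also pays for barriers and pass setup, so the measured cost will be higher than this figure.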

@Calinou
Member Author

Calinou commented Apr 12, 2023

We discussed this issue in today's rendering meeting and agreed on two possible optimizations:

  • Do not run the motion vector generation for objects that haven't moved since the last frame.
  • Generate motion vectors in screen-space for what's in view.
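The first optimization could look roughly like the sketch below: before the motion vector pass, filter out instances whose transform is unchanged since the previous frame. This is only an illustration and not Godot's actual renderer code; `Instance`, `needs_motion_vectors` and `instances_for_motion_pass` are hypothetical names, and transforms are represented as flat tuples of floats for simplicity. It does not cover the second bullet.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a transform flattened into a tuple of floats.
Transform = Tuple[float, ...]


@dataclass
class Instance:
    name: str
    transform: Transform       # transform for the current frame
    prev_transform: Transform  # transform used in the previous frame


def needs_motion_vectors(inst: Instance, eps: float = 1e-6) -> bool:
    """True if any transform component changed by more than eps."""
    return any(
        abs(cur - prev) > eps
        for cur, prev in zip(inst.transform, inst.prev_transform)
    )


def instances_for_motion_pass(instances: List[Instance]) -> List[Instance]:
    """Keep only the instances that moved since the last frame."""
    return [inst for inst in instances if needs_motion_vectors(inst)]


scene = [
    Instance("static_box", (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    Instance("moving_box", (1.0, 0.0, 0.5), (1.0, 0.0, 0.0)),
]
print([inst.name for inst in instances_for_motion_pass(scene)])  # ['moving_box']
```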

@mrjustaguy
Contributor

I've been trying to wrap my head around moving TAA into the tonemapping shader for a while, and I don't get it. Why would that break FXAA and glow? The TAA step could be done before the two, and you'd apply them to the TAA result, right?

Based on the post above, I'm guessing TAA currently works something like scenario 1:

  1. Creates copy of Full Screen
  2. Does TAA
  3. Passes result to Tonemapper
  4. Tonemap applies Glow/FXAA if enabled

or less likely, scenario 2 like:

  1. Tonemapper applies Glow/FXAA if enabled
  2. TAA Creates copy of Full Screen based on Tonemapper output
  3. Does TAA

Whereas TAA in Tonemapper would work something like this:

  1. Does TAA
  2. Modifies output with Glow/FXAA after the TAA step is done

The main difference is that FXAA and glow might move from being applied before TAA to being applied after it. For glow that wouldn't change much, but for FXAA it could produce visible differences: FXAA after TAA would look like a filter applied on top of the TAA result, whereas running TAA on an image that already has FXAA would smear it a little.

@jclounge

jclounge commented Feb 25, 2024

Not sure if this is the same exact issue, but on my iMac with AMD 560X GPU and Godot 4.2.1 mono, enabling TAA slows everything down severely. With a completely empty 3D scene the editor becomes sluggish even without any anti-aliasing being visible in the viewport. Adding only a camera to the scene and then running the game makes it render at around 40fps with TAA enabled, compared to around 400fps with TAA disabled. Other AA modes seem to run fine. EDIT: This is without retina mode enabled, so the screen resolution is only 2048x1152. If I make the game window full-screen, the fps plummets to around 10fps.

Projects
Status: For team assessment
Development

No branches or pull requests

5 participants