Severe 2D rendering performance drops on Windows since Godot 3.3 when using GLES3 #56301

Open
mphe opened this issue Dec 28, 2021 · 10 comments

mphe (Contributor) commented Dec 28, 2021

Godot version

3.4.2

System information

Windows 10, PopOS 21.10, GLES3, Nvidia GTX 970

Issue description

I'm developing on Linux and recently tested my project on Windows, where I noticed a significant drop in performance. After some debugging, I identified rendering as the bottleneck.
However, performance on Linux (PopOS 21.10) is much better, even though it's the same machine, the same project, and the same Godot version.
I then tested earlier versions of Godot and found that the performance drop started with Godot 3.2.4 beta 1, so it may be related to the introduction of batching.

I created a small reproduction project that simply renders 200 sprites/lines/rects/circles using Sprite nodes, draw_line, draw_rect, or draw_circle. It does not call update() or perform any logic; it draws 200 shapes of one kind once and then idles.
I then measured the performance of each render kind, with batching on and off, on Windows 10 and PopOS 21.10, using Godot 3.2.3 (the last version before the notable FPS drop) and the latest Godot 3.4.2.

I noticed that Linux reaches more than twice the FPS of Windows in almost all cases.
Even without batching, Linux still performs better in Godot 3.4.2 than in 3.2.3.
On Windows, however, Godot suffers extreme performance drops without batching. For example, 200 draw_line() calls in Godot 3.4.2 yield 165 FPS, while 3.2.3 still managed about 1000 FPS. In comparison, Linux reaches 1500 FPS in 3.4.2 and 1050 FPS in 3.2.3.
I could reproduce this behavior on other machines as well.

So, in my original project, where I have quite a lot of draw_line calls that can't be batched, I get normal performance on Linux but really bad performance on Windows.

I found #54826 and #54377, which seem to be related, but since I was testing with 3.4.2, the PR apparently didn't make any difference. As suggested in #54826, I also tried different OpenGL options, without success.

Below are tables with average FPS values for all the different test cases.

Godot 3.2.3 (No batching for GLES3)

Test          Windows (avg FPS)   Linux (avg FPS)
sprites       2100                4600
draw_line     1000                1050
draw_rect     2000                5800
draw_circle   660                 660

Godot 3.4.2

Batching: on

Test          Windows (avg FPS)   Linux (avg FPS)
sprites       2400                5400
draw_line     2500                6800
draw_rect     2500                6800
draw_circle   95                  830

Batching: off

Test          Windows (avg FPS)   Linux (avg FPS)
sprites       2200                4600
draw_line     165                 1500
draw_rect     2300                5700
draw_circle   95                  830
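Average FPS values are easier to compare as frame times, since 1000 / FPS gives the milliseconds spent per frame. The Python sketch below only re-derives numbers from the Godot 3.4.2 tables above; the dictionary layout and helper names are illustrative and not part of the reproduction project.

```python
# Average FPS values (Windows, Linux) copied from the Godot 3.4.2 tables above.
FPS_342 = {
    "batching on": {
        "sprites": (2400, 5400),
        "draw_line": (2500, 6800),
        "draw_rect": (2500, 6800),
        "draw_circle": (95, 830),
    },
    "batching off": {
        "sprites": (2200, 4600),
        "draw_line": (165, 1500),
        "draw_rect": (2300, 5700),
        "draw_circle": (95, 830),
    },
}


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given average FPS."""
    return 1000.0 / fps


def summarize(table):
    """Return rows of (mode, test, windows_ms, linux_ms, linux_to_windows_fps_ratio)."""
    rows = []
    for mode, tests in table.items():
        for test, (win_fps, linux_fps) in tests.items():
            rows.append((
                mode,
                test,
                frame_time_ms(win_fps),
                frame_time_ms(linux_fps),
                linux_fps / win_fps,
            ))
    return rows


if __name__ == "__main__":
    for mode, test, win_ms, lin_ms, ratio in summarize(FPS_342):
        print(f"{mode:12} {test:12} Windows {win_ms:6.2f} ms  "
              f"Linux {lin_ms:6.2f} ms  Linux/Windows x{ratio:.1f}")
```

With batching off, the 200 draw_line() calls cost roughly 6.06 ms per frame on Windows (165 FPS) versus about 0.67 ms on Linux (1500 FPS), a gap of about 9x, while sprites and draw_rect stay within roughly 2x to 2.5x.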

Steps to reproduce

  1. Get the reproduction project.
  2. Open Node2D.tscn.
  3. Show either the sprite_test node (to test Sprite node performance) or the draw_function_test node (to test draw_* function performance), and hide the other.
  4. For draw_function_test, use the "Draw Type" property to select draw_line, draw_rect, or draw_circle.
  5. The FPS is displayed in the top-left corner.

Minimal reproduction project

godot-perftest.zip

Calinou (Member) commented Dec 28, 2021

Can you reproduce this when using GLES2?

cc @lawnjelly

mphe (Contributor, Author) commented Dec 28, 2021

Just tested it with draw_line and GLES2 on Windows. With batching enabled it is slightly faster (2700 FPS); without batching it is slightly slower (125 FPS).
So, no real difference.

Calinou changed the title from "Severe rendering performance drops on Windows" to "Severe 2D rendering performance drops on Windows since Godot 3.3 when using GLES3" on Dec 28, 2021
lawnjelly (Member) commented Dec 29, 2021

The circle is likely slower than in 3.2.3 due to either changes to dynamic buffers (but you say you have tried the OpenGL settings) or bug fixes (fixing pre-existing state bugs for robustness can result in slowdown). It isn't batched, and I don't recommend using that primitive. I think this came up in another issue a while ago (about circles or arcs); drawing circles manually using primitives that batch should be much faster.

We could probably make the circle primitive more sensible (convert it to a series of polys on entering the VisualServer instead of a bespoke primitive, in a similar manner to the changes to polyline), but really there has been little demand - not many people seem to use it.
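As a rough illustration of drawing circles manually, a circle can be approximated by a regular polygon whose vertices are computed up front. The Python sketch below assumes the resulting points would be passed to a polygon-drawing call such as Godot 3's CanvasItem.draw_colored_polygon(); the function name and default segment count are hypothetical, and whether the polygon is actually batched depends on the Godot version and batching settings.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def circle_polygon(center: Point, radius: float, segments: int = 32) -> List[Point]:
    """Approximate a circle by the vertices of a regular polygon.

    Vertices are evenly spaced around the circle, starting at angle 0
    (the point directly to the right of the center).
    """
    if segments < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    if radius <= 0:
        raise ValueError("radius must be positive")
    cx, cy = center
    step = 2.0 * math.pi / segments
    return [
        (cx + radius * math.cos(i * step), cy + radius * math.sin(i * step))
        for i in range(segments)
    ]


if __name__ == "__main__":
    # A coarse 8-sided approximation of a circle of radius 10 around (50, 50).
    for x, y in circle_polygon((50.0, 50.0), 10.0, segments=8):
        print(f"({x:.2f}, {y:.2f})")
```

The segment count trades smoothness for vertex count; it could be chosen per circle based on its on-screen radius.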

Lines may be similar - I would just advise using batching. The legacy (non-batching) renderer is now really only maintained to detect graphical regressions. It is usually much more productive to spend time fixing bugs that might prevent you from using batching than to spend time on the legacy canvas renderer.

Lines

With lines, you are referencing some issues that deal with a specific case: thick lines. Lines of width 1 (the default) and thick lines go through entirely different pipelines. Whether anti-aliasing is enabled can also completely change the pipeline.

To quote from #54826 :

This strange behavior occurs only when drawing lines with width > 1 (tested with width=2). With width=1, 3.4. performance is extremely good.

I wrote the below on the assumption that you were addressing thick lines in this issue, but I now see that your demo project uses default-thickness lines (width 1). I don't know what kind of lines you are using in your actual project, so this may still be relevant; the advice about not mixing primitives also applies:

Thick lines

Also see the note here regarding draw_lines with thick lines; the difference may be due to the difference between the old routine and drawing as polys:
#54377 (comment)

Essentially the older method of drawing seems to be faster for single lines, but does not scale.
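To make "drawing as polys" concrete: one common way to turn a thick line segment into a polygon is to offset both endpoints by half the width along the segment's normal, which yields a quad. The Python sketch below shows only that general idea; it is not Godot's actual implementation, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def thick_line_quad(a: Point, b: Point, width: float) -> List[Point]:
    """Return the four corners of a rectangle covering segment a-b with the given width.

    Corners are ordered a+n, b+n, b-n, a-n, where n is the segment's unit
    normal scaled by half the width, so they trace the quad's outline.
    """
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("cannot build a quad for a zero-length segment")
    half = width / 2.0
    # Rotate the direction vector by 90 degrees and scale it to half the width.
    nx, ny = -dy / length * half, dx / length * half
    return [(ax + nx, ay + ny), (bx + nx, by + ny), (bx - nx, by - ny), (ax - nx, ay - ny)]
```

In this sketch each thick segment becomes four vertices (two triangles) rather than two endpoints.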

You seem to see a far greater difference between the two drawing methods on Windows than on Linux. I'm not sure of the reason for this; maybe the drivers are better on Linux, or the necessary API communication is more efficient there.

We could switch to the old routine with batching off as noted on the PR, but I don't think it would help your situation, because you presumably are using batching, and just testing with batching off.

Ideally we would just switch between the two in VisualServerCanvas::canvas_item_add_line, but there is a chicken-and-egg problem here: when this function is called, we don't yet know whether it will be called once or multiple times, so we optimize for multiple times. We could possibly do the switch later in the pipeline (i.e. write them all as some kind of dummy primitive and defer the conversion to the actual primitive until later), but that seems a bit involved and error-prone for what might be a fringe case.

A more pragmatic solution in this case would be to change your code so that it draws lines together; that way they should be batched and draw much faster. We worked through a similar situation with eirexe and their game, with great success. This should give you the best performance.

Essentially, batching will do its best to rearrange things to work fast, but in some cases (particularly multiple commands within an item) the user can create a pathological situation (particularly with custom draw routines), and can benefit a lot from simply tweaking their drawing code, i.e.

Instead of drawing within an item:

Line
Circle
Line
Circle
Line
Circle

Draw:

Line
Line
Line
Circle
Circle
Circle

This way the lines will likely be batched.
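The reordering above can be modeled as a stable grouping of an item's draw commands by kind. The Python sketch below uses a hypothetical (kind, args) command representation and counts consecutive runs of the same kind as a rough proxy for how often batching gets interrupted. It is only safe when the reordered shapes don't overlap (or their relative paint order doesn't matter), because it changes the drawing order.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical command representation: (kind, arguments).
Command = Tuple[str, Tuple[Any, ...]]


def count_runs(commands: Sequence[Command]) -> int:
    """Count consecutive runs of commands that share the same kind."""
    return sum(1 for _ in groupby(commands, key=lambda cmd: cmd[0]))


def group_by_kind(commands: Sequence[Command]) -> List[Command]:
    """Reorder commands so each kind is contiguous.

    Kinds keep the order in which they first appear, and commands of the same
    kind keep their relative order (sorted() is stable).
    """
    first_seen: Dict[str, int] = {}
    for kind, _ in commands:
        first_seen.setdefault(kind, len(first_seen))
    return sorted(commands, key=lambda cmd: first_seen[cmd[0]])


if __name__ == "__main__":
    interleaved = [
        ("line", ((0, 0), (10, 0))),
        ("circle", ((5, 5), 2)),
        ("line", ((0, 10), (10, 10))),
        ("circle", ((5, 15), 2)),
        ("line", ((0, 20), (10, 20))),
        ("circle", ((5, 25), 2)),
    ]
    grouped = group_by_kind(interleaved)
    print(count_runs(interleaved), "runs before,", count_runs(grouped), "runs after")
```

For the interleaved Line/Circle example this prints "6 runs before, 2 runs after".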

The docs contain quite a bit of info on optimizing your drawing:
https://docs.godotengine.org/en/stable/tutorials/optimization/batching.html

Single-width lines

I will try to investigate more when I have some time. It has been a long time since we added line batching, but the difference you are seeing is likely just due to dynamic buffering differences (the OpenGL settings in the project settings) and changes made for robustness.

mphe (Contributor, Author) commented Dec 29, 2021

In my actual project I use a lot of single-width, non-anti-aliased, 20-segment draw_polyline and draw_polyline_colors calls, with individual modulate values and a custom shader that accesses COLOR. According to the batching docs, the custom shader is probably the reason why batching doesn't work well in my project. I don't use any other draw_ functions, or at least not in the same amounts and mostly for debugging purposes.
I have about 6 draw_polyline calls per scene and try to fit as many of those scenes in the game as possible. Each draw_polyline call comes from an individual node in that scene. I could try to batch those 6 calls together manually using a MeshInstance. I'm not sure whether that would actually improve performance, but it's something I could try. Manually batching all draw_polyline calls from all active instances would require a lot of work and fine-tuning, and would break z-ordering.
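One way to try grouping several polylines into a single call is to flatten them into segment endpoint pairs, the layout used by Godot 3's CanvasItem.draw_multiline(), which consumes points two at a time. The Python sketch below only shows that data conversion; the function name is hypothetical, it ignores per-point colors, and whether the merged call actually batches or runs faster is not guaranteed and would need measuring.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polylines_to_segments(polylines: Sequence[Sequence[Point]]) -> List[Point]:
    """Flatten several polylines into one list of segment endpoints.

    Each polyline with n points contributes n - 1 segments, and every segment
    is emitted as two consecutive points (start, end), so the result always
    has an even length.
    """
    segments: List[Point] = []
    for line in polylines:
        for start, end in zip(line, line[1:]):
            segments.append(start)
            segments.append(end)
    return segments


if __name__ == "__main__":
    # Two hypothetical 20-segment polylines (21 points each).
    wave = [(float(x), float(x % 2)) for x in range(21)]
    ramp = [(float(x), float(x)) for x in range(21)]
    points = polylines_to_segments([wave, ramp])
    print(len(points) // 2, "segments")
```

The example prints "40 segments"; six 20-segment polylines would become 120 segments (240 points) in one list.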

lawnjelly (Member) commented Dec 29, 2021

It's very situation-dependent, so it's difficult to say anything useful. Polylines are not the same as lines; they go through yet another path, and z ordering can affect things, as can a custom shader. Without an MRP, it is difficult to predict which paths are being used and whether there is indeed a problem on our side.

A "diagnose log" (rendering/batching/debug/diagnose_frame) can also be useful.

mphe (Contributor, Author) commented Jan 5, 2022

I'll try to create a stripped-down version of my project for testing. However, the issue still stands: even without batching, 165 FPS for drawing a few lines is ridiculous, so the problem is not really project-dependent.

I tested manually grouping my polylines by a) putting them all in the same _draw function and b) using a MeshInstance. Since the polylines are not static and need to be regenerated frequently, b) actually decreased performance; in a static scene it slightly improved performance. a) yielded about the same performance as without grouping, maybe minimally better, but I noticed in the diagnose log that those 6 polylines are now batched.

Calinou (Member) commented Jan 5, 2022

As a workaround, you can draw long, thin stretched Sprites to use as lines or rely on the Line2D node (slower than Sprite, but makes it easier to draw curves). By using a specially crafted texture, this also allows for antialiasing that works well even for translucent thick lines and even if HDR is enabled in GLES3.
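The stretched-Sprite workaround boils down to computing a transform per segment: position at the midpoint, rotation along the segment, and a scale that stretches the texture to the segment's length and the desired thickness. The Python sketch below assumes a centered Sprite (Godot's default) whose texture runs along its local x axis; the names are hypothetical, and in Godot the values would map onto Node2D's position, rotation (in radians), and scale.

```python
import math
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class SpriteLine(NamedTuple):
    position: Point  # midpoint of the segment
    rotation: float  # angle of the segment in radians
    scale: Point     # stretch factors applied to the texture


def sprite_for_segment(a: Point, b: Point, thickness: float, texture_size: Point) -> SpriteLine:
    """Compute a transform that stretches a centered sprite texture over segment a-b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    tex_w, tex_h = texture_size
    return SpriteLine(
        position=((ax + bx) / 2.0, (ay + by) / 2.0),
        # atan2 in the same coordinate space as the points gives the angle that
        # rotates the sprite's local x axis onto the segment direction.
        rotation=math.atan2(dy, dx),
        scale=(math.hypot(dx, dy) / tex_w, thickness / tex_h),
    )
```

Per-vertex color gradients are not covered by this sketch; each sprite gets a single color via its modulate.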

Edit: I released an add-on that does this for you: https://github.com/godot-extended-libraries/godot-antialiased-line2d

mphe (Contributor, Author) commented Jan 10, 2022

Line2D is too slow, as it needs to be updated almost every frame.
Drawing stretched Sprites sounds interesting, but it will get ugly to color them the way draw_polyline_colors does.
For now, I'll just live with it until this either gets fixed or Godot 4.0 is stable enough to switch.

akien-mga (Member) commented

Might be worth checking how it fares in 3.5 RC 3+ now. I don't know whether there were improvements for this specifically, but there have been rendering improvements all around, so it may be worth testing.

mphe (Contributor, Author) commented Jun 11, 2022

Unfortunately, it's still roughly the same.

lawnjelly modified the milestones: 3.5, 3.x on Dec 13, 2022