Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cascaded canvas groups and canvas group shader revamp #74859

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

SlugFiller
Copy link
Contributor

@SlugFiller SlugFiller commented Mar 13, 2023

What this PR is

Add the ability of putting canvas group instead of another canvas group.

What to test

  • Test on your project if you use canvas groups with custom shaders, there have been changes. There should not be a need to include the shader in https://docs.godotengine.org/en/stable/classes/class_canvasgroup.html anymore.
  • Experiment with putting canvas groups as children of other canvas group, and different clip_children combinations
  • Check interaction of canvas groups and clip_children nodes with backbuffer shaders, and BackBufferCopy nodes
  • Test both Forward+ and Compatibility renderer
  • Report any glitch or performance issue or unexpected behaviour

Big list of changes related to canvas groups and clip children:

  • Canvas groups can now be placed inside other canvas groups. Likewise for nodes with clip children. Closes Clip+Draw on parent plus Clip Only on child has different behavior on opengl and vulkan #74186, sort of.
  • Items inside a canvas group can now use the back buffer. However, the back buffer only contains item drawn inside the same canvas group, and does not contain items from the containing scene. Since the back buffer is not valid between canvas groups, a scene with n canvas groups and no BackBufferCopy nodes may perform up to 2n+1 back buffer copies (one inside, and one after each canvas group, plus one for the main scene).
  • The canvas group default shader no longer uses hint_screen_texture for the drawn children texture. Instead, a new hint_mask_texture is added. However, hint_screen_texture is still used as an alias for hint_mask_texture if used inside a canvas group or canvas item with clip children set, to preserve compatibility with existing shaders.
  • If a shader that does not use hint_mask_texture (nor hint_screen_texture) is added to a canvas group or parent with clip children, the masking effect is applied automatically, and the shader can control the tint and opacity applied to the children, via COLOR. If hint_mask_texture is used, the default masking operation is disabled, and a custom blending effect can be applied. Closes Improve CanvasItem Clip Children for more flexibility godot-proposals#5595
  • CanvasItemMaterial works as expected, allowing to change the blend mode without losing the default masking effect. (Re)fixes CanvasGroup breaks with any material applied #51204
  • New clip mode: Subtract. Causes the opaque part of children to become transparent for the parent, cutting holes into it. Closes Add a "Mask2D" node (and/or a "mask_parent" property for Sprite2D) godot-proposals#4282.
  • Clip-and-draw is done in one pass instead of two, fixing an issue where the parent's alpha would be applied twice, and hence squared.
  • Canvas group buffers inherit MSAA settings from the viewport. Fixes Vulkan: 2D msaa does not AA clipped children #66005

Breaking changes:

  • Setting use_mipmaps on CanvasGroup is now largely pointless, since any shader that uses a mipmap-enabled sampler on the mask texture will automatically trigger mipmap generation. The value still works, and will generate mipmaps even when unused by the shader.

Video memory usage: Cascading canvas groups and nodes with clip children creates full-screen buffers based on the cascading depth. Allocated buffers are cleared after a lower depth of canvas groups has been maintained for more than 2 seconds.

@insomniacUNDERSCORElemon
Copy link

insomniacUNDERSCORElemon commented Mar 15, 2023

I am not sure what hint_mask_texture does without testing, aside from subtract is there anything of the other bits of my proposal? Behind parent (alternatively not being broken by render order) and restore alpha? EDIT: Nevermind I see the bit about a custom shader, nice but I'm not so sure how that will work (especially compared to making existing rendering features work as expected) though it might not be so bad if it's easy to re-use.

(Well, nesting and subtract will go a long way, but the 2 features I described should make certain things more direct)

Also, any change for MSAA support of the clipped nodes? Though if some type of slight blur can be had that might work too, especially with individual tuning.

@SlugFiller
Copy link
Contributor Author

@insomniacUNDERSCORElemon There are two ways to achieve "behind parent". The main part of it is that it distinguishes between "transparent" and "not drawn". For a parent that's a polygon2D, this is pretty straightforward. The polygon's border is enforced by the graphic's card, as being part of a mesh, while the alpha exists in the polygon's texture. So what you can do is take the mask texture, and mix the polygon's texture on top of it. It would be something like this:

shader_type canvas_item;
render_mode blend_premul_alpha;

uniform sampler2D mask_texture : hint_mask_texture, repeat_disable, filter_nearest;

void fragment() {
    vec4 c = textureLod(mask_texture, SCREEN_UV, 0.0);

    c.rgb *= (1.0-COLOR.a);
    c.rgb += COLOR.rgb*COLOR.a;
    c.a = 1.0-(1.0-c.a)*(1.0-COLOR.a);
}

For a sprite, the original drawing region is a square, and this is probably not what you would want as the masking for the "children behind the parent". So, the next best option is to separate near-0 alpha from any other alpha with an if (COLOR.a > 0.001). It would screw anti-aliasing, though. So not exactly perfect. But the best you can have with a sprite.

The best solution, of course, is to clip the child with a "full" parent sprite, and then draw the "cut out" parent sprite above that.

Also, I'm not sure what issues you have with MSAA, but if you want nodes that are "blurred a bit", you can just use mipmaps:

uniform sampler2D mask_texture : hint_mask_texture, repeat_disable, filter_linear_mipmap;

void fragment() {
    vec4 c = textureLod(mask_texture, SCREEN_UV, 1.0); // LoD=1.0 for first level mipmap. Use higher values for blurrier results.

In other words, you have pretty crazy options to play with. The only limitation with this PR is one-parent many-children. And I'm hoping to fix this with a following PR, if this gets merged.

@insomniacUNDERSCORElemon

Also, I'm not sure what issues you have with MSAA

It doesn't work here. #66005

Forgive the outdated picture, but it still applies in 4.0.

There are two ways to achieve "behind parent".

I was not asking how, I was asking for said basic functionality to not need to be added by (or unknown to) every user.

Clipping to the parent's shape is already done so you just need a way to send it to the back after is has been rendered after subtractions you don't want applied to it. It's either that or a way to make a node un-subtractable (or I would say unclippable because that'd be useful too, but in this case it still needs to be clipped by the parent), but existing options being helpful would be more elegant.

Restore alpha is similar (it allows more complex designs that you might not want to/be-able-to do with subtract, also easier editing), though I get that's further down and it's reasonable not wanting to clutter the options of a new feature.

@SlugFiller
Copy link
Contributor Author

@insomniacUNDERSCORElemon I see the issue with MSAA. It seems the AA settings aren't being inherited by the canvas framebuffer. I'll take a look at it.

I was asking for said basic functionality to not need to be added

It's not basic functionality, though. The notion of (alpha = 0) != not_drawn is something that doesn't exist in any art program I know. The correct way to achieve the specific action stack in the picture you provided is to first separate the parent to alpha and color, then use the following tree:

Parent alpha (clip-only)
-Behind parent
-Parent color (subtract)
--Subtract alpha (subtract)
---Add alpha

If manually extracting color and alpha from the parent is too much, it is possible to achieve it via shaders. But really not recommended, because you want the color to be padded past the clipped area, to avoid artifacts.

Automating this whole thing for any user with no artist effort is not practical for the same reason Godot doesn't have a one-click RTS creator template. Godot is only meant to enable you to do what you want, with a little bit of ingenuity from yourself. Doing it any other way would not allow Godot's runtime to remain a tiny 30mb download, and the editor itself to be just over 100mb.

A good artist would already know how to mix effects in a way that achieves what they want.

@SlugFiller SlugFiller force-pushed the canvas-group-revamp branch 2 times, most recently from e2ebde0 to 11ec9d4 Compare March 16, 2023 17:42
@SlugFiller
Copy link
Contributor Author

Fixed MSAA inside canvas group buffer. This only applies to Vulkan, since GLES3 doesn't seem to have MSAA implemented.

I was kind of surprised to find that default AA doesn't seem to work for 2D polygons.

@insomniacUNDERSCORElemon
Copy link

insomniacUNDERSCORElemon commented Oct 5, 2023

Finally tested this. MSAA working and it's a definite improvement.

I don't like the setup (and resulting limitations) of subtract here (particularly if you want clipping+subtraction at the same level in the same tree). Though to be clear it is easy and perfect for many workflows despite that limit.

I had multiply/subtract working with transparency in the material blend modes before (note: invisible-parent option isn't relevant because clip-only option does that), and it would be a nice option to have here to not have that limit:

screenshot of clipped blend modes

clipping_enhanced_w_selfmodulate

It should just be a tiny shader adjustment, unless there is some issue with this? EDIT: Ok I see, the blend mode can no longer change parent opacity (according to my previous comments, it could at one point even before my edits), but the current implementation can affect all the
other nodes, restoring the parent pixels basically... not my first choice (for 2 untextured polygons, you need to add vertex colors to even be tell it's not just a polygon with the same color). but that is useful in a similar but different way. Is a material transparency flag (specifically allowing it to affect (immediate) parent transparency) a possibility?

On top of subtraction alongside clipping, the PR here doesn't seem to break when order changes (like older versions) so a blend mode and then a -1 z index node after it might work.

If that does work, it would allow my example in my proposal... though without restore alpha (which is a better name than add) option, it would require more nodes. Which honestly can be worse or better to edit depending on the situation, they are opposite workflows.

Also an option to exclude even below/above nodes (first/last in tree? as those are the most logical to exclude) from being clipped would make non-static designs less tedious.

EDIT: I did a character test, 4 unique uses of nested clipping (eye shine, pupils, eyelids, and the entire face)

cascade_test_head

I was happy to see that subtract mode also nests with itself, as I highlighted in one of the eyes (the other eye shine is 2 overlapping lines that aren't subtracted).

This is actually animated as well (psuedo-3D rotation), but aside from lacking an easy way to show it in decent quality the animation also doesn't look good when the face is at the edge of the head (fault of the skew and/or me)

Better example of nested subtract (3 levels)

triple_subtract

Nesting subtraction with partial transparency is also a bit wonky (understandable), even aside from fonts. With an opaque parent (final subtract) and partially transparent child, any overlapping sections of the parent is visible for some unexpected reason:

Screenshot_2023-10-07_05-42-38

(I would expect an intersection of 2 opaque areas to not create an opacity difference)

ERR_FAIL_COND(rt->direct_to_screen);
// Caller should already check both of these, and assume there's a valid backbuffer on return
DEV_ASSERT(rt->backbuffer_fbo == 0);
DEV_ASSERT(rt->direct_to_screen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DEV_ASSERT(rt->direct_to_screen);
DEV_ASSERT(!rt->direct_to_screen);

You've kept the condition unchanged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I adjusted the list of things to test at the top, to better cover issues like this.

@@ -355,7 +353,7 @@ class RasterizerCanvasGLES3 : public RendererCanvasRender {
void _prepare_canvas_texture(RID p_texture, RS::CanvasItemTextureFilter p_base_filter, RS::CanvasItemTextureRepeat p_base_repeat, uint32_t &r_index, Size2 &r_texpixel_size);

void canvas_render_items(RID p_to_render_target, Item *p_item_list, const Color &p_modulate, Light *p_light_list, Light *p_directional_list, const Transform2D &p_canvas_transform, RS::CanvasItemTextureFilter p_default_filter, RS::CanvasItemTextureRepeat p_default_repeat, bool p_snap_2d_vertices_to_pixel, bool &r_sdf_used) override;
void _render_items(RID p_to_render_target, int p_item_count, const Transform2D &p_canvas_transform_inverse, Light *p_lights, bool &r_sdf_used, bool p_to_backbuffer = false);
void _render_items(RID p_to_render_target, int p_item_count, const Transform2D &p_canvas_transform_inverse, Light *p_lights, bool &r_sdf_used, uint32_t canvas_group_level, bool p_alias_screen_to_mask = false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void _render_items(RID p_to_render_target, int p_item_count, const Transform2D &p_canvas_transform_inverse, Light *p_lights, bool &r_sdf_used, uint32_t canvas_group_level, bool p_alias_screen_to_mask = false);
void _render_items(RID p_to_render_target, int p_item_count, const Transform2D &p_canvas_transform_inverse, Light *p_lights, bool &r_sdf_used, uint32_t p_canvas_group_level, bool p_alias_screen_to_mask = false);

@SlugFiller SlugFiller force-pushed the canvas-group-revamp branch 2 times, most recently from 00d690f to beac7b7 Compare October 7, 2023 10:21
@SlugFiller
Copy link
Contributor Author

@insomniacUNDERSCORElemon I tried to explain it before, but "restore parent alpha" is not a possibility, because the alpha is multiplied during render, meaning, the use of alpha is lossy. As in, pixels that are deleted using alpha=0 are truly deleted. If you "restored" them, they would be black.

This is an inevitable effect of how alpha blending works in the GPU. The only way to circumvent it is to first do drawing on a black and white image/canvas representing the alpha mask, and then using a custom shader to turn it into an actual alpha mask multiplied with the parent.

Nevertheless, if you believe a "tiny shader change" can fix this, you can always create your own shader that blends the mask with the parent's colors and alpha in any way you desire. So you're free to try.

As for Z-index, I explained it elsewhere multiple times, but Z-index completely bypasses the tree order, meaning items with a different Z-index, for all intents and purposes, exist in a separate scene tree. So they are no longer considered children of the matching canvas group. Canvas group effects will therefore not apply to a child with a different Z-index.

Incidentally, a similar but slightly different effect applies to Y-sorting. A child of a Y-sorted node is transformed into a sibling of that node, so the parent is sorted as one of the children. That means that if said parent is a canvas group, its effects no longer apply to its children. This can be worked around by putting a sorting parent inside a canvas group parent, so the canvas group parent itself is not sorting.

@insomniacUNDERSCORElemon

"restore parent alpha" is not a possibility
if you believe a "tiny shader change" can fix this

With that bit I was only talking about final transparency with the multiply material, and the context was even a bit wrong (as older versions of the clip shader actually did allow it, as seen below).

FWIW I did get a basic restore alpha with an override shader (red remove, green restore, render everything else) working here (removing render_mode blend_premul_alpha; is one key), but it's clunky and not ideal as I describe below.

shader

Overriding the clip shader for all child nodes is the problem though, similar with subtraction working that way too. I want a way to also work up (or onto things that are rendered, not ones that will render) or at least something that is optional.

Particularly I'm not aware of any option for differentiation with a general shader other than to use color (which is made worse because MSAA introduces an outline that isn't purely the intended color).

With the old behavior, this (the orange/red) is 1 parent node with 3 child nodes (1 clipping, 2 deleting alpha via multiply):

ArcoLinux_2022-10-10_03-49-11 (copy 1)

I tried to clarify in my edit, but the actual issue with multiply is just the behavior of parent alpha as the multiply material otherwise works as expected. The old and new behaviors are both useful for different reasons.

Canvas group effects will therefore not apply to a child with a different Z-index
That means that if said parent is a canvas group, its effects no longer apply to its children.

Sorry, I saw z-index having some effect while clipped... I missed that on its own it looks like it's clamped to the parent index (allowing children to be re-ordered) and that it actually can bug onto other visible (clip/CG) nodes (which is not a new thing). Also a weird wobble (at least in the editor) when zooming in and out.

I looked at old versions again, the ability to break out of clipping was useful. And is probably a better behavior particularly as it makes it immediately obvious that it isn't properly integrated. Although from what you're saying it sounds like there is not much point for z-index anyway, at least in this context.

@insomniacUNDERSCORElemon
Copy link

insomniacUNDERSCORElemon commented Oct 9, 2023

I may have found a bigger issue: performance can easily be tanked depending on MSAA settings (context: 1050Ti, 1080p window size).

With x8MSAA, there is a 20FPS drop with 6 nodes of canvasgroups/clipping. FPS returns to 60 if clipping is disabled instead.
With x2MSAA, there is a 20FPS drop with 20 usages.
With MSAA off, it seems to drop 20FPS with about 350 usages.

This does not seem to depend on the type of content much (CG, what type of clipping, transparency, texture vs polygon, put onto a tilemap, animation doesn't matter). Except that nested clipping seems to perform better, x8MSAA only a 15FPS drop at 6 layers (and seeming to not drop much at 7 layers).

Perhaps it belongs to the MSAA implementation and thus not this PR (and maybe even discussed/tried/working-on etc already), I hope optimization is possible (less calls for static objects/discrete animation, batching on tilemaps) as it's obvious for doing less (a simpler/stylized style or just... nothing happening) to have less hardware strain than having everything zip-zooping everywhere.

Does HTML5 support MSAA? EDIT: It doesn't seem to, but also the clipping in 4.1.1 doesn't seem to have as much of an impact to FPS as with this PR (though again, this could be in MSAA code as well).

@golddotasksquestions
Copy link

golddotasksquestions commented Dec 22, 2023

New clip mode: Subtract. Causes the opaque part of children to become transparent for the parent, cutting holes into it. Closes godotengine/godot-proposals#4282.

I don't think this closes the Mask2D proposal as it is still the parent which decides to mask itself. It also chooses all children. We could require the user to assign particular children for masking, but it is already far less user firendly as the proposed Mask2D.

The whole purpose of the Mask2D proposal was about usability and making it easily discoverable, intuitive and simple to mask any CanvasItem. This proposal does not cover what is proposed in the Mask2D proposal to make this happen. For that I still think we need a Mask2D node.

@SlugFiller
Copy link
Contributor Author

Masking one child at a time wouldn't make much sense though, because you'd need to copy the back buffer for each child you add to the mask. That would just be horribly inefficient to the point of running at a fraction of a FPS. Also, if it's able to have some children as masks and some as not, then it's masking previous siblings, not just the parent.

Or, alternately, if it's only applying to the parent, then there's no point to the order of the children, and you should just apply all masks before drawing subsequent children. This can be done by making said "subsequent children" actually be following siblings of the parent instead of children. From the get-go, the fact that a mixed situation is ambiguous in how it should be rendered makes the application of it undesirable. Render results should not be ambiguous.

This PR achieves the desired outcome for using multiple masked children, with the same scene structure, and just slightly different property settings. And it's about as intuitive as any other feature in Godot.

@vyshliy
Copy link

vyshliy commented Dec 23, 2023

Masking one child at a time wouldn't make much sense though, because you'd need to copy the back buffer for each child you add to the mask. That would just be horribly inefficient to the point of running at a fraction of a FPS. Also, if it's able to have some children as masks and some as not, then it's masking previous siblings, not just the parent.

Thus, it will effectively not provide what is required from the mask. This seems to be a fairly common case of laying out a tree using nested sprites. This approach makes this impossible with clip_children. Node with clip_children enabled is the last visible node in the branch, since all its descendants are transparent masks of it. Is there a workaround here?

I think that proposal is mostly about usability and it's definitely a different point of view. Is it possible to implement within this PR a (possibly ineffective) Mask2D node, but more convenient for use?

@SlugFiller
Copy link
Contributor Author

Again, trying to mix mask and non-mask nodes under the same parent is ambiguous. As in, if you ask 10 different people what the render result of such a set-up should be, you'll get 10 different answers. To call this "intuitive" or "convenient" is a stretch, to put it mildly.

In theory, you could create a "cut into previous siblings" node by using backbuffer effects. But that would make it an issue when mixing with the parent. And, again, we're talking under 1 FPS results, if you have even a handful of sprites on the screen.

Setting up a clear parent to be cut, and cutting children is a lot more clear and intuitive. And the tree order is unchanged from a parent with multiple "Mask2D" children (And no non-"Mask2D" children). It doesn't get more intuitive than that.

What I do see possibly lacking is multiple children cut by multiple children. That requires accessing multiple buffers at the same time. This could be done by combining the backbuffer with a canvas group buffer, but the aliasing of screen_hint to mask_hint for compatibility makes this impossible. Maybe it can be fixed by turning off the alias if mask_hint is present in the shader.

@SlugFiller
Copy link
Contributor Author

Changed it so screen_hint to mask_hint aliasing doesn't happen if mask_hint is present in the shader. Since compatibility mode is clearly not needed in such a case.

@vyshliy
Copy link

vyshliy commented Dec 23, 2023

Again, trying to mix mask and non-mask nodes under the same parent is ambiguous. As in, if you ask 10 different people what the render result of such a set-up should be, you'll get 10 different answers. To call this "intuitive" or "convenient" is a stretch, to put it mildly.

You are right, there can be many opinions. Also I'm not sure what exactly is happening with performance inside the engine, I think you are much more competent here. As I understand it, clip_children was intended as a simple way to mask without using shaders. It's definitely useful in many cases, although in other situations I just don't understand how to work with it. Perhaps the use of this function is simply intended to be limited due to the rendering structure and the binding of the mask to the parent. I think the proposal noted above is about creating the mask as a separate entity that is somewhat easier to use. Is it possible to discuss the creation of such an abstract node here or is this offtopic? Here is a simple description of the proposed node.

If this is off-topic, then it’s hardly worth closing the proposal if it asks for another functionality.

Here is a basic list of differences between the current method and the proposed one:

  • In graphic editors, a mask is usually created as a separate entity. In the clip_children approach, the mask is determined by its parent and essentially all the heirs are the mask. As a consequence, the heir cannot be a mask, and the mask cannot be transferred to the other parent.

  • For varied use, the mask must be able to influence several nodes at once. Previously, this functionality was available using Light2D, but was removed in version 4. Currently it only masks the parent or heir.

  • For dynamic masking, it is convenient if the mask is independent of the position of the masked node. The same was implemented using Light2D in 3.x. Clip_children only works among relatives.

  • It’s good if the mask can be adjacent to other nodes. This way the structure of the tree is much simpler; there is no need to group masks into separate nodes to isolate its effect, as you need to do with clip_children.

This is more of a consultation than a critique of the current implementation. Is it possible to create a similar node based on BackBuffer effects?

@404Fox
Copy link

404Fox commented Jun 30, 2024

My project's design has visuals that depend on the ability to nest multiple canvas items with clip_children enabled, animating nested and parent nodes, along with some basic fragment shaders. I'm making my project on this branch and it's working well so far, so I personally would really like to see this merged. This resolves the up-front nesting and canvas group limitations of clip_children, more functionality can always be added later.

Regarding the Mask2D proposal,

  • Would it still be possible for someone to add the functionality for clip_children to target specific children, or for a clip to target several nodes ignoring the tree in/after this PR?
  • It is possible to make children ignore the parent's (transform) position (for the case of having a moving mask affecting independently moving children). There is a RemoteTransform2D node which can do this or one could implement a script that subtracts the parent transform, or add a flag to the function that implements parent/child transform relationship for in-editor usage.

I designed my scene and art with clip_children in mind. The downward-only behavior of clip_children makes it easy for me to understand clipping when arranging the scene tree. Mask2D nodes could get the job done but there would be a lot of clutter since I use 10+ unique masks in a scene, not to mention the complication of using clip + draw. That's just to say there are different use cases for a clip_children property versus Mask2D as a node which would make other use cases easier to do without the aforementioned change/workarounds, inefficiency or ambiguity aside.

@insomniacUNDERSCORElemon
Copy link

insomniacUNDERSCORElemon commented Jun 30, 2024

@404Fox have you evaluated your project for performance? As said in my previous comment, it doesn't take many parallel usages (350) of this feature to cause a 20FPS drop without adding anything else, but with MSAA turned on the usage budget becomes comically low (6-20) that most games (tilemaps, spawning enemies) will have absolutely no problem smashing past. When I tested it at least*.

I suspect this feature might be intended for simple things like a title screen or other one-at-a-time uses, so if this is not viable as the backbone for a project's art style that might end up as a won'tfix.

*=Small update: I just ran my test project again for a sanity check, EDIT: nevermind I accidentally tested the stable version of clipping in my last edit, testing the actual cascade build gives me the same low-performance results as before.

Possibly more viable with some other AA technique (downscaling?).

@404Fox
Copy link

404Fox commented Jun 30, 2024

@insomniacUNDERSCORElemon
I can see that now that you mention it, although my project doesn't need MSAA nor multiple hundreds of uses.

I put together a performance test that constantly spawns a scene with x5 uses of clipping, running at 4K resolution.
Screenshot_20240630_021403

You can observe the deltas between the current clip and this PR's clip, with and without MSAA. This is to try and rule out the combination of MSAA+this PR's clipping having some unusual performance loss over the base line. MSAA+clip is already expensive in the current implementation (yellow), but the clip cost from beta to this PR is too (red -> orange; no MSAA) and (yellow -> blue; with MSAA). We can also see by the purple and light-blue lines (versus others) that clipping in general has a significant performance cost, and that the PR does not harm non-clipping performance.

When you are using lots of clipping you will eventually reach a point of poor performance even in the current implementation. But that limit is lower here. I don't think it is at a level that makes this unusable for some games but it is worth seeing if there are ways performance can be improved, ideally if it could be close to the current clip implementation, or reach that under certain conditions such as if nesting is not being used. Or perhaps you could look at it the other way around, that we need better AA solutions, like a method that produces a result just as good as MSAA but with a lower cost.

This is run on a 7900XT so don't take the instance (x5 uses) number/framerate as a representation of end-user use case like a minimum spec PC or steam deck, it's just to illustrate the delta between different conditions.

Edit:
And here is an issue discussing clip performance issues in the current implementation.

@SlugFiller
Copy link
Contributor Author

The reason clipping is slow is because it needs to switch drawing to a different target, and also stop any draw calls batching in between. GPUs are optimized to make lots of draws into a single buffer, not to draw into multiple buffers.

However, to do any form of complex compositing, there isn't really much in terms of an alternative to using multiple buffers. Maybe combining a mega-buffer with compute shader rendering, or writable textures. But either option would reduce GPU compatibility, as well as require overhauling fundamental parts of the renderer. Ultimately, GPUs aren't made for reading and writing to the same buffer by default, and switching back and forth between two buffers introduces an inevitable pause.

There are a few places where performance could maybe be improved a bit, with regards to blitting and dirty regions, or reuse of buffers. But not to the point of hitting significant performance gains.

@insomniacUNDERSCORElemon
Copy link

insomniacUNDERSCORElemon commented Jun 30, 2024

@SlugFiller I should've pointed it out after doing the testing, but the big point is that for the same performance impact this PR can only render 22-25% of clipping instances compared to the implementation in 4.2.2. Which is to say the impact with this PR is there whether or not you're actually using clipping to do something complex (my tester is a literal tile design on a tilemap with clipping just used to put a line onto a polygon).

Though the discussions related to what 404Fox linked to (mention of cache, proposal 8747) is relevant here too, as allowing explicitly static art (to not impact every frame) would open up a lot of possibility. It would make sense too if low-FPS frame animation was less costly than linear animation. Though ideally this would be some backend optimization rather than another thing the user needs to change on every node (assuming it's viable, I mean it seems like the engine should know if something is never/sometimes/always updated). EDIT: Made a comment on that proposal

Also small note: I tried the newest version (same performance), crashes on -1 Z index w/clipping

@bmorledge-hampton19
Copy link

Are performance issues with clipping the primary thing keeping this PR from being merged? I for one would love to see it merged if only to have nested CanvasGroup nodes and more flexible render modes. Would it be possible to separate this functionality from the clipping and merge these changes sooner rather than later, or are these features too tightly coupled at this point?

Regardless, thanks for all your hard work on this. I've been really struggling with the current CanvasGroup implementation, and these changes would fix a lot of problems for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment