Implement Sprite Batching #2642
I'm curious as to how well it does with the render rework. Edit: Nevermind, it is with the render rework, haha. |
I've added this to the 0.6 milestone; it would be very good to finally be able to improve this with the new renderer. |
@alice-i-cecile , great! It should be almost ready, but I wanted to do a once-over before I marked it as ready for review, I should get to that sometime within a week. |
I fixed the last of the bugs that I know of, and this should now be ready for review! |
I've done a review pass on this and it looks solid to me. Clear, well-organized and no obviously questionable design choices. I'm no rendering expert though, so take that with a grain of salt. Do we have a strategy for automated testing or benchmarking of rendering features like this set up? |
Sprites with the same texture will now be rendered with one draw call.
Pushed an extra commit that implements proper depth sorting and sprite flipping with only slight changes to the previous commit.
Not that I know of. |
Just wrapped up a first-pass review on this and it looks good to me generally. I'm gonna do one more pass tomorrow, then I'm gonna try to sort out the best way to reconcile these changes with the custom-shaders branch (ex: which to merge first). I think it's probably easier to just merge this first and then adapt on my side, but I guess we'll see :) |
We have most of the renderer foundations laid for 0.6 and I'm starting the "update and merge as many open features as possible" phase. How do you want to handle this PR?
I think starting from scratch might actually be easier than resolving conflicts everywhere. But I'll defer to you here (and I'll default to resolving merge conflicts with this code and creating a new PR if I don't hear back from you). |
I'm totally fine with you just creating a new PR. It's no big deal to me on authorship or anything. I got in some good learning on this PR and if the renderer's changed as much to make it easier to create a new PR, I'm totally fine if you go ahead with that. I'll probably not have time to invest in making a new one super soon so it's probably more timely for you to do it if that's not far out of your way. |
Cool sounds good to me! |
I'd just like to point out that this PR is a little misleading. It says "batching", but the code actually implements instancing, not batching. Reading through the source code, I see that it draws the sprites with an instanced draw call.

From all the research about the topic that I've done so far, instancing has been suggested as not optimal for sprites (because of low GPU occupancy, as the mesh is just a quad). Batching can supposedly be expected to have better performance.

For anyone who doesn't know what the difference is:

Instancing is a "hardware" technique, where you tell the GPU to draw many copies (instances) of the same mesh. You prepare and upload just the one basic mesh to the GPU. Every instance can have different uniforms / per-instance data inputs (like different transforms) for the shader.

Batching is a "software" technique, where the data from all the objects is combined/merged together into one big mesh, on the CPU side, and then just rendered with a regular draw call. From the POV of the GPU, it is working with just one big mesh.

With batching, you have more data to move around, and you have to compute transforms and such on the CPU. With instancing, the vertex shader can do the matrix multiplies on the GPU. However, with tiny meshes (like our quads), the GPU hardware is underutilized (because every instance is a separate wavefront/warp) and these aren't too many computations, so the CPU can do them pretty fast.

AFAIK, the fastest sprite renderers are batched, not instanced.

I just wanted to point this out. It's up to @cart what to do. If this PR is going to be reimplemented anyway, maybe it's worth trying to implement batching instead? Or if it's easier, to not delay 0.6 further, I guess you can just do this instancing implementation. It's already a big improvement, compared to not having it. Maybe we can just have that for now, and explore the possibility of further improving sprite rendering performance with batching later, in the future.

Just please, if we have an instanced implementation, call it "instancing", not "batching", in the new PR and in the release notes, so as not to be misleading about what it actually is. |
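To make the distinction above concrete, here is a minimal sketch of the CPU-side batching approach in plain Rust. It is not code from this PR or from Bevy: the `Sprite` and `Vertex` types and the `batch_sprites` function are hypothetical names invented for illustration, and a real renderer would also carry color, depth, and texture-atlas data.

```rust
/// Hypothetical sprite description: center position, size, rotation in radians.
#[derive(Clone, Copy)]
pub struct Sprite {
    pub center: [f32; 2],
    pub size: [f32; 2],
    pub rotation: f32,
}

/// One vertex of the merged mesh: world-space position plus texture coordinate.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vertex {
    pub position: [f32; 2],
    pub uv: [f32; 2],
}

/// CPU-side batching: transform every sprite's quad on the CPU and append it
/// to one shared vertex/index buffer. The result is a single big mesh that
/// could be drawn with one regular (non-instanced) indexed draw call.
pub fn batch_sprites(sprites: &[Sprite]) -> (Vec<Vertex>, Vec<u32>) {
    // Unit quad corners, centered on the origin, with matching UVs.
    const CORNERS: [[f32; 2]; 4] = [[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]];
    const UVS: [[f32; 2]; 4] = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]];
    // Two triangles per quad.
    const QUAD_INDICES: [u32; 6] = [0, 1, 2, 2, 3, 0];

    let mut vertices = Vec::with_capacity(sprites.len() * 4);
    let mut indices = Vec::with_capacity(sprites.len() * 6);

    for (i, sprite) in sprites.iter().enumerate() {
        let (sin, cos) = sprite.rotation.sin_cos();
        for (corner, uv) in CORNERS.iter().zip(UVS.iter()) {
            // Scale, rotate, then translate. With instancing, the vertex
            // shader would do this per instance on the GPU instead.
            let x = corner[0] * sprite.size[0];
            let y = corner[1] * sprite.size[1];
            vertices.push(Vertex {
                position: [
                    x * cos - y * sin + sprite.center[0],
                    x * sin + y * cos + sprite.center[1],
                ],
                uv: *uv,
            });
        }
        // Offset the quad's indices so they point at this sprite's vertices.
        let base = (i * 4) as u32;
        indices.extend(QUAD_INDICES.iter().map(|idx| base + idx));
    }
    (vertices, indices)
}
```

In the instanced alternative, only the four unit-quad vertices would be uploaded once, and the per-sprite `center`, `size`, and `rotation` would be supplied as per-instance data for the vertex shader to apply.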
Hi @inodentry thanks for the breakdown! I wasn't aware of the difference. Interesting that batching would be faster than instancing. My naive perspective was that we were going to gain performance by avoiding all of the extra vertex attribute data. Honestly batching sounds no harder to implement than instancing, though. Anyway, thanks again for the info. That's good to know. |
That's certainly true when you have meshes with more vertices. However, for a quad, even with all the extra data duplication, the overhead is pretty small. GPU compute/shader cores execute workloads (like the vertex and fragment shader) in groups of many threads that must execute in lock-step (these are called "wavefronts" by AMD and "warps" by NVIDIA). Typically that means 32 threads (or sometimes 64, depending on the GPU architecture), that must all be executing identical instructions on every clock cycle. If the workload doesn't fit, the remaining processing potential of the GPU is left unutilized. An instance with just 4 vertices is obviously much smaller. |
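The occupancy argument above can be put into numbers with a small back-of-the-envelope calculation. This is only a sketch of the simplified model described in the comment (each instance or mesh gets its own wavefront/warp); the `lane_utilization` function is a hypothetical helper, and real GPUs may schedule vertex work differently.

```rust
/// Fraction of wavefront/warp lanes doing useful vertex work, under the
/// simplifying assumption that each draw unit (one instance, or one merged
/// mesh) is scheduled onto its own wavefronts and cannot share them.
pub fn lane_utilization(vertices_per_unit: u32, wave_size: u32) -> f32 {
    assert!(wave_size > 0, "wave size must be non-zero");
    if vertices_per_unit == 0 {
        return 0.0;
    }
    // Number of wavefronts needed to cover all vertices, rounded up.
    let waves = (vertices_per_unit + wave_size - 1) / wave_size;
    vertices_per_unit as f32 / (waves * wave_size) as f32
}
```

Under this model, a 4-vertex quad instance uses 4 of 32 lanes (`lane_utilization(4, 32)` is 0.125) or 4 of 64 lanes (0.0625), while a batched mesh of 8 quads supplies 32 vertices and fills a 32-wide wavefront completely.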
Closing in favor of #3060 |
This still needs a little bit of work, but not much. On my machine I get up to 83,000 sprites before dropping below 60 fps, compared to before batching where it only got up to 33,000. 🎉
Objective
Solution