Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cull out-of-bounds triangles that were not clipped #9527

Closed
wants to merge 3 commits into from

Conversation

hrydgard
Copy link
Owner

@hrydgard hrydgard commented Apr 1, 2017

Helps #5001, at least for Toca.

Looks like this may also help #2159, #8251, #5507 and #7026 .

This could also be ported to the other backends with a little bit of effort. Need a uniform to stash the NaN in.

Really not sure how big the guard band should be, and the right way to do this would be to use a geometry shader instead of killing triangles by setting w = NaN, which I'm not sure will do sane things on all hardware. Unfortunately geometry shaders are not available everywhere.

@unknownbrackets
Copy link
Collaborator

I seem to remember it drops anything that even slightly goes outside the 4096x4096 grid - the one established by the viewport and offset parameters. But this might be affected by the clip flag (I think that was only depth?)

-[Unknown]

int spline_count_u;
int spline_count_v;
int spline_type_u;
int spline_type_v;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess these are VS parameters, not FS, aren't they?

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot we had that split. Didn't really mean to include that move anyway, I'll move them back.

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 1, 2017

Yeah, I don't think the clip flag is involved but you never know. 4096x4096 sounds reasonable. Should probably transform that space into the clip space (using offset etc as you say) and pass the four "walls" in as uniforms...

@unknownbrackets
Copy link
Collaborator

See here:
#2159 (comment)
#8251 (comment)

I think these two are affected by the same issue.

-[Unknown]

@LunaMoo
Copy link
Collaborator

LunaMoo commented Apr 1, 2017

#5507 and #7026 could also be affected, they have some exploded vertices going wild far outside the screen.

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 1, 2017

Yup, good memory, all of these are almost certainly the same thing. Now just got to figure out how to best implement it... This method works for me, but only because nVidia cards are smart enough to reject triangles with NaN coordinates instead of going insane - I believe it's undefined behaviour so it could also turn the moon into cheese, or whatever.

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 1, 2017

This sounds promising though:

https://twitter.com/nthibieroz/status/718065039085846529

@LunaMoo
Copy link
Collaborator

LunaMoo commented Apr 2, 2017

Sangokushi IX(Romance of the Three Kingdoms 9) still has black background with this, but at least that one has pretty easy workaround(disabling depth test via cwcheat).

Tested Driver 76 as well which appears completely fixed:
ules00740_00001
That I guess confirms this branch also works on AMD gpu's:].

When it comes to this game, once it get's fixed it might generate new group complaining on weird slowdowns unaffected by rendering resolution which I belive is caused by ~ 15k drawcalls:3. Not sure if those drawcalls are even valid as a lot of them have 1 vertex:
vertices_weirdness_driver76
a lot of those also have really crazy coords:

X	Y	Z	U	V	Color
1120822.000000	-1120428.000000	205.275391	62.956650	888.494385	00000000

@hrydgard hrydgard added this to the v1.5.0 milestone Apr 2, 2017
@hrydgard hrydgard changed the title D3D11-only experimental way to cull out-of-bounds triangles. Cull out-of-bounds triangles that were not clipped Apr 2, 2017
@hrydgard
Copy link
Owner Author

hrydgard commented Apr 2, 2017

OK, so I really don't understand this. I was under the impression that the PSP worked like this:

  • Clip triangles to w=0, splitting as necessary.
  • If any vertex is outside the 4096x4096 box, kill the whole triangle.

This means that we should not kill triangles with w < 0 in the vertex shader as they would have gotten clipped by the PSP and thus might end up inside the 4096x4096.

Anyway. I'm getting road triangle dropouts now that I'm sizing the box right:

debugblack

I decided to create a zoomed-out debug mode which instead of dropping triangles, colored the out-ouf-bound vertices purple:

debugcolor

In wireframe:
debugwire

The game is performing manual clipping of the road into that big "road triangle" (clearly dropping triangles by moving a vertex way off to the right, causing the annoying stripes) but it happens that triangles that should be drawn have vertices way outside the allowed 4096x4096 square (not marked but should be obvious where it is) and reach all the way to the car, causing the visible triangle dropouts. I had expected those to be behind the camera but no, they aren't - they are behind the z=0 of the Z buffer though. So the near Z clip of the PSP doesn't work quite as I thought, otherwise we'd be having these dropouts on the real PSP as well.

Grrr.

I guess there is a possibility though that the restriction doesn't apply vertically but that would be strange.

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 2, 2017

For now I left out the bottom border. If this fixes the rest of the games, then .. meh :) though not really happy about it.

Please test the other affected games again!

@unknownbrackets
Copy link
Collaborator

Hmm, we can do more testing. Thinking about it more, from the testing I previously did (was some time ago now though):

  • Depth:

    • CLAMPED when the "clip enabled" flag is ON, so -2.0 becomes 0.0.
    • CLIPPED when the "clip enabled" flag is OFF, so -2.0 is not drawn. Note: this seems backwards from the name of the flag, and is what we always do.
    • May never have tested whether extreme depth values were culled. Probably not?
  • Guard band:

    • Pretty sure drawing a box larger than the 4096x4096 size was culled.
    • Don't remember the behavior when only some vertices were inside, but I thought it did cull.
    • Could be the clip flag effects this in some way?
    • Definitely never tested vertical vs horizontal specifically.

With how it handles depth clipping (there's a "fuzz region" that is still clamped, iirc), I wouldn't put it past it to do something weird with the guard band, really...

-[Unknown]

// We also assume that everything behind the near clipping plane gets clipped and will thus not in reality
// exceed the guardband. This is a bit rough but should be ok.
float offsetX = gstate.getOffsetX();
float offsetY = gstate.getOffsetY();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess worst case they would just not cull things they should, right? Maybe we remove these unused vars to reduce confusion?

-[Unknown]

Copy link
Owner Author

@hrydgard hrydgard Apr 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in theory if the game has the viewport say at the far left of the 4096x4096 square, it could draw something very wide only to the right that we'd now cull but shouldn't. I don't think that's much of a concern though.

Removed the vars, they were leftovers from a few previous variants.

@@ -427,10 +427,11 @@ void DrawEngineD3D11::ApplyDrawState(int prim) {
if (rasterIter == rasterCache_.end()) {
D3D11_RASTERIZER_DESC desc{};
desc.CullMode = (D3D11_CULL_MODE)(keys_.raster.cullMode);
// desc.FillMode = gstate.isModeThrough() ? D3D11_FILL_SOLID : D3D11_FILL_WIREFRAME;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove these debugging comments?

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 2, 2017

This needs more work (and yes, HW testing) clearly, I'm getting some minor glitches in Wipeout on the left side.

BTW, in D3D11, we can get the CLAMPED behaviour for Z by using rasterdesc.DepthClipEnable, but that shouldn't affect this issue (might solve something else though).

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 2, 2017

Did an experiment on an variant on D3D11/9 only - so testers (like @LunaMoo :), please try both OpenGL and D3D11. In this case, we simply don't cull if Z is out of range.

@unknownbrackets
Copy link
Collaborator

FWIW, neither option seems to fix Brave Story (although I still think it's from culling), but the pre-experiment version removed a chunk of the triangle, the post-experiment removes none of it.

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 2, 2017

If it removed a chunk of the triangle, it has to be made out of multiple triangles...

Detailed hardware tests will be required to figure this out. Though as-is it does fix some games at least...

@LunaMoo
Copy link
Collaborator

LunaMoo commented Apr 3, 2017

Driver 76 still works fine in d3d11(d3d9 looks about the same /to make it clear it's also fixed).
In OGL it still glitches a bit, much less than originally, there's like 1 diagonal(more vertical) line flashing for like 1 frame in the intro, can't catch it, but in the same place like 70% of the screen was covered with glitch pretty much constantly so one frame of one tiny line, really not that bad. There are also glitches coming out on top of some smaller buildings however that could be separate bug of OGL as their vertices seems to look ok:
vertices_weirdness_driver76b
the texture used for that glitch is also the texture used for all far/background buildings, so it looks like it draws their distance/simplified models at close range. In case there might be anything interesting there, here's some tabs from GE debugger from the same thing(althrough other frame) https://gist.github.com/LunaMoo/08409df1971bc9f853557552317380d7

Sangokushi IX now works in both OGL and d3d11 backends:)
uljm05842_00000
~ still broken in d3d9

Edit: Found another problem in Driver 76 in build I made earlier before new changes here everything looks fine, in new build that works for all backends at certain angles in one place it doesn't draw part of the road:
ules00740_00015
With this d3d9/11 are buggy, where OGL shows the road fine.
Here's vertices used to draw that road with missing part:

Vertices
X	Y	Z	U	V	Color
470.366211	88.031372	3168.283203	0.000000	1.484802	00000000
618.523438	138.513428	5421.074219	0.000000	142.079895	00000000
610.217285	200.169434	8172.505859	52.651653	212.180603	00000000
610.217285	200.169434	8172.505859	52.651653	212.180603	00000000
470.366211	88.031372	3168.283203	0.000000	1.484802	00000000
610.217285	200.169434	8172.505859	52.651653	212.180603	00000000
289.525635	86.349365	3093.220703	128.974457	1.484802	00000000
542.077637	705.949219	30743.183594	128.974457	317.318542	00000000
491.182617	1083.723389	47601.550781	119.169609	330.519836	00000000
491.182617	1083.723389	47601.550781	119.169609	330.519836	00000000
289.525635	86.349365	3093.220703	128.974457	1.484802	00000000
491.182617	1083.723389	47601.550781	119.169609	330.519836	00000000
117.022339	84.744751	3021.615234	0.000000	1.484802	00000000
403.891357	1731.660156	76516.046875	112.410851	339.915588	00000000
403.891357	1731.660156	76516.046875	112.410851	339.915588	00000000
403.891357	1731.660156	76516.046875	112.410851	339.915588	00000000
117.022339	84.744751	3021.615234	0.000000	1.484802	00000000
403.891357	1731.660156	76516.046875	112.410851	339.915588	00000000
117.022339	84.744751	3021.615234	0.000000	1.484802	00000000
239.469360	2952.117676	130979.570313	107.791336	346.371887	00000000
117.022339	84.744751	3021.615234	0.000000	1.484802	00000000
-130.125854	201.343506	8224.892578	0.000000	231.234558	00000000

Oh yeah I didn't notice, but that was the experiment;p, reverting 8c86b24 fixes the road there.

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 3, 2017

Thanks for your testing @LunaMoo . This will be a tricky one to get completely right...

@sum2012
Copy link
Collaborator

sum2012 commented Apr 3, 2017

@hrydgard Does not fix #2159 because DX9 still black screen
Please change your word "fix" to "help"

@hrydgard
Copy link
Owner Author

hrydgard commented Apr 3, 2017

Thanks @sum2012 , done. Though that can probably be fixed.

@LunaMoo
Copy link
Collaborator

LunaMoo commented Apr 4, 2017

Sangokushi IX draws the black thing via:

X	Y	Z	U	V	Color
362.994385	139.997803	0.000000	185856.000000	35840.000000	9a9a9aff
468.992920	140.997803	0.000000	240128.000000	36096.000000	9a9a9aff
362.994385	155.997559	0.000000	185856.000000	39936.000000	9a9a9aff
468.992920	156.997559	0.000000	240128.000000	40192.000000	9a9a9aff
362.994385	170.997314	0.000000	185856.000000	43776.000000	9a9a9aff
468.992920	171.997314	0.000000	240128.000000	44032.000000	9a9a9aff
362.994385	185.997070	0.000000	185856.000000	47616.000000	9a9a9aff
468.992920	186.997070	0.000000	240128.000000	47872.000000	9a9a9aff
362.994385	200.996826	0.000000	185856.000000	51456.000000	9a9a9aff
468.992920	201.996826	0.000000	240128.000000	51712.000000	9a9a9aff
362.994385	215.996582	0.000000	185856.000000	55296.000000	9a9a9aff
468.992920	216.996582	0.000000	240128.000000	55552.000000	9a9a9aff
362.994385	231.996582	0.000000	185856.000000	59392.000000	9a9a9aff
468.992920	232.996338	0.000000	240128.000000	59648.000000	9a9a9aff
0.000000	0.000000	0.000000	0.000000	0.000000	000000ff
7999.877930	3999.938965	0.000000	4096000.000000	1024000.000000	000000ff

I think only the last rectangle matters here which looks like it's way higher than what should be affected by this branch now and is indeed in other backends, so why fails in d3d9? Maybe it's related to it being rectangle prim and first vertex at (0,0) is within range ~ isn't rectangle divided to two triangles? Or maybe because the texture is disabled for this in opposite to other cases where this branch does work in d3d9?
Edit: I think it's the texture thing, if I remove the last "Texture map enable: 0" before it draws this, so it uses previous texture, then this branch works.

@thedax
Copy link
Collaborator

thedax commented May 16, 2017

This actually does help #5507, but it causes issues with missing UI elements (the tachometer during races, parts of the initial loading screens to be black instead of white, and various graphics on the main menu to disappear).

@hrydgard
Copy link
Owner Author

Hm. Until this is understood better, maybe it makes sense to commit it behind a compat flag which we specifically enable for TOCA and the few other games it helps. But I'm a bit disappointed it isn't working better than it is, I don't understand why.

@unknownbrackets
Copy link
Collaborator

I think we need to do some hardware testing - it could be the PPS hardware does the culling in some strange way. Maybe it doesn't even cull based on z at all, or something?

-[Unknown]

@unknownbrackets
Copy link
Collaborator

unknownbrackets commented Jul 29, 2018

Okay, so summarizing hrydgard/pspautotests#194 based on what would achieve those results:

  1. Treat gstate.isClippingEnabled() as gstate.isDepthClampEnabled().
    1.1. When disabled -> any vert outside [0, 1.0] (possibly with some rounding padding) after projection is culled.
    1.2. When enabled -> verts are not culled based on depth, and depth is clamped to [0, 65535] (likely at fragment level - that's a separate problem to worry about for depth testing, etc.)
    1.3. In some backends, we use depth range to handle minz/maxz, which might have to be corrected for when culling. Example: minz=16384, maxz=49152 -> means [-1.0, 1.0] is depth CLIP range, but outside [-2.0, 2.0] is culled.
    1.4. Using and populating u_depthRange for this may be the best bet?

  2. Ignore gstate.isClippingEnabled() when it comes to X or Y. Culling is performed regardless of the flag, which only seems to affect depth.

  3. Use a uniform to specify the guardband, which is impacted by gstate.getOffsetX() and gstate.getOffsetY().
    3.1 For example, offset(X, Y)= (0, 0) would cause culling of any vertices with X < 240 or Y < 136. It is most commonly set to (2048 - 480/2, 2048 - 272/2), the center of the 4096x4096 valid area.
    3.2. Ideally we'd reverse the transform in the uniform values, such that the min/max values could be compared with gl_Position.xy / gl_Position.z, I think?
    3.3. Unfortunately, this would probably require 4 floats (either min/max for each coord, or mul&add for each with hardcoded 2048 each way.)

  4. We could potentially pass NAN as a dedicated uniform, or perhaps use 0.0 for w?

I think that covers it, but it's possible again that I've misunderstood how it works due to bias (see hrydgard/pspautotests#194). What do you think @hrydgard?

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented Aug 1, 2018

That all sounds good. but I expect that there might still be some remaining strange edge cases we can't foresee with the clipping or not and the resulting XY positions...

Passing on a couple more floats for the guardband and a NaN value shouldn't be much of an issue though, we still have some fluff to save in our uniform structs. I don't know how reliable passing 0.0 for W is but maybe that works. We can also enable all this on a per-game basis using compat.ini although would be good to avoid that.

@unknownbrackets
Copy link
Collaborator

There's definitely more to it. Persona 1 would draw hallways incorrectly from the above rules - see here:
#9622 (comment)

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented Aug 4, 2018

Oh. Yeah .. no idea about that one. It's a bit annoying how the PSP's rasterizer is mostly so normal, but has these bizarre edge cases....

I'm gonna rebase this again soon, for whatever it's worth.

…st for Toca.

Really not sure how big the guard band should be, and the right way to
do this would be to use a geometry shader instead of killing triangles
by setting w = NaN, which I'm not sure will do sane things on all
hardware. Unfortunately geometry shaders are not available everywhere.

Assumes the viewport is centered in the 4096x4096 rectangle to save a
uniform.

Ignores the bottom for now, can't figure out TOCA :(
… the clipper works when Z is out of range, let's only cull when Z is in range.
@hrydgard
Copy link
Owner Author

hrydgard commented Aug 5, 2018

Rebased this on master again (squashed a couple of commits to make the rebase easier, there was some back-and-forth going on)

@unknownbrackets
Copy link
Collaborator

I've confirmed the following so far:

  • That particular draw, with its model/view/projection values, should indeed draw and is definitely affected by the clipping flag.
  • If I tweak the coordinates, I can adjust to a position where the prim is skipped.
  • My original test used an ortho projection matrix, but if I switch to identity and change the viewport to scale 1:1 too, behavior is the same (implying that it's not before viewport.)

-[Unknown]

<<<<<<< HEAD
=======
OutputDebugStringA(LineNumberString(code).c_str());
>>>>>>> Experimental way to cull out-of-bounds triangles. Helps #5001, at least for Toca.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops.

-[Unknown]

@unknownbrackets
Copy link
Collaborator

Ah... I tried taking the coordinates through M, V, P, and then viewport. It still worked after MVP, which doubly confirmed the above - then I finally realized the obvious, d'oh.

The vertices that are outside the 4096x4096 space in this case ALSO have depth outside 65535. If the depth is forced to a valid value, it kills the verts. So basically, when the depth clamp flag is enabled, clamping bypasses killing the vertex. But if it's not clamped, it's subject to the 4096x4096 cull.

Now it makes sense again.

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented Aug 6, 2018

"bypasses killing the vertex" must then mean that the hardware performs some sort of clipping against the near plane, making the remaining verts in-bounds in X and Y, because clearly the 0-4096 thing is a hardware limitation.

And that would mean that the clamp flag performs clipping, and maybe the clip flag performs clamping? :)

@unknownbrackets
Copy link
Collaborator

unknownbrackets commented Aug 6, 2018

That makes sense. In that way, I suppose it makes sense to call it a clipping flag.

But, it's important to note that depth clamp will cause an object entirely behind or in front of the frustum (i.e. before the near plane or after the far plane) to still draw, and not clip to those planes. Meaning it only clips to the four planes on the sides.

It definitely does clamp depth, though. If you consider the viewing frustum to be always 4096x4096 and depth 0-65535 (regardless of viewport or minz/maxz), then with depth clamp off, even a single vertex slightly outside the 0-65535 range will cause the triangle not to draw. Same as outside 4096x4096.

But with it on - unless I'm confused - it will draw even the parts outside the frustum (on the near and far sides), just squished to the near and far sides like a deformed tin can.

The minz/maxz params are applied after this, possibly at the fragment level. Though I still think it's appropriate to think of them as clipping, essentially.

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented Aug 6, 2018

Oh I see. The phenomenon where clamping depth makes triangles stay makes sense for vertices that are closer to the camera than the Z near plane, but not beyond W=0 , and beyond the Z far plane, but not beyond W=1. Those would, without clamping, end up with Z that are too large (or small, depending on the projection matrix) to interpolate and therefore the GPU throws them away. Maybe it's really that simple and games never really try to draw triangles that cross W=0....

@unknownbrackets
Copy link
Collaborator

It's possible I'm not testing crossing w=0 and w=1 properly, though. I'm not exactly sure how to cleanly test that, tbh.

-[Unknown]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants