Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug which could cause a crash during composition #1568

Merged

Conversation

marktucker
Copy link
Contributor

Fix a bug which could cause a crash during composition or recomposition
after a change to a stage. Sufficiently frequent/repetitive calls to
_EvalRefOrPayloadArcs in primIndex.cpp, where ComputeLayerStack is called
many times for the same layer could result in Pcp_LayerStackRegistry
returning a PcpLayerStack that was either already deleted, or about to be
deleted. Fixing this required moe careful management of the table mapping
identifiers to layer stacks in the registry object.

Description of Change(s)

I believe there was unsafe code in the layer stack registry. Given enough threads simultaneously being in _EvalRefOrPayloadArcs in pcp/primIndex.cpp, with all those threads calling ComputeLayerStack for the same layer file, there is lots of opportunity for the registry cleanup code in the PcpLayerStack destructor to interact badly with the Pcp_LayerStackRegistry::FindOrCreate method.

I wish I could provide a reproduceable test case for this crash, but I was only able to make it happen within Houdini. But the crash would happen within a call to UsdUtilsFlattenLayerStack, in the SdfChangeBlock destructor when exiting UsdFlattenLayerStack. With this change I can no longer reproduce the crash in Houdini, and all the USD tests still pass (so at least I'm pretty sure I didn't break anything). Really curious to hear thoughts about this change...

after a change to a stage. Sufficiently frequent/repetitive calls to
_EvalRefOrPayloadArcs in primIndex.cpp, where ComputeLayerStack is called
many times for the same layer could result in Pcp_LayerStackRegistry
returning a PcpLayerStack that was either already deleted, or about to be
deleted. Fixing this required moe careful management of the table mapping
identifiers to layer stacks in the registry object.
PcpLayerStack if that layer stack isn't actually in the registry (it may
have already been replaced by a call to FindOrCreate on another thread).
@marktucker
Copy link
Contributor Author

marktucker commented Jul 22, 2021

FWIW, here is a simple Houdini file that demonstrates the crash:
EDIT: The original file posted here was missing a file asset.usda. New zip archive here:
stitching_crash_2.zip

@jilliene
Copy link

Filed as internal issue #USD-6804

@lbiasco-dwa
Copy link

I believe I'm also seeing this bug pop up. I haven't found a consistent repro with just USD either, but if I do find one I'll try testing this fix against it.

Copy link
Contributor

@sunyab sunyab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! I added a few notes, let me know what you think.

pxr/usd/pcp/layerStackRegistry.cpp Outdated Show resolved Hide resolved
pxr/usd/pcp/layerStackRegistry.h Show resolved Hide resolved
in the identifierToLayerStack map, along with a comment explaining why it
should always be called.
@sunyab
Copy link
Contributor

sunyab commented Aug 12, 2021

Thanks @marktucker! Changes look good to me, I've merged them internally and they should land in dev soon.

@lbiasco-dwa
Copy link

Just to follow up with some more affirmation, I found a couple fairly consistent repro, the best being rapidly running LoadAndUnload on some heavy assets at my studio (failing on anywhere between 1 to 20 load/unloads), and this change completely fixes the problem! It hasn't crashed once, even on 50 load/unloads. Sadly the repro requires production assets that I can't share so I can't pass along a test, but I at least wanted to confirm the stability of the fix in case there was any uncertainty about the fix working with more cases.

@marktucker
Copy link
Contributor Author

Thanks @lbiasco-dwa! It's great to have additional evidence that the bug is not Houdini-specific, and that this fix works in more scenarios than the one I was testing with.

@pixar-oss pixar-oss merged commit 875e4d5 into PixarAnimationStudios:dev Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants