Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

egl-wayland: Fix an unbounded array growth issue #120

Merged
merged 1 commit into from
Jul 31, 2024

Conversation

dkorkmazturk
Copy link
Collaborator

We do not seem to shrink the dynamically allocated streamImages array that we use for storing the resources associated with the swapchain images. When an entry in this array is destroyed in the destroy_stream_image() function, its fields are simply reverted to default/invalid values. Later on, the entries in this array are tried to be recycled in the add_surface_image() function. However, this approach is error-prone since we keep valid and invalid entries together, which requires us to check the validity of the entries every time we access them. Also, when explicit sync is in use, this array just keeps growing over time, especially during the application window resizes, due to a bug in the entry destruction logic.

So, to solve these problems, this change converts the dynamically allocated streamImages array into a linked list for simplified insertions and deletions. Each entry in this linked list is removed from the list and deallocated once they are no longer needed. So, all of the entries in the list stay valid.

Per-entry mutexes are replaced with a single mutex that guards accesses to the entire list to make sure that the linked list does not get corrupted when it is accessed from multiple threads. This only happens when explicit sync is not in use. The sizes of the critical sections that are protected by this new mutex are very small. To test to see if this change creates lock contention issues, weston-simple-egl application with swap interval of 0 was run on a Wayland compositor that does not support explicit sync to create as much lock contention as possible. However, no measurable difference in performance was observed after this change was applied.

As a side effect of this change, a bug in the
wlEglSurfaceCheckReleasePoints() function, where we wrongly assumed that all the entries in the streamImages array were valid, is fixed. This bug caused us to pass destroyed DRM syncobjs to the drmSyncobjTimelineWait() function, which led to random application freezes as a result in some cases since it prevented images from being released back to the EGL stream.

Another side effect of this change is that, it makes the maximum number of entries in this list known when explicit sync is in use, allowing us to avoid dynamically allocating the arrays for the list of DRM syncobjs and timeline points in wlEglSurfaceCheckReleasePoints(). This fixes a memory leak issue that can happen if only one of these allocations fails.

Copy link
Collaborator

@amshafer amshafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this! I'm happy that this cleaned up a lot of the ugly array handling.

DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE,
&firstSignaled) != 0) {
goto end;
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this bracketed block now that the if condition is gone? Is it just so syncobjWaitRes has a bounded lifetime? If so I think we should get rid of the extra brackets and declare syncobjWaitRes above with the rest of the local variables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is a good practice to keep the scope of the variables minimal. The rest of the code in this function does not need to know the returned value. But I do not mind removing this scope and moving the declaration of the variable if you prefer that way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limiting variable scope is nice when you can do it, but not at the readability cost of disrupting the block structure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically this isn't something we really seem to do, but I think if you want to attain the same scope reduction we could create a small helper function to wrap drmSyncobjTimelineWait. I think that could look nicer than this block scoping

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, you could also remove syncobjWaitRes entirely and just check errno in your assert.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking errno makes sense instead of using a separate variable, which helps to clean this up. Thank you for all the suggestions.

pthread_mutex_lock(&surface->ctx.streamImagesMutex);

// Locate the corresponding WlEglStreamImage
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about the brackets here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, scope reduction. The rest of the code does not need to know if the image is found or not.

if (!image) {
goto fail_destroy_sync;
if (!found) {
pthread_mutex_unlock(&surface->ctx.streamImagesMutex);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to move this pthread_mutex_unlock into fail_destroy_sync. This matches how we unlock the mutex in fail_release and also prevents us from accidentally forgetting to unlock it if we use fail_destroy_sync from a second location in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe jumping to fail_release actually makes more sense since the eglImage here is already acquired. It also looks like fail_release needed some changes for how image is handled now.

src/wayland-eglsurface.c Show resolved Hide resolved
src/wayland-eglsurface.c Show resolved Hide resolved
*/
syncPoints[i] = UINT64_MAX;

if (numSyncPoints >= MAX_IMAGES) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect if we have exactly MAX_IMAGES elements, because we'll have already incremented numSyncPoints to MAX_IMAGES.

Instead, this check needs to go before the assignment and increment above.

Or alternatively, we could have an extra loop to count the number of elements we'll need and then call alloca to allocate the arrays for them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numSyncPoints is incremented to n right after the elements at the index n-1 is assigned. We check for this condition right after the increment and before the loop which we assign the elements at the index n. So as long as the initial value of numSyncPoints is less than MAX_IMAGES, which currently is, this should work. Though I believe moving the check above the assignment as you suggested would be less error-prone. We wouldn't be relying on an assumption of the initial value of numSyncPoints.

We do not seem to shrink the dynamically allocated streamImages array
that we use for storing the resources associated with the swapchain
images. When an entry in this array is destroyed in the
destroy_stream_image() function, its fields are simply reverted to
default/invalid values. Later on, the entries in this array are tried to
be recycled in the add_surface_image() function. However, this approach
is error-prone since we keep valid and invalid entries together, which
requires us to check the validity of the entries every time we access
them. Also, when explicit sync is in use, this array just keeps growing
over time, especially during the application window resizes, due to a
bug in the entry destruction logic.

So, to solve these problems, this change converts the dynamically
allocated streamImages array into a linked list for simplified
insertions and deletions. Each entry in this linked list is removed from
the list and deallocated once they are no longer needed. So, all of the
entries in the list stay valid.

Per-entry mutexes are replaced with a single mutex that guards accesses
to the entire list to make sure that the linked list does not get
corrupted when it is accessed from multiple threads. This only happens
when explicit sync is not in use. The sizes of the critical sections
that are protected by this new mutex are very small. To test to see if
this change creates lock contention issues, weston-simple-egl
application with swap interval of 0 was run on a Wayland compositor that
does not support explicit sync to create as much lock contention as
possible. However, no measurable difference in performance was observed
after this change was applied.

As a side effect of this change, a bug in the
wlEglSurfaceCheckReleasePoints() function, where we wrongly assumed that
all the entries in the streamImages array were valid, is fixed. This bug
caused us to pass destroyed DRM syncobjs to the drmSyncobjTimelineWait()
function, which led to random application freezes as a result in some
cases since it prevented images from being released back to the EGL
stream.

Another side effect of this change is that, it makes the maximum number
of entries in this list known when explicit sync is in use, allowing us
to avoid dynamically allocating the arrays for the list of DRM syncobjs
and timeline points in wlEglSurfaceCheckReleasePoints(). This fixes a
memory leak issue that can happen if only one of these allocations
fails.
Copy link
Collaborator

@amshafer amshafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this looks good now pending Kyle confirming the response to his feedback.

*/
if (image->buffer) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not something for this change, but I think if we made WlEglStreamImage refcounted, then we could simplify this teardown dance quite a lot.

Copy link
Collaborator

@kbrenneman kbrenneman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks right.

@amshafer amshafer merged commit 59a60d6 into NVIDIA:master Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants