-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
egl-wayland: Fix an unbounded array growth issue #120
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this! I'm happy that this cleaned up a lot of the ugly array handling.
src/wayland-eglsurface.c
Outdated
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE, | ||
&firstSignaled) != 0) { | ||
goto end; | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this bracketed block now that the if condition is gone? Is it just so syncobjWaitRes
has a bounded lifetime? If so I think we should get rid of the extra brackets and declare syncobjWaitRes
above with the rest of the local variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is a good practice to keep the scope of the variables minimal. The rest of the code in this function does not need to know the returned value. But I do not mind removing this scope and moving the declaration of the variable if you prefer that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limiting variable scope is nice when you can do it, but not at the readability cost of disrupting the block structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistically this isn't something we really seem to do, but I think if you want to attain the same scope reduction we could create a small helper function to wrap drmSyncobjTimelineWait
. I think that could look nicer than this block scoping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, you could also remove syncobjWaitRes
entirely and just check errno
in your assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking errno makes sense instead of using a separate variable, which helps to clean this up. Thank you for all the suggestions.
src/wayland-eglsurface.c
Outdated
pthread_mutex_lock(&surface->ctx.streamImagesMutex); | ||
|
||
// Locate the corresponding WlEglStreamImage | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about the brackets here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, scope reduction. The rest of the code does not need to know if the image is found or not.
src/wayland-eglsurface.c
Outdated
if (!image) { | ||
goto fail_destroy_sync; | ||
if (!found) { | ||
pthread_mutex_unlock(&surface->ctx.streamImagesMutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to move this pthread_mutex_unlock
into fail_destroy_sync
. This matches how we unlock the mutex in fail_release
and also prevents us from accidentally forgetting to unlock it if we use fail_destroy_sync
from a second location in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe jumping to fail_release actually makes more sense since the eglImage here is already acquired. It also looks like fail_release needed some changes for how image is handled now.
*/ | ||
syncPoints[i] = UINT64_MAX; | ||
|
||
if (numSyncPoints >= MAX_IMAGES) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect if we have exactly MAX_IMAGES elements, because we'll have already incremented numSyncPoints to MAX_IMAGES.
Instead, this check needs to go before the assignment and increment above.
Or alternatively, we could have an extra loop to count the number of elements we'll need and then call alloca
to allocate the arrays for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numSyncPoints is incremented to n right after the elements at the index n-1 is assigned. We check for this condition right after the increment and before the loop which we assign the elements at the index n. So as long as the initial value of numSyncPoints is less than MAX_IMAGES, which currently is, this should work. Though I believe moving the check above the assignment as you suggested would be less error-prone. We wouldn't be relying on an assumption of the initial value of numSyncPoints.
We do not seem to shrink the dynamically allocated streamImages array that we use for storing the resources associated with the swapchain images. When an entry in this array is destroyed in the destroy_stream_image() function, its fields are simply reverted to default/invalid values. Later on, the entries in this array are tried to be recycled in the add_surface_image() function. However, this approach is error-prone since we keep valid and invalid entries together, which requires us to check the validity of the entries every time we access them. Also, when explicit sync is in use, this array just keeps growing over time, especially during the application window resizes, due to a bug in the entry destruction logic. So, to solve these problems, this change converts the dynamically allocated streamImages array into a linked list for simplified insertions and deletions. Each entry in this linked list is removed from the list and deallocated once they are no longer needed. So, all of the entries in the list stay valid. Per-entry mutexes are replaced with a single mutex that guards accesses to the entire list to make sure that the linked list does not get corrupted when it is accessed from multiple threads. This only happens when explicit sync is not in use. The sizes of the critical sections that are protected by this new mutex are very small. To test to see if this change creates lock contention issues, weston-simple-egl application with swap interval of 0 was run on a Wayland compositor that does not support explicit sync to create as much lock contention as possible. However, no measurable difference in performance was observed after this change was applied. As a side effect of this change, a bug in the wlEglSurfaceCheckReleasePoints() function, where we wrongly assumed that all the entries in the streamImages array were valid, is fixed. This bug caused us to pass destroyed DRM syncobjs to the drmSyncobjTimelineWait() function, which led to random application freezes as a result in some cases since it prevented images from being released back to the EGL stream. Another side effect of this change is that, it makes the maximum number of entries in this list known when explicit sync is in use, allowing us to avoid dynamically allocating the arrays for the list of DRM syncobjs and timeline points in wlEglSurfaceCheckReleasePoints(). This fixes a memory leak issue that can happen if only one of these allocations fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this looks good now pending Kyle confirming the response to his feedback.
*/ | ||
if (image->buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not something for this change, but I think if we made WlEglStreamImage
refcounted, then we could simplify this teardown dance quite a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks right.
We do not seem to shrink the dynamically allocated streamImages array that we use for storing the resources associated with the swapchain images. When an entry in this array is destroyed in the destroy_stream_image() function, its fields are simply reverted to default/invalid values. Later on, the entries in this array are tried to be recycled in the add_surface_image() function. However, this approach is error-prone since we keep valid and invalid entries together, which requires us to check the validity of the entries every time we access them. Also, when explicit sync is in use, this array just keeps growing over time, especially during the application window resizes, due to a bug in the entry destruction logic.
So, to solve these problems, this change converts the dynamically allocated streamImages array into a linked list for simplified insertions and deletions. Each entry in this linked list is removed from the list and deallocated once they are no longer needed. So, all of the entries in the list stay valid.
Per-entry mutexes are replaced with a single mutex that guards accesses to the entire list to make sure that the linked list does not get corrupted when it is accessed from multiple threads. This only happens when explicit sync is not in use. The sizes of the critical sections that are protected by this new mutex are very small. To test to see if this change creates lock contention issues, weston-simple-egl application with swap interval of 0 was run on a Wayland compositor that does not support explicit sync to create as much lock contention as possible. However, no measurable difference in performance was observed after this change was applied.
As a side effect of this change, a bug in the
wlEglSurfaceCheckReleasePoints() function, where we wrongly assumed that all the entries in the streamImages array were valid, is fixed. This bug caused us to pass destroyed DRM syncobjs to the drmSyncobjTimelineWait() function, which led to random application freezes as a result in some cases since it prevented images from being released back to the EGL stream.
Another side effect of this change is that, it makes the maximum number of entries in this list known when explicit sync is in use, allowing us to avoid dynamically allocating the arrays for the list of DRM syncobjs and timeline points in wlEglSurfaceCheckReleasePoints(). This fixes a memory leak issue that can happen if only one of these allocations fails.