AMDVLK out of bounds write causes memory corruption/crash when an attached display monitor has more than 64 video modes. #179

kleinerm · 2024-10-01T05:10:45Z

Short summary:

If a Vulkan client on Linux tries to call vkGetDisplayModePropertiesKHR() on a video output which exposes more than 64 video modes then amdvlk experiences memory and stack corruption due to out-of-bounds writes, leading to client crash/termination, e.g., "Stack smashing detected" abort().

Explanation of how/why this happens:

If a Vulkan client on Linux tries to call vkGetDisplayModePropertiesKHR() on a VKDisplayKHR (running on a virtual terminal, or created from a RandR leased output or similar), then the following call-chain is executed in XGL's vk_instance.cpp:

vk_physical_device.cpp: vkGetDisplayModePropertiesKHR() -> vk_physical_device.cpp: PhysicalDevice::GetDisplayModeProperties()-> vk_instance.cpp: Instance::GetScreenModeList() -> Multiple calls to pScreen->GetScreenModeList() implemented in PAL, which will then call into the Linux OS specific amdgpuDevice.cpp : Device::QueryScreenModesForConnector()

Device::QueryScreenModesForConnector() returns the number of video modes supported on the Linux DRM connector for the video output associated with the VKDisplayKHR handle, or a list of all the video modes. It returns the full count, or the full list of modes.

This is a problem, because the current implementation of XGL uses multiple arrays for storing or caching these returned video modes, which have a fixed maximum size of Pal::MaxModePerScreen = 64 elements, as defined in palPlatform.h.

For example pModeList in struct vk_instance.h ScreenObject

xgl/icd/api/include/vk_instance.h

Line 355 in c4ca1f6

Pal::ScreenMode* pModeList[Pal::MaxModePerScreen];

and in

xgl/icd/api/vk_physical_device.cpp

Line 9534 in c4ca1f6

Pal::ScreenMode* pScreenMode[Pal::MaxModePerScreen];

are arrays which are involved in the above call sequence and have a fixed maximum capacity of 64 video modes.

Therefore calling vkGetDisplayModePropertiesKHR() on a display with > 64 modes will cause out-of-bounds writes into these arrays, corrupting data structures inside the VKInstance and also stack corruption, leasing to malfunction or crash of the client app.

Possible solutions?

Ideally there wouldn't be fixed size arrays like those mentioned above. Or at least the limit of Pal::MaxModePerScreen inside
https://github.com/GPUOpen-Drivers/pal/blob/31f6a70d0e9ee22894daab5305dd91b3ebb599a5/inc/core/palPlatform.h#L88
should be raised to a much larger number.

Then the code inside Instance::GetScreenModeList() should do checking on the modecount values returned by Pal and clamp the returned mode counts and internally cached modes to a maximum of Pal::MaxModePerScreen to prevent future out-of-bounds writes due to array overflows.

xgl/icd/api/vk_instance.cpp

Line 868 in c4ca1f6

VkResult Instance::GetScreenModeList(

Thanks.

The text was updated successfully, but these errors were encountered:

jinjianrong · 2024-11-03T23:52:02Z

The fix will be in next release

kleinerm · 2024-11-04T00:13:04Z

Great! Did you bump the Pal::MaxModePerScreen to a higher value and implement suitable bounds checking/clamping to prevent a future repetition of the bug? Or even switch to dynamic allocation?

For reference, a user of my software which first reported this crash, experienced it with a HDR projector that exposed 107 video modes. So if you just bumped the static limit, but didn't switch to dynamic allocation, I would propose choosing at least a limit of 256 modes at a minimum, or maybe even a more generous limit of 512 modes to be on the relative safe side.

Thanks!

jinjianrong · 2024-11-05T07:36:09Z

Here is the fix : #180

…ure drivers. AMDVLK v-2024.Q4.2 will have a suitable bug fix, so the workaround ain't needed anymore on Navi+ / RDNA gpu's supported by that driver. The fix is untested and not testable by myself as I don't have a RDNA+ gpu, but I performed a full code review and conclude that the bug should be fixed by that fix, which now implements dynamic allocation of suitable data structures, so no hard mode count limit anymore and future proof. Bug report: GPUOpen-Drivers/xgl#179 Bugfix merged 5.11.2024: GPUOpen-Drivers/xgl#180

kleinerm · 2024-11-06T01:51:50Z

I reviewed the code of your fix and it looks good to me. Dynamic allocation, great!
I can't test this myself easily anymore, as the current driver no longer supports the Polaris and Vega gpu's I have, sadly only RDNA. But I will let my user who has a RDNA gpu know about the fix. Maybe he can test it already and then I can close this issue.
Thanks.

Update Khronos Vulkan Headers to 1.3.301 Update compliant CTS version to 1.3.9.2 Fix buffer overflow when the count of display modes exceeds MaxModePerScreen(64) (xgl issues #179) Expose extension VK_EXT_swapchain_colorspace Fix command buffer stack usage estimates Add persistent waves to CPS path Extend GFX11 app opts to Strix Copy cbState struct as a whole Add BG3 layer to modify a barrier Disable dcc when the application transitions to the feedback loop image layout (AMDVLK Issue #375) Fix NSA threshold for World War Z and Sniper Elite 5 Get WaveSize from GPURT compiler options Remove unnecessary UpdateFrameTraceController call Update RtFp16BoxNodes Values Change VK_ASSERT_MSG to VK_ASSERT_ALWAYS_MSG Set AllowVaryWaveSize for GPURT shaders Remove VK_ALLOC_A macro Fix nsaThreshold definitions in app_profile Bail out on error when parsing shader profile Enable the option OptimizePointSizeWrite Fix CTS failures in dEQP-VK.pipeline.pipeline_library.graphics_library.misc.other.* Disable descriptorBufferCaptureReplay Disabling DCC for VK_IMAGE_LAYOUT_RENDERING_LOCAL_READ_KHR Update some settings' value [RT] Fix the use of 'forceInvalidAccelStruct' setting Fix usage of LayoutUncompressed for LAYOUT_RENDERING_LOCAL_READ Fix coding standard issues in vk_defines Expose more graphics and compute queues Disable htile for external memory when creating image to work-around the corruption issue in interop with mesa GL Initializing the fence count to 0 for every device index. [DPRINTF] Fix debug printf crash Fix sample shading is enabled for POPS incorrectly Expose VK_KHR_compute_shader_derivatives Fix menu list corruption in Blender 4.3.0 Beta no rgp markers for rgd Bump up GPURT version to 51 Enable GpuRt setting for persistent waves Update PAL Version in XGL 909 VK_EXT_shader_replicated_composites - Driver Implementation Fix "Unknown()" events appearing in RGP captures Specify SRD stride for SrvTable and TypedSrvTables as 4 Fix memory leaks Reset plane index according to the latest aspectMask to fix corruption in Blender 4.3.0 Beta: EEVEE render engine Fix an assertion of incorrect primitive type Remove m_stringTableId from devmode Bugs fix for Proton Raytracing Games Acq rel for CmdWaitEvents Clean up macros and build options

jinjianrong · 2024-12-09T05:05:31Z

Fixed in 2024.Q4.2

kleinerm closed this as completed in kleinerm/Psychtoolbox-3@37bbf13 Nov 2, 2024

kleinerm reopened this Nov 2, 2024

jinjianrong added assigned reproduced labels Nov 3, 2024

jinjianrong added fixed internally and removed assigned labels Nov 3, 2024

jinjianrong closed this as completed Dec 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMDVLK out of bounds write causes memory corruption/crash when an attached display monitor has more than 64 video modes. #179

AMDVLK out of bounds write causes memory corruption/crash when an attached display monitor has more than 64 video modes. #179

kleinerm commented Oct 1, 2024

jinjianrong commented Nov 3, 2024

kleinerm commented Nov 4, 2024

jinjianrong commented Nov 5, 2024

kleinerm commented Nov 6, 2024

jinjianrong commented Dec 9, 2024

AMDVLK out of bounds write causes memory corruption/crash when an attached display monitor has more than 64 video modes. #179

AMDVLK out of bounds write causes memory corruption/crash when an attached display monitor has more than 64 video modes. #179

Comments

kleinerm commented Oct 1, 2024

jinjianrong commented Nov 3, 2024

kleinerm commented Nov 4, 2024

jinjianrong commented Nov 5, 2024

kleinerm commented Nov 6, 2024

jinjianrong commented Dec 9, 2024