
AMD Memory Leak #7108

Open
Eraden opened this issue Jan 6, 2023 · 12 comments
Labels
C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times

Comments

@Eraden

Eraden commented Jan 6, 2023

Bevy version

0.9.1

Relevant system information

  • Rust version:
    • cargo 1.68.0-nightly (70898e522 2022-12-05)
    • rustc 1.68.0-nightly (bdb07a8ec 2022-12-11)
    • rustc 1.66.0 (69f9c33d7 2022-12-12)
  • Operating system:
    • Arch Linux
    • Linux archpc 6.1.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 04 Jan 2023 16:28:15 +0000 x86_64 GNU/Linux
  • CPU: AMD Ryzen 9 3950X 16-Core Processor

Adapter info reported by Bevy:

2023-01-06T08:09:33.812139Z  INFO winit::platform_impl::platform::x11::window: Guessed window scale factor: 1.25    
2023-01-06T08:09:34.007245Z  INFO bevy_render::renderer: AdapterInfo { name: "AMD Radeon RX 5700 XT", vendor: 4098, device: 29471, device_type: DiscreteGpu, driver: "AMD open-source driver", driver_info: "2022.Q4.4 (LLPC)", backend: Vulkan }


What you did

Example: https://github.com/bevyengine/bevy/blob/v0.9.1/examples/games/game_menu.rs

What went wrong

  • The application does not exit fully; it must be killed with Ctrl+C or SIGKILL.
  • Memory usage increases constantly on every screen (even the Main Menu with no interaction).

Additional information

Tested on two machines with mobile Intel graphics; the problem does not occur there.

  • Logs:
    bevy.log
  • Workarounds tried:
    • multiple PCs
    • master branch
@Eraden Eraden added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Jan 6, 2023
@ghost

ghost commented Jan 6, 2023

@Eraden I'll try to take a look at this in the near future. Could you try to break the example down to something minimal?

Possibly related to #5691. There is also #5856 (with which I've run into OOM on a machine with 64 GB of RAM while running tests).

@Eraden
Author

Eraden commented Jan 6, 2023

I tried running only the splash screen, even without the countdown. Unfortunately, it always leaks for me.

I tried:

  • Gnome on Wayland
  • Gnome on Xorg
  • KDE
  • SwayWM

Surprisingly, it appears to always run with the Wayland backend (even on Xorg). Memory usage also appears to increase by ~21MB every 2 seconds.
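To put a figure like "~21MB every 2 seconds" on a firmer footing, the resident set size of the running example can be sampled from `/proc/<pid>/status` (Linux only). The following is a minimal sketch, not part of Bevy or of the original report; the program name `rss_watch`, the 2-second interval and the 30-sample limit are arbitrary assumptions chosen for illustration.

```rust
use std::fs;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Extracts the resident set size (reported in kB) from the text of /proc/<pid>/status.
/// Returns None if the VmRSS line is missing or cannot be parsed.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|value| value.parse().ok())
}

/// Average growth between two RSS samples, in MiB per second
/// (the kernel's "kB" unit in /proc is 1024 bytes).
fn growth_mib_per_sec(first_kb: u64, last_kb: u64, elapsed: Duration) -> f64 {
    (last_kb as f64 - first_kb as f64) / 1024.0 / elapsed.as_secs_f64()
}

fn main() {
    // Hypothetical usage (Linux only, reads /proc): rss_watch <pid>
    // where <pid> belongs to the running Bevy example.
    let Some(pid) = std::env::args().nth(1) else {
        eprintln!("usage: rss_watch <pid>");
        return;
    };
    let path = format!("/proc/{pid}/status");
    let read_rss = || fs::read_to_string(&path).ok().and_then(|s| parse_vm_rss_kb(&s));

    let Some(first_kb) = read_rss() else {
        eprintln!("could not read VmRSS from {path}");
        return;
    };
    let start = Instant::now();
    // Take 30 samples, 2 seconds apart (arbitrary choices).
    for _ in 0..30 {
        sleep(Duration::from_secs(2));
        // Stop if the process has exited and its /proc entry is gone.
        let Some(now_kb) = read_rss() else { break };
        let rate = growth_mib_per_sec(first_kb, now_kb, start.elapsed());
        println!("VmRSS {now_kb:>9} kB, average growth {rate:+.2} MiB/s");
    }
}
```

Start the example first, look up its PID (for instance with `pgrep`), then pass it to this program. Averaging from the first sample smooths out frame-to-frame noise, which makes a steady leak easier to tell apart from the initial warm-up rise described for the AMD PRO driver.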

valgrind, besides complaining about the use of an uninitialized Wayland variable, can't even display anything properly.

@Eraden
Author

Eraden commented Jan 6, 2023

Versions below 0.9.x crash with

2023-01-06T11:11:29.848037Z ERROR wgpu_hal::vulkan::instance: VALIDATION [UNASSIGNED-CoreValidation-Shader-InconsistentSpirv (0x6bbb14)]
	Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x55c7fb275ab0, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: [VUID-StandaloneSpirv-Flat-06202] OpEntryPoint interfaces variable must not be vertex execution model with an input storage class for Entry Point id 43.
  %vertex_color = OpVariable %_ptr_Input_uint Input
    
2023-01-06T11:11:29.848058Z ERROR wgpu_hal::vulkan::instance: 	objects: (type: DEVICE, hndl: 0x55c7fb275ab0, name: ?)    
2023-01-06T11:11:29.848877Z ERROR wgpu_hal::vulkan::instance: VALIDATION [UNASSIGNED-CoreValidation-Shader-InconsistentSpirv (0x6bbb14)]
	Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x55c7fb275ab0, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: [VUID-StandaloneSpirv-Flat-06202] OpEntryPoint interfaces variable must not be vertex execution model with an input storage class for Entry Point id 43.
  %vertex_color = OpVariable %_ptr_Input_uint Input
    
2023-01-06T11:11:29.848883Z ERROR wgpu_hal::vulkan::instance: 	objects: (type: DEVICE, hndl: 0x55c7fb275ab0, name: ?)    

@Eraden
Author

Eraden commented Jan 6, 2023

Using the AMD PRO driver seems to slow down the leak, but the leak still exists.

1MB per minute

VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_pro_icd64.json LIBGL_ALWAYS_SOFTWARE=0 cargo run

1MB every couple of seconds

__GLX_VENDOR_LIBRARY_NAME=mesa VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_pro_icd64.json LIBGL_ALWAYS_SOFTWARE=0 cargo run

When using the amd_pro prefix, the leak more or less disappears. Memory usage rises at first, but after leaving it for a couple of minutes I see no visible increase.

amd_pro cargo run

@IceSentry
Contributor

Could you try running some of the wgpu examples at https://github.com/gfx-rs/wgpu/tree/master/wgpu/examples and see if you still have this issue.

@james7132 james7132 added C-Performance A change motivated by improving speed, memory usage or compile times and removed S-Needs-Triage This issue needs to be labelled labels Jan 6, 2023
@james7132
Member

You said you tried running on main and that didn't mitigate it? This seems awfully similar to the memory leak solved by #6878 which is in main but not in 0.9.1.

@alice-i-cecile
Member

Are you running in release or debug mode? Logs can cause memory leaks in debug mode, but a leak of this magnitude is very high.

@Eraden
Author

Eraden commented Jan 7, 2023

Today (07.01.2023) results:

Subject 1

Could you try running some of the wgpu examples at https://github.com/gfx-rs/wgpu/tree/master/wgpu/examples and see if you still have this issue.

  • hello_triangle - no leak
  • mipmap - no leak
  • skybox - no leak

Subject 2

You said you tried running on main and that didn't mitigate it? This seems awfully similar to the memory leak solved by #6878 which is in main but not in 0.9.1.

  • nightly + debug

    • rustc 1.68.0-nightly (bdb07a8ec 2022-12-11)
    • branch=main#076e6f78

    Memory leak

  • nightly + release

    • rustc 1.68.0-nightly (bdb07a8ec 2022-12-11)
    • branch=main#076e6f78

    Memory leak

  • stable + debug

    • rustc 1.66.0 (69f9c33d7 2022-12-12)
    • branch=main#076e6f78
  • stable + release

    • rustc 1.66.0 (69f9c33d7 2022-12-12)
    • branch=main#076e6f78

Screens tested:

  • Splash
  • Main Menu

Subject 3

Are you running in release or debug mode? Logs can cause memory leaks in debug mode, but that magnitude is very high.

After editing the crate and removing logs completely:


Memory leak

@mbrea-c

mbrea-c commented Dec 8, 2023

I am running into what appears to be the same issue with an AMD RX 6600 XT external GPU (connected via Thunderbolt). In my case, memory usage grows by a constant ~100MB per second, leading to a crash within a few minutes. When I disconnect the eGPU and run on the integrated Intel GPU, the leak disappears.

I can reproduce with a completely default project with only DefaultPlugins added, i.e.:

cargo new bevy_memleak_test
cd bevy_memleak_test
cargo add bevy
echo 'use bevy::prelude::*; fn main() { App::new().add_plugins(DefaultPlugins).run(); }' > src/main.rs
cargo run --release

Both Bevy 0.12.1 and the latest git commit are affected.
When adding the plugins from DefaultPlugins one by one, the one that introduces the problem is bevy_render::RenderPlugin.

Here's my adapter and system information for reference:

2023-12-08T17:25:52.689903Z  INFO bevy_render::renderer: AdapterInfo { name: "AMD Radeon RX 6600 XT", vendor: 4098, device: 29695, device_type: DiscreteGpu, driver: "AMD open-source driver", driver_info: "2023.Q2.3 (LLPC)", backend: Vulkan }
2023-12-08T17:25:52.742711Z  INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Linux  Void", kernel: "6.5.13_1", cpu: "11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz", core_count: "4", memory: "15.4 GiB" }

I've tried poking around bevy_render for a bit, but I couldn't find anything else about the root cause :/

@alice-i-cecile
Member

Are you able to reproduce this memory leak on wgpu examples? This is our upstream rendering crate, and I suspect that any hardware-specific problems are likely to be found there (or even further upstream).

@mbrea-c

mbrea-c commented Dec 8, 2023

I just tried both the latest wgpu and 0.17.1, but I can't reproduce the leak with either. I tried the hello_triangle, msaa_line, shadow and skybox examples.

@mbrea-c

mbrea-c commented Dec 8, 2023

Interesting: similarly to #10260, this seems to only affect the AMDVLK driver. The leak doesn't happen with the RADV driver:

2023-12-08T20:15:48.992436Z  INFO bevy_render::renderer: AdapterInfo { name: "AMD Radeon RX 6600 XT (RADV NAVI23)", vendor: 4098, device: 29695, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 23.3.0", backend: Vulkan }
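The AMDVLK vs RADV comparison can be repeated deliberately by pinning the Vulkan loader to a single ICD per run with `VK_ICD_FILENAMES`, the same variable used in the earlier AMD PRO commands. The Rust sketch below only builds the `cargo run --release` commands; the helper name `cargo_run_with_icd` is hypothetical, and the two manifest paths are assumptions that differ between distributions (check `/usr/share/vulkan/icd.d/` on your own system).

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not spawn) a `cargo run --release`
/// invocation that forces the Vulkan loader to use one ICD manifest.
fn cargo_run_with_icd(icd_json: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "--release"])
        .env("VK_ICD_FILENAMES", icd_json);
    cmd
}

fn main() {
    // Assumed manifest locations; they vary between distributions and packages.
    let drivers = [
        ("AMDVLK", "/usr/share/vulkan/icd.d/amd_icd64.json"),
        ("RADV", "/usr/share/vulkan/icd.d/radeon_icd.x86_64.json"),
    ];
    for (name, icd) in drivers {
        let cmd = cargo_run_with_icd(icd);
        // Only print the command; call `.status()` on it to actually launch the run.
        println!("{name}: {cmd:?}");
    }
}
```

Running each variant in turn and checking which `driver` appears in Bevy's AdapterInfo log line confirms that the intended driver was actually picked up before comparing memory growth.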

Projects
None yet
Development

No branches or pull requests

5 participants