memory leak regression from 0.18 to 0.19 #5113

Closed
wez opened this issue Jan 21, 2024 · 12 comments

@wez (Contributor) commented Jan 21, 2024

Description

Sorry, this is going to be a bit of a lame report in terms of precisely pointing to the problem.
In wezterm, upgrading from wgpu 0.18 to 0.19 has led to user complaints of panics about resource utilization in some cases, and of GTT and VRAM usage slowly increasing until an OOM occurs in others.

Repro steps

With an AMD GPU, run wezterm and inside it run `htop -d 0`.
Run amdgpu_top (https://github.com/Umio-Yasuno/amdgpu_top) in GUI mode and you can see the GTT number climbing over time, along with VRAM.
I have a larger card than the user in the report above, so I didn't experience a fatal OOM, but the usage is excessive and climbs dramatically compared to wezterm with wgpu 0.18.

Note: I reverted the wgpu 0.19 changes in wez/wezterm@bf07f6d to head off further user reports in the meantime, so if you're actually going to run this you'll want to check out the parent of that commit. Building is pretty much just a `cargo build` and should be easy to do (https://wezfurlong.org/wezterm/install/source.html#installing-from-source has instructions).

Expected vs observed behavior

The utilization should be flat, as it was in 0.18.

Extra materials

See linked issues.

Wezterm's wgpu-related code can be found:

Platform

Linux (X11 or Wayland) with AMD hardware seems prone to the leaky behavior.
I'm not sure about the platform of the user reporting the panics, and I don't know whether these are platform-specific manifestations of the same underlying problem.

@Danielkonge commented Jan 21, 2024

Just a note: I also saw the leaky behavior on macOS (ARM), where wezterm used over 100 times the RAM it usually does (and usage seemed to grow until my system complained).

@cwfitzgerald (Member) commented

Would it be possible to get a heap profiling trace showing the allocation stacks that are leaking?

@wez (Contributor, Author) commented Jan 21, 2024

dhat-heap.json
Here's a dhat heap profile that you can load into https://nnethercote.github.io/dh_view/dh_view.html.
It was produced with `cargo build --bin wezterm-gui --features dhat-heap --release` (with debug symbols enabled in Cargo.toml) and then running htop for a while.

I haven't really analyzed it to understand what's in there yet, but I'm sharing it now anyway :)
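
For context on how such a profile gets generated, here is a minimal sketch of the standard `dhat` crate setup that a `dhat-heap` cargo feature like the one above typically gates. This is an illustrative assumption, not wezterm's actual code; the feature name and wiring there may differ.

```rust
// Minimal sketch of dhat heap profiling behind a "dhat-heap" cargo feature.
// Assumes a `dhat` dependency in Cargo.toml; illustrative only, not wezterm's code.

// Route all heap allocations through dhat's tracking allocator.
#[cfg(feature = "dhat-heap")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Start heap profiling; dhat-heap.json is written when `_profiler` is dropped
    // at the end of main.
    #[cfg(feature = "dhat-heap")]
    let _profiler = dhat::Profiler::new_heap();

    // ... run the GUI / workload to reproduce the leak, e.g. leave `htop -d 0` running ...
}
```

The resulting dhat-heap.json is what gets loaded into the dh_view page linked above.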

@nical (Contributor) commented Jan 26, 2024

Might be fixed by #5141.

@vorporeal (Contributor) commented Mar 6, 2024

A cursory test of our application (using nvtop to monitor GPU memory usage), comparing 0.18, 0.19.1, and 0.19.3, indicates that a GPU memory leak is present in both 0.19.1 and 0.19.3 but not in 0.18, so it appears #5141 didn't fix the issue.

Of note: I observed this with an NVIDIA GPU, so it's not just an AMD issue.

@Aultus-defora commented

I see similar behaviour with macOS Ventura 13.4.1 (and later) on an Intel Mac with a Fusion Drive. When running the boids example on 0.18, the issue of memory continuously growing persists. I only noticed this when all Bevy native games (with a camera) started to leak memory. Weirdly enough, at v0.13 there is no leak when running `cargo run --example boids --release`; starting from v0.14, `cargo run --example boids --release` causes a memory leak that looks like bevyengine/bevy#9035.

@ErichDonGubler (Member) commented Mar 11, 2024

This might also be related to #5378. 🤔 Ah, that bug hasn't been exposed in a release yet, AFAIK. Never mind!

@vorporeal (Contributor) commented

It appears that #5413 fixes the memory leak: I tested our app against that commit (merged into main) and the one before it, and the memory leak reproduces only on the previous commit.

@vorporeal (Contributor) commented

@cwfitzgerald Any chance a 0.19.4 release could be cut that includes #5143? There have been a number of stability improvements in the 0.19 branch that we have been unable to pull into our app due to the (now-fixed) memory leak.

@Wumpf (Member) commented Mar 30, 2024

👍 I've now marked #5413 as needing a backport.
The next release is scheduled to come fairly soon (April 10th), but I agree it might still be worth doing another round of backports.

These are all the PRs known to need backporting to a patch release: https://github.com/gfx-rs/wgpu/pulls?q=is%3Apr+label%3A%22PR%3A+needs+back-porting%22+

If there are any others that you're aware of, please point them out.

@Wumpf closed this as completed Mar 30, 2024
@Wumpf (Member) commented Mar 30, 2024

Oh, I thought both links were the same. I can't speak for #5143 if that one was linked intentionally; we don't have a fix for that yet, but I don't think it's a big concern?

@vorporeal (Contributor) commented

Ah, no, #5143 was a typo, sorry!
