This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

media repo workers use an awful lot of memory #9365

Closed
richvdh opened this issue Feb 10, 2021 · 9 comments
Labels
S-Major: Major functionality / product severely impaired, no satisfactory workaround.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh (Member) commented Feb 10, 2021

A lot of it gets cleaned up when we eventually do a gen 2 GC, but 44G seems a bit insane for a process which is essentially a glorified web server:

[screenshot: graph of media repo worker memory usage, peaking around 44G]

It would be good to find out what is using that memory, with a view to either not using it in the first place, or making it easier to free so that it gets removed in a gen 0 or gen 1 GC.
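As a rough illustration of one way to see what is holding that memory (this is not anything Synapse actually ships, and the function name summarise_heap is made up for the example), a stdlib-only sketch that forces a full collection and then tallies the surviving objects by type:

```python
# Minimal, hypothetical sketch of heap introspection using only the stdlib:
# force a full (gen 2) collection, then tally the objects that survive it by
# type. Illustrative only; not the profiling tooling actually used here.
import gc
from collections import Counter


def summarise_heap(top_n=20):
    """Print the most common types among objects tracked by the GC."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    for name, count in counts.most_common(top_n):
        print(f"{count:>10}  {name}")


if __name__ == "__main__":
    gc.collect()       # full collection, so only genuinely retained objects remain
    summarise_heap()
```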

@richvdh (Member, Author) commented Feb 10, 2021

(this tends to cause the box to go into swap, which is obviously bad)

@benbz (Contributor) commented Feb 10, 2021

FYI the media repo's memory profile seems to have changed after a deploy on 18th Jan:
[screenshot: graph of the media repo's memory profile before and after the 18th Jan deploy]

@erikjohnston (Member):

Previous issue: #8653, though the situation seems to have changed somewhat

erikjohnston added the S-Minor (Blocks non-critical functionality, workarounds exist.) and T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues.) labels on Feb 11, 2021
richvdh added the S-Major (Major functionality / product severely impaired, no satisfactory workaround.) label and removed the S-Minor label on Feb 11, 2021
@richvdh (Member, Author) commented Feb 11, 2021

I don't think it's minor.

@michaelkaye (Contributor):

So we have a tactical cronjob in place on matrix.org for this; are we going to be able to look into this soon, or should we make the automatic restarting more robust?

@clokep (Member) commented Feb 16, 2021

This might be a regression from #9108 or one of the earlier PRs updating that code.

@clokep (Member) commented Feb 16, 2021

From @richvdh, who has been doing a bit of research:

"it's mostly discovered by introspection of the heap; but treq.client._BufferedResponse is where the response gets buffered in memory"

Looking a bit at this, I think setting unbuffered=True on the call to treq.request "fixes" this. I'm not sure of the other ramifications, though, so I'm looking into that.
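For illustration, a minimal sketch of what unbuffered=True looks like in practice (placeholder URL and file path; this is not the actual Synapse patch): the body is streamed to disk chunk by chunk via treq.collect instead of being accumulated in memory by treq.client._BufferedResponse, and the unbuffered body can only be consumed once.

```python
# Hedged sketch only (placeholder URL/path, not the real Synapse change):
# with unbuffered=True, treq does not buffer the whole response body in
# memory, so peak usage is bounded by chunk size rather than download size.
import treq
from twisted.internet.task import react


def download_file(reactor, url, destination_path):
    destination = open(destination_path, "wb")
    # unbuffered=True disables treq's in-memory buffering; the body must then
    # be consumed exactly once, here by streaming each chunk into the file.
    d = treq.get(url, unbuffered=True)
    d.addCallback(treq.collect, destination.write)
    d.addBoth(lambda _: destination.close())
    return d


if __name__ == "__main__":
    # Purely illustrative arguments.
    react(download_file, ["https://example.com/some-media", "/tmp/media-download"])
```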

@clokep (Member) commented Feb 19, 2021

Hopefully the changes in #9421 will help out with this, but I guess it'll be a few days before we know.

@clokep (Member) commented Feb 22, 2021

After this change it seems that the huge spikes in memory allocation are gone:

[screenshot: graph showing the large memory allocation spikes gone after the change]

It also seems that nothing is being allocated into gen2 anymore:

[screenshot: graph showing no further allocations surviving into gen 2]

Given the above, I'm going to call this complete. But @richvdh or @benbz, please let me know if I'm missing anything here!

clokep closed this as completed on Feb 22, 2021