
Memory leak in Valetudo? #39

Closed
dugite-code opened this issue Oct 30, 2018 · 15 comments
Labels: bug Something isn't working

Comments

@dugite-code commented Oct 30, 2018

I discovered a small issue when using Home-Assistant to poll the map too often (every 10 seconds, to get a sort of live map). The /tmp directory filled up and the vacuum locked up. A reboot obviously fixes the issue.

I had done other things on the vacuum in the past so I might have had less space available than normal.

A quick work-around was to throw this into cron (it keeps only the newest map file in /tmp/maps):
*/1 * * * * cd /tmp/maps && ls -t /tmp/maps | tail -n +2 | xargs rm --

@dugite-code (Author) commented Nov 5, 2018

Actually, after looking into the matter further, it looks like it might be a low-memory issue rather than the /tmp partition filling up. I've added a swap file on /mnt/data to see if that solves my issues.

As you can see Valetudo is taking up 76% of the memory:
[screenshot: process list showing Valetudo at 76% memory]

After restart:
[screenshot: memory usage after restart]

I have also added */30 * * * * /usr/sbin/service valetudo restart to my crontab

dugite-code changed the title from "Over-polling map" to "Memory leak in Valetudo?" on Nov 5, 2018
@Hypfer (Owner) commented Nov 5, 2018

After looking at the code for 20 minutes, I still have no idea what could be responsible for the memory leak.

However, I've noticed that on every request the robot + charger images are read from flash for no reason, so I guess that whole part will be rewritten at some point.
Could you provide a specification for this interface? What endpoints are there? What does Home-Assistant expect?
I'm not using those home automation frameworks, nor did I write said code, so I don't have the slightest idea. 🙃

Hypfer added the bug label on Nov 5, 2018
@dugite-code (Author) commented Nov 6, 2018

Essentially, all HA is doing is calling YOUR.VACUUM.ROBOT.IP/api/remote/map and then pulling the image via a GET request on the mapsrc from the JSON response (using the Python requests library).

The only real difference between HA and using curl is the rate of the requests: normally HA will do this every 30 seconds, and I tuned it up to every 10 seconds. I wouldn't think that would cause an issue.

Setting drawRobot and drawCharger to false by calling YOUR.VACUUM.ROBOT.IP/api/remote/map?drawRobot=false&drawCharger=false&scale=5&border=3&doCropping=true&drawPath=true looks like it might fix the issue. When drawing the robot and charger, memory was rising above 25% after 3 minutes of run time; when not drawing them, it's bouncing between 16% and 20% after 5 minutes of run time.

I'll run this for a while without restarting and let you know if the memory usage stays at the 20% mark. Update: after an hour it's back at the 56% mark.

As a side note: having a further look into the HA settings, I've enabled limit_refetch_to_url_change: True. This should reduce the fetching of the map image a little.
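
For illustration, a minimal Python sketch of that polling pattern (this is not HA's actual code; the host placeholder and the assumption that mapsrc is a path relative to the robot's web server are mine):

```python
# Minimal sketch of the polling loop described above.
import time
import requests

ROBOT = "http://YOUR.VACUUM.ROBOT.IP"  # placeholder host

def fetch_map():
    # Trigger map generation and read the JSON metadata
    meta = requests.get(ROBOT + "/api/remote/map", timeout=10).json()
    # Pull the rendered map image referenced by "mapsrc" (assumed relative path)
    return requests.get(ROBOT + meta["mapsrc"], timeout=10).content

while True:
    png = fetch_map()   # PNG bytes, e.g. for a camera entity
    time.sleep(10)      # HA's default interval is 30 s; 10 s was used here
```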

@dugite-code (Author)

After looking into it further, it's strange that I am seeing the issue; after all, the upstart script has an OOM score of 1000, so it should in theory handle itself. I am using an older firmware (v11_003194), so perhaps that is the issue.

@Hypfer (Owner) commented Dec 23, 2018

Is this still happening? Did you try remote debugging and taking a heap dump?

@dugite-code (Author) commented Dec 25, 2018 via email

@axel-kah

I just rooted my rockrobo v1 with the latest firmwarebuilder (FW v11_003194, --disable-xiaomi), dummycloud_0.1, and Valetudo 0.9 via rrcc. I really like the newly gained power over my vacuum, but I observed the same behaviour as @dugite-code. Basically, every map-related call increases the memory footprint of the Valetudo process.

I did some measurements in a notebook to stress Valetudo by doing the same API call 1000 times in a row. requests was used to interact with the Valetudo API, and the output of ps (via paramiko) for the process info. When triggering a new map with api/remote/map and requesting the PNG, the memory footprint steadily increases to up to 60% before the process terminates.
[graph: measurements_trigger_and_read_n_times_2018_12_29_144840]

Simply fetching the same PNG 1000 times also steadily increases the footprint, but only by 0.5 percentage points.
[graph: measurements_trigger_once_read_n_times_2018_12_29_145514]

Doing the same for api/map/latest also increases the footprint somewhat.
[graph: measurements_fetch_latest_n_times_2018_12_29_155024]
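
A rough sketch of the kind of stress loop used for these measurements (the IP, SSH credentials, exact ps invocation, and mapsrc handling are assumptions, not the original notebook):

```python
# Hit the map endpoint N times and sample Valetudo's memory share via `ps` over SSH.
import requests
import paramiko

ROBOT_IP = "192.168.1.50"            # placeholder
BASE = "http://" + ROBOT_IP
N = 1000

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(ROBOT_IP, username="root")   # credentials are an assumption

mem_percent = []
for _ in range(N):
    meta = requests.get(BASE + "/api/remote/map", timeout=30).json()  # trigger a new map
    requests.get(BASE + meta["mapsrc"], timeout=30)                   # fetch the PNG

    # %MEM of the valetudo process as reported by ps (flags may differ on the robot)
    _, stdout, _ = ssh.exec_command("ps -eo pmem,comm | grep valetudo")
    line = stdout.read().decode().strip()
    if line:
        mem_percent.append(float(line.split()[0]))

ssh.close()
# mem_percent can then be plotted against the request count, as in the graphs above.
```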

I don't have any experience with node.js apps, but I'll try to use the techniques described here to get a clue as to why garbage collection fails: https://www.nearform.com/blog/how-to-self-detect-a-memory-leak-in-node/

@axel-kah

Tracing the memory leak is a lot harder than I thought. Stumbling blocks:

Debugging on the robot:

  • the binary output of pkg does not support the --inspect flag
  • passing the --inspect flag to pkg --options does not work
  • using memwatch-sigusr2 (depends on memwatch-next) requires cross-compilation (didn't want to get into that)

Debugging on the host:

  • usage of the VAC_MAP_TEST env variable was not self-explanatory and I could not find more documentation. What kind of file do you have to provide? I only found out it's not a rendered PNG :)
  • Webserver.js needed slight adaptation to use custom paths for the rrlog files (copied from the robot) and the temp folder. This effectively mocked the robot's filesystem and allows using the regular code paths that create PNG maps from log files (win!).

Using --inspect was finally possible! Using Chromium's debugger for stepping/breaking worked, and taking heap snapshots did as well. I took several snapshots, requesting 100 maps in between. Unfortunately, comparing those snapshots is also not as straightforward as I thought. I can see that the heap grows and creates arrays almost linearly with the request count, but when trying to attribute that to object class names, function names, or anything recognizable from the source, I failed, because it's all just very generic names in the heap detail view.

Do the devs have any input for me on how to debug this properly, and what to look for?

@Hypfer (Owner) commented Dec 30, 2018

Thanks for looking into that and providing these graphs @axel-kah!
Sadly, I can't help much with that. I'd still love to just drop local map generation altogether, which should definitely fix this issue as well :^)

This depends on #66

@Hypfer (Owner) commented Mar 23, 2019

I guess this should be fixed with https://github.com/Hypfer/Valetudo/releases/tag/0.2.2

@Hypfer (Owner) commented Mar 23, 2019

Nope. Still an issue.

Although it seems like Valetudo is not the only software experiencing it
jimp-dev/jimp#153

@Hypfer (Owner) commented Mar 23, 2019

Doesn't happen with Node 8 it seems.

@Hypfer (Owner) commented Mar 23, 2019

Testing Node 11.12 shows no signs of the leak, so I guess nodejs/node#23862 is the culprit.

Fixed in Node 11.10
nodejs/node#25993

Hypfer closed this as completed Mar 23, 2019
@dugite-code (Author) commented Mar 25, 2019

Awesome, with the map via MQTT it looks like it's now working as expected.

Update: frustratingly, after re-enabling Valetudo it again caused the vacuum to unprovision at the 4 am reboot. I am certain it's not the memory leak, though: unlike before, where the vacuum locked up due to low memory, it was just unprovisioned. After testing this I might set up the private provisioning from dustcloud, as with MQTT I don't really need miio any more.

@lance36 commented Mar 25, 2019

Great work!

github-actions bot locked as resolved and limited conversation to collaborators Jan 20, 2022