
[BUG] Machine learning memory leak #3142

Closed
1 of 3 tasks
rafsko1 opened this issue Jul 7, 2023 · 20 comments · Fixed by #3207
Comments

@rafsko1

rafsko1 commented Jul 7, 2023

The bug

The Yacht dashboard is showing that immich_machine_learning is consuming between 20% and 60% of RAM.

The OS that Immich Server is running on

Debian

Version of Immich Server

v1.66.1

Version of Immich Mobile App

v1.66.0

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    logging:
      driver: none
    volumes:
      - tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

###################################################################################
# Database
###################################################################################

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# Optional Redis settings:

# Note: these parameters are not automatically passed to the Redis Container
# to do so, please edit the docker-compose.yml file as well. Redis is not configured
# via environment variables, only redis.conf or the command line

# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=/srv/dev-disk-by-uuid-73d1767c-8c6f-4fd0-bf8b-da093241cda3/Backup/Immich/


###################################################################################
# Typesense
###################################################################################
TYPESENSE_API_KEY=blablablahq1q1q1!!!
# TYPESENSE_ENABLED=false

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

# DISABLE_REVERSE_GEOCODING=false
# REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE=Hello!

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

Reproduction steps

sudo docker-compose pull && sudo docker-compose up -d

Additional information

No response

@bo0tzz
Member

bo0tzz commented Jul 7, 2023

How much RAM in MB/GB is it actually using? We can't do much with just a percentage.
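
For reference, one quick way to read the absolute numbers off Docker (a minimal sketch, using the immich_machine_learning container name from the compose file above):

# one-shot report of memory usage for the ML container
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}" immich_machine_learning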

@rafsko1 rafsko1 changed the title from "[BUG] machine learning memory consumption" to "[BUG] RPi 4 8gb machine learning memory consumption" on Jul 7, 2023
@weber8thomas

Same problem here

Updated to 1.66.1 two days ago

(screenshot)

Here are my current docker stats

(screenshot)

You can clearly see the difference since the last update

(screenshot)

@bo0tzz
Member

bo0tzz commented Jul 7, 2023

@weber8thomas which version were you running before you updated?

@rafsko1
Author

rafsko1 commented Jul 7, 2023

Now it's consuming 17% (1.25 GB / 7.63 GB), but I've seen it consume almost 6 GB / 8 GB.

@weber8thomas

@weber8thomas which version were you running before you updated?

I was running 1.65.0

@bo0tzz
Member

bo0tzz commented Jul 7, 2023

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

@bo0tzz bo0tzz closed this as completed Jul 7, 2023
@weber8thomas

weber8thomas commented Jul 7, 2023

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

The point is that no processes are running (CPU between 0% and 1% in docker stats) and no jobs are listed on the admin dashboard of the web UI. That's why this is unusual compared to previous versions.

@mertalev
Contributor

mertalev commented Jul 7, 2023

Same problem here

Updated to 1.66.1 two days ago

(screenshot)

Here are my current docker stats

(screenshot)

You can clearly see the difference since the last update

(screenshot)

Please make a new issue for this. While it's expected that ML will use a high amount of RAM, there are unusual spikes here. Also be sure to mention the version you were using before updating and to post the ML logs.

@rafsko1
Author

rafsko1 commented Jul 8, 2023

(screenshot: 2023-07-09 00:10:25)

Exactly the same story here.

@rafsko1
Author

rafsko1 commented Jul 9, 2023

(screenshot: 2023-07-09 13:39:34)

Now it's at 63% (4.81 GB / 7.63 GB) and it doesn't look like it will drop.

@Dodo55

Dodo55 commented Jul 10, 2023

@bo0tzz

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

Sorry, but I can confirm that no unloading takes place within a reasonable timeframe, and the RAM hogging / memory leak grows further each time another set of jobs runs.
(See the attached images as proof.)

Please investigate and fix this issue as soon as possible.

Meanwhile, I'm thinking of creating a cron job that restarts the ML container every hour as a temporary workaround. Can it cause any trouble?
(screenshots: Immich_memoryleak2, Immich_memoryleak)

@mertalev mertalev changed the title from "[BUG] RPi 4 8gb machine learning memory consumption" to "[BUG] Machine learning memory leak" on Jul 10, 2023
@mertalev mertalev reopened this Jul 10, 2023
@mertalev
Contributor

Running a cron job for it shouldn't cause an issue.

I think model unloading is causing a memory leak. The first time the models are unloaded you can see a small decrease, but the next time RAM usage swells up even further.
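
For reference, a minimal version of that workaround could be a single crontab entry (a sketch only, assuming the immich_machine_learning container name from the compose file above):

# root crontab: restart the ML container at the top of every hour
0 * * * * docker restart immich_machine_learning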

@vikrant82

I don't think this is fixed. I'm on 1.71.0 and I'm still seeing machine learning take up around 1.6 GB of memory out of 8 GB.

(screenshot)

@rafsko1
Author

rafsko1 commented Aug 3, 2023

Same here: 27% (2.05 GB / 7.63 GB).

@vikrant82

Is there a way to disable machine learning? Setting IMMICH_MACHINE_LEARNING_URL=false and removing the machine learning container didn't seem to help, as the immich server kept crashing.

@mertalev
Contributor

mertalev commented Aug 4, 2023

That memory usage is completely normal.

Is there a way to disable machine learning. IMMICH_MACHINE_LEARNING_URL=false and removing machine learning container didnt seem to help as immich server kept crashing.

Could you share the logs for the server?
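
(A minimal way to capture them, assuming the immich_server container name from the compose file above, would be something like:)

# grab the most recent server log output
sudo docker logs --tail 200 immich_server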

@vikrant82

OK, I thought the container was supposed to unload the models once it's idle, based on the discussion above. Is there documentation on how to switch off machine learning?

Thanks!

@mertalev
Contributor

mertalev commented Aug 4, 2023

Model unloading is currently disabled by default since it can cause a memory leak.

As for disabling machine learning, the steps you mention are all that should be needed. If the server is crashing, I'd need to see the logs to help you.
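
For anyone following along, a rough sketch of those steps (not an official guide; it assumes the compose file and .env posted above):

# 1. In .env, point the server away from ML:
#      IMMICH_MACHINE_LEARNING_URL=false
# 2. Remove (or comment out) the immich-machine-learning service in docker-compose.yml.
# 3. Recreate the stack and drop the now-orphaned container:
sudo docker-compose up -d --remove-orphans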

@Watever44

Model unloading is currently disabled by default since it can cause a memory leak.

As for disabling machine learning, the steps you mention are all that should be needed. If the server is crashing, I'd need to see the logs to help you.

If the models are not unloading, does that mean machine learning memory usage can keep increasing?
That's what I'm seeing. I also see an increase at midnight, so I suppose that's when some jobs run; I didn't find where that is set.
Example:
20:45 -> 1.130 GiB
23:59 -> 893.371 MiB
00:08 -> 2.590 GiB
01:47 -> 2.294 GiB
11:10 -> 2.470 GiB
(keeps increasing)
19:45 -> 2.705 GiB
(reboot)
19:54 -> 1.093 GiB
20:02 -> 2.086 GiB

I am not sure if it's the same issue with model loading or something else.

@mertalev
Contributor

mertalev commented Sep 4, 2023

Models are loaded on-demand now, so the container will have lower RAM usage until a model is actually needed. Models won't be unloaded after that by default, though. RAM usage can also vary based on the images sent and the number of concurrent requests.
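
If you do want models to unload after a period of inactivity, the ML container exposes a TTL setting for this; the exact variable name and default vary by version, so treat the following as an assumption and check the current docs:

# assumed environment variable on the immich-machine-learning service (verify in the docs)
# MACHINE_LEARNING_MODEL_TTL=300   # unload idle models after ~5 minutes; 0 keeps them loaded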
