
[BUG] Machine learning memory leak #3142

Closed
1 of 3 tasks
rafsko1 opened this issue Jul 7, 2023 · 20 comments · Fixed by #3207
Comments

@rafsko1

rafsko1 commented Jul 7, 2023

The bug

The Yacht dashboard is showing that immich_machine_learning is consuming between 20% and 60% of RAM.

The OS that Immich Server is running on

Debian

Version of Immich Server

v1.66.1

Version of Immich Mobile App

v1.66.0

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    logging:
      driver: none
    volumes:
      - tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

###################################################################################
# Database
###################################################################################

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# Optional Redis settings:

# Note: these parameters are not automatically passed to the Redis Container
# to do so, please edit the docker-compose.yml file as well. Redis is not configured
# via environment variables, only redis.conf or the command line

# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=/srv/dev-disk-by-uuid-73d1767c-8c6f-4fd0-bf8b-da093241cda3/Backup/Immich/


###################################################################################
# Typesense
###################################################################################
TYPESENSE_API_KEY=blablablahq1q1q1!!!
# TYPESENSE_ENABLED=false

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

# DISABLE_REVERSE_GEOCODING=false
# REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE=Hello!

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

Reproduction steps

sudo docker-compose pull && sudo docker-compose up -d

Additional information

No response

@bo0tzz
Member

bo0tzz commented Jul 7, 2023

How much RAM in MB/GB is it actually using? We can't do much with just a percentage.
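
For reference, one quick way to read the absolute numbers off Docker (a minimal sketch, using the immich_machine_learning container name from the compose file above):

# one-shot report of memory usage for the ML container
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}" immich_machine_learning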

@rafsko1 rafsko1 changed the title from "[BUG] machine learning memory consumption" to "[BUG] RPi 4 8gb machine learning memory consumption" on Jul 7, 2023
@weber8thomas

Same problem here

Updated to 1.66.1 two days ago

(screenshot)

Here are my current docker stats

(screenshot)

You can clearly see the difference since the last update

(screenshot)

@bo0tzz
Member

bo0tzz commented Jul 7, 2023

@weber8thomas which version were you running before you updated?

@rafsko1
Author

rafsko1 commented Jul 7, 2023

Now it's consuming 17% (1.25 GB / 7.63 GB), but I've seen it consume almost 6 GB / 8 GB.

@weber8thomas

@weber8thomas which version were you running before you updated?

I was running 1.65.0

@bo0tzz
Member

bo0tzz commented Jul 7, 2023

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

@bo0tzz bo0tzz closed this as completed Jul 7, 2023
@weber8thomas

weber8thomas commented Jul 7, 2023

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

The point is that no processes are running (CPU between 0% and 1% in docker stats) and no jobs are listed on the admin dashboard of the web UI. That's why this is unusual compared to previous versions.

@mertalev
Contributor

mertalev commented Jul 7, 2023

Same problem here

Updated to 1.66.1 two days ago

(screenshot)

Here are my current docker stats

(screenshot)

You can clearly see the difference since the last update

(screenshot)

Please make a new issue for this. While it's expected that ML will use a high amount of RAM, there are unusual spikes here. Also be sure to mention the version you were using before updating and to post the ML logs.

@rafsko1
Author

rafsko1 commented Jul 8, 2023

(screenshot: 2023-07-09 00:10:25)

Exactly the same story here.

@rafsko1
Author

rafsko1 commented Jul 9, 2023

(screenshot: 2023-07-09 13:39:34)

Now it's at 63% (4.81 GB / 7.63 GB) and it doesn't look like it will drop.

@Dodo55

Dodo55 commented Jul 10, 2023

@bo0tzz

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

Sorry, but I can confirm that no unloading takes place within a reasonable timeframe, and the RAM hogging / memory leak grows further each time another set of jobs runs.
(See the attached images as proof.)

Please investigate and fix this issue as soon as possible.

Meanwhile, I'm thinking of creating a cron job that restarts the ML container every hour as a temporary workaround. Can it cause any trouble?
(screenshots: Immich_memoryleak2, Immich_memoryleak)

@mertalev mertalev changed the title from "[BUG] RPi 4 8gb machine learning memory consumption" to "[BUG] Machine learning memory leak" on Jul 10, 2023
@mertalev mertalev reopened this Jul 10, 2023
@mertalev
Contributor

Running a cron job for it shouldn't cause an issue.

I think model unloading is causing a memory leak. The first time the models are unloaded you can see a small decrease, but the next time RAM usage swells up even further.
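
For reference, a minimal version of that workaround could be a single crontab entry (a sketch only, assuming the immich_machine_learning container name from the compose file above):

# root crontab: restart the ML container at the top of every hour
0 * * * * docker restart immich_machine_learning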

@vikrant82

I don't think this is fixed. I'm on 1.71.0 and I'm still seeing machine learning take up around 1.6 GB of memory out of 8 GB.

(screenshot)

@rafsko1
Author

rafsko1 commented Aug 3, 2023

Same here: 27% (2.05 GB / 7.63 GB).

@vikrant82

Is there a way to disable machine learning? Setting IMMICH_MACHINE_LEARNING_URL=false and removing the machine learning container didn't seem to help, as the immich server kept crashing.

@mertalev
Contributor

mertalev commented Aug 4, 2023

That memory usage is completely normal.

Is there a way to disable machine learning. IMMICH_MACHINE_LEARNING_URL=false and removing machine learning container didnt seem to help as immich server kept crashing.

Could you share the logs for the server?
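
(A minimal way to capture them, assuming the immich_server container name from the compose file above, would be something like:)

# grab the most recent server log output
sudo docker logs --tail 200 immich_server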

@vikrant82

OK, I thought the container was supposed to unload the models once it's idle, based on the discussion above. Is there documentation on how to switch off machine learning?

Thanks!

@mertalev
Contributor

mertalev commented Aug 4, 2023

Model unloading is currently disabled by default since it can cause a memory leak.

As for disabling machine learning, the steps you mention are all that should be needed. If the server is crashing, I'd need to see the logs to help you.
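
For anyone following along, a rough sketch of those steps (not an official guide; it assumes the compose file and .env posted above):

# 1. In .env, point the server away from ML:
#      IMMICH_MACHINE_LEARNING_URL=false
# 2. Remove (or comment out) the immich-machine-learning service in docker-compose.yml.
# 3. Recreate the stack and drop the now-orphaned container:
sudo docker-compose up -d --remove-orphans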

@Watever44

Model unloading is currently disabled by default since it can cause a memory leak.

As for disabling machine learning, the steps you mention are all that should be needed. If the server is crashing, I'd need to see the logs to help you.

If the models are not unloading, does that mean machine learning memory usage can keep increasing?
That's what I'm seeing. I also see an increase at midnight, so I suppose that's when some jobs run; I didn't find where that is set.
Example:
20:45 -> 1.130 GiB
23:59 -> 893.371 MiB
00:08 -> 2.590 GiB
01:47 -> 2.294 GiB
11:10 -> 2.470 GiB
(keeps increasing)
19:45 -> 2.705 GiB
(reboot)
19:54 -> 1.093 GiB
20:02 -> 2.086 GiB

I am not sure if it's the same issue with model loading or something else.

@mertalev
Contributor

mertalev commented Sep 4, 2023

Models are loaded on-demand now, so the container will have lower RAM usage until a model is actually needed. Models won't be unloaded after that by default, though. RAM usage can also vary based on the images sent and the number of concurrent requests.
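
If you do want models to unload after a period of inactivity, the ML container exposes a TTL setting for this; the exact variable name and default vary by version, so treat the following as an assumption and check the current docs:

# assumed environment variable on the immich-machine-learning service (verify in the docs)
# MACHINE_LEARNING_MODEL_TTL=300   # unload idle models after ~5 minutes; 0 keeps them loaded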
