Memory leak after running onnx model numerous times #22271

Open

vsbaldeev opened this issue Sep 30, 2024 · 9 comments
Labels: core runtime (issues related to core runtime), memory

vsbaldeev commented Sep 30, 2024

Describe the issue

In the following scenario, the memory consumption of the Python process grows continuously:

  1. Train a pipeline of CountVectorizer and RandomForestClassifier using sklearn.
  2. Convert the pipeline to an onnxruntime.InferenceSession.
  3. Run the model many times on random data and measure memory consumption.

To reproduce

The following code runs the run_model_many_times function twice: once actually invoking the InferenceSession.run method and once without. While it runs, it prints the memory consumption in megabytes, which closely matches the macOS Activity Monitor numbers.

In the first case memory consumption grows continuously; in the second it does not.

import uuid

import onnxruntime
import psutil
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


def create_trained_model():
    x = [str(uuid.uuid4()) for _ in range(100)]
    y = ["a" for _ in range(50)] + ["b" for _ in range(50)]

    pipeline = Pipeline(
        steps=[
            ("vectorizer", CountVectorizer()),
            ("classifier", RandomForestClassifier())
        ]
    )

    pipeline.fit(x, y)

    return pipeline


def convert_sklearn_model_to_onnx_session(sklearn_model):
    onnx_model_proto_string = convert_sklearn(
        sklearn_model,
        initial_types=[('features', StringTensorType((None,)))],
        verbose=False
    ).SerializeToString()

    return onnxruntime.InferenceSession(onnx_model_proto_string, providers=["CPUExecutionProvider"])


def get_one_prediction(onnx_model, input_data):
    return onnx_model.run(
        [onnx_model.get_outputs()[1].name],
        {onnx_model.get_inputs()[0].name: input_data}
    )


def get_used_megabytes():
    # https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_full_info
    # uss (Linux, macOS, Windows): aka “Unique Set Size”,
    # this is the memory which is unique to a process and which would be freed
    # if the process was terminated right now.
    return psutil.Process().memory_full_info().uss / (1024 * 1024)


def run_model_many_times(onnx_model, count, *, dummy):
    print("run_model_many_times, dummy = ", dummy)

    print("Memory in the beginning = ", get_used_megabytes())

    for i in range(count):
        input_data = [str(uuid.uuid4()) for _ in range(20)]

        if not dummy:
            get_one_prediction(onnx_model, input_data)

        if i % 10000 == 0:
            print("Memory in the middle = ", get_used_megabytes())

    print("Memory in the end = ", get_used_megabytes())


def main():
    sklearn_model = create_trained_model()
    onnx_model = convert_sklearn_model_to_onnx_session(sklearn_model)

    count = 100000

    run_model_many_times(onnx_model, count, dummy=False)
    run_model_many_times(onnx_model, count, dummy=True)


if __name__ == '__main__':
    main()

Example of printing:

run_model_many_times, dummy =  False
Memory in the beginning =  126.875
Memory in the middle =  126.875
Memory in the middle =  140.96875
Memory in the middle =  156.296875
Memory in the middle =  171.671875
Memory in the middle =  187.015625
Memory in the middle =  202.328125
Memory in the middle =  217.640625
Memory in the middle =  232.96875
Memory in the middle =  248.3125
Memory in the middle =  263.625
Memory in the end =  278.96875

run_model_many_times, dummy =  True
Memory in the beginning =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the middle =  278.96875
Memory in the end =  278.96875

Urgency

On a production server, the Python process consumed 100 GB of RAM after two months. The only remedy is restarting the process, so yes, it's urgent.

Platform

Linux

OS Version

Both macOS 15.0 and Ubuntu 20

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@yuslepukhin added the core runtime label Oct 1, 2024
@yuslepukhin (Member)

If you are running on CPU from languages other than Python, you can disable the memory arena:

https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L886

yuslepukhin added a commit that referenced this issue Oct 4, 2024
### Description
Expose enable_mem_arena property for SessionOptions

### Motivation and Context
#22271
@vsbaldeev (Author)

@yuslepukhin Thank you for your attention to this problem. I have two questions about the enable_cpu_mem_arena property.

  1. According to the API docs (https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.enable_cpu_mem_arena), I already have the ability to disable the memory arena, don't I?
  2. What are the consequences of disabling the memory arena? Is performance degradation one of them?

yuslepukhin (Member) commented Oct 7, 2024

The memory arena was created to pool GPU allocations, since those are much more expensive. Historically, however, it was left on by default for CPU scenarios.

It is a slub allocator. The memory arena serves only Tensor allocations and nothing else. With the arena off, all allocations for the CPU EP go to the OS heap.

Typically the OS heap is much more intelligent than the arena's locking scheme (CUDA workloads usually do not run from multiple threads), so disabling the arena may benefit users. The exact performance impact depends on the model you are running and the environment.

In the past we have fixed a few kernels that exhibited heap contention due to inefficiencies, but not many. Disabling the arena does reduce the memory footprint.

The Python environment, due to its GC, makes matters more complicated. That said, I will look into your scenario.

@vsbaldeev (Author)

As far as I understand, there are no memory leaks in the main-branch version because of #22323, and one can disable the memory arena in Python. Will this fix be included in the next release? Which version should I use in my scenario?

@yuslepukhin (Member)

We are not aware of any memory leaks on the ONNX Runtime side. However, one issue here: if the memory arena is enabled and at least one OrtValue leaks or is not garbage collected, the arena it came from is held allocated.

I cannot recommend a specific version; we usually recommend the latest. I think this PR will be in the upcoming release.

Depending on the scenario, some customers reported a slight performance improvement and some a slight degradation with the memory arena disabled, but nothing major. However, everyone likes the reduced memory consumption.

@yuslepukhin (Member)

I ran the same model via C++ API and there are no leaks. I will check Python side when I get a chance.

@yuslepukhin (Member)

I ran it with tracemalloc, and while I cannot explain everything, this is what it shows.

The top offender appears to be the uuid module. Its memory does not grow when its results are not used for inferencing. Otherwise, a large amount of memory is somehow retained, and there might be some connection between the generated uuid strings and uuid internals, though I cannot explain what it might be. I tried running a GC, but it had no effect.

Memory gains from onnxruntime appear to be minimal and are likely to be GCed.

I will check if we are leaking those strings in Pybind.
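The listings below are tracemalloc snapshot diffs; the measurement loop that produces output of this shape can be sketched roughly like this (with a stand-in workload rather than the actual model run):

```python
import tracemalloc

tracemalloc.start()
previous = tracemalloc.take_snapshot()

for _ in range(3):
    # Stand-in workload; in the repro this would be the uuid
    # generation plus the InferenceSession.run call.
    data = [str(n) * 8 for n in range(10_000)]

    current = tracemalloc.take_snapshot()
    print("[ Top 10 differences ]")
    # Diff against the previous snapshot, grouped by source line.
    for stat in current.compare_to(previous, "lineno")[:10]:
        print(stat)
    previous = current
```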

The PID of this script is: 28556
run_model_many_times, dummy = False

[ Top 10 differences ]
c:\Python\lib\tracemalloc.py:558: size=57.8 KiB (+57.8 KiB), count=1115 (+1114), average=53 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\skl2onnx\operator_converters\text_vectoriser.py:426: size=1232 B (-24.7 KiB), count=22 (-452), average=56 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\skl2onnx\operator_converters\text_vectoriser.py:105: size=0 B (-19.5 KiB), count=0 (-415)
d:\dev\data\gh_issue_22271.venv\lib\site-packages\skl2onnx\common_container.py:797: size=0 B (-11.4 KiB), count=0 (-209)
d:\dev\data\gh_issue_22271.venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:266: size=2118 B (+2118 B), count=55 (+55), average=39 B
c:\Python\lib\uuid.py:281: size=1780 B (+1780 B), count=21 (+21), average=85 B
D:\dev\data\gh_issue_22271\repro.py:65: size=680 B (+680 B), count=3 (+3), average=227 B
c:\Python\lib\tracemalloc.py:423: size=568 B (+568 B), count=4 (+4), average=142 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\__init__.py:1127: size=560 B (+496 B), count=2 (+1), average=280 B
D:\dev\data\gh_issue_22271\repro.py:73: size=496 B (+496 B), count=1 (+1), average=496 B
Memory in the middle = 147.01171875

[ Top 10 differences ]
c:\Python\lib\uuid.py:281: size=16.2 MiB (+16.2 MiB), count=200021 (+200001), average=85 B
c:\Python\lib\tracemalloc.py:193: size=29.5 KiB (-27.8 KiB), count=629 (-592), average=48 B
c:\Python\lib\tracemalloc.py:558: size=87.6 KiB (+27.3 KiB), count=1746 (+587), average=51 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:266: size=7890 B (+5900 B), count=121 (+68), average=65 B
c:\Python\lib\uuid.py:138: size=1425 B (+1425 B), count=2 (+2), average=712 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\__init__.py:1127: size=560 B (+560 B), count=2 (+2), average=280 B
D:\dev\data\gh_issue_22271\repro.py:73: size=496 B (+496 B), count=1 (+1), average=496 B
D:\dev\data\gh_issue_22271\repro.py:80: size=0 B (-496 B), count=0 (-1)
d:\dev\data\gh_issue_22271.venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:248: size=474 B (+474 B), count=2 (+2), average=237 B
D:\dev\data\gh_issue_22271\repro.py:45: size=784 B (+392 B), count=8 (+7), average=98 B
Memory in the middle = 219.84375

[ Top 10 differences ]
c:\Python\lib\uuid.py:281: size=32.4 MiB (+16.2 MiB), count=400021 (+200001), average=85 B
D:\dev\data\gh_issue_22271\repro.py:73: size=496 B (+496 B), count=1 (+1), average=496 B
D:\dev\data\gh_issue_22271\repro.py:80: size=0 B (-496 B), count=0 (-1)
c:\Python\lib\tracemalloc.py:484: size=0 B (-128 B), count=0 (-2)
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\_pswindows.py:727: size=1528 B (+112 B), count=6 (+2), average=255 B
:1: size=376 B (+96 B), count=4 (+2), average=94 B
c:\Python\lib\tracemalloc.py:99: size=0 B (-80 B), count=0 (-1)
Memory in the middle = 295.2109375

[ Top 10 differences ]
c:\Python\lib\uuid.py:281: size=48.6 MiB (+16.2 MiB), count=600021 (+200001), average=85 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:266: size=11.6 KiB (+5238 B), count=152 (+73), average=78 B
D:\dev\data\gh_issue_22271\repro.py:45: size=4816 B (+4424 B), count=80 (+79), average=60 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\__init__.py:1127: size=560 B (+560 B), count=2 (+2), average=280 B
D:\dev\data\gh_issue_22271\repro.py:73: size=496 B (+496 B), count=1 (+1), average=496 B
D:\dev\data\gh_issue_22271\repro.py:80: size=0 B (-496 B), count=0 (-1)
:1: size=432 B (+432 B), count=5 (+5), average=86 B
c:\Python\lib\tracemalloc.py:193: size=46.0 KiB (-336 B), count=981 (-7), average=48 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\_pswindows.py:727: size=1528 B (+176 B), count=6 (+3), average=255 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\_pswindows.py:896: size=752 B (+144 B), count=2 (+1), average=376 B
Memory in the middle = 362.9375

[ Top 10 differences ]
c:\Python\lib\uuid.py:281: size=64.9 MiB (+16.2 MiB), count=800021 (+200001), average=85 B
c:\Python\lib\tracemalloc.py:193: size=53.2 KiB (-19.1 KiB), count=1134 (-407), average=48 B
c:\Python\lib\tracemalloc.py:558: size=87.9 KiB (+18.9 KiB), count=1754 (+404), average=51 B
d:\dev\data\gh_issue_22271.venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:266: size=12.6 KiB (+1005 B), count=167 (+15), average=77 B
D:\dev\data\gh_issue_22271\repro.py:73: size=496 B (+496 B), count=1 (+1), average=496 B
D:\dev\data\gh_issue_22271\repro.py:80: size=0 B (-496 B), count=0 (-1)
c:\Python\lib\tracemalloc.py:484: size=0 B (-128 B), count=0 (-2)
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\_pswindows.py:989: size=112 B (+88 B), count=3 (+2), average=37 B
c:\Python\lib\tracemalloc.py:99: size=0 B (-80 B), count=0 (-1)
d:\dev\data\gh_issue_22271.venv\lib\site-packages\psutil\_pswindows.py:727: size=1480 B (+64 B), count=5 (+1), average=296 B

@yuslepukhin (Member)

There is a memory leak in the PyBind layer. It only manifests when data is not fed via a numpy array. Until it is fixed, the simple workaround is to feed the data as a numpy array.
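The workaround amounts to converting the Python list of strings to a numpy array before building the feed dict; a minimal sketch (the session itself is elided):

```python
import numpy

strings = ["alpha", "beta", "gamma"]   # plain Python list (the leaking path, per the comment above)
input_array = numpy.asarray(strings)   # numpy unicode array (the workaround)

# The feed dict then holds the array instead of the list, e.g.:
# outputs = session.run(None, {session.get_inputs()[0].name: input_array})
print(input_array.dtype.kind, input_array.shape)  # → U (3,)
```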

@vsbaldeev (Author)

I wrapped the data in a numpy array (the code below also disables the CPU memory arena).

import uuid

import onnxruntime
import psutil
import numpy
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


def create_trained_model():
    x = [str(uuid.uuid4()) for _ in range(100)]
    y = ["a" for _ in range(50)] + ["b" for _ in range(50)]

    pipeline = Pipeline(
        steps=[
            ("vectorizer", CountVectorizer()),
            ("classifier", RandomForestClassifier())
        ]
    )

    pipeline.fit(x, y)

    return pipeline


def convert_sklearn_model_to_onnx_session(sklearn_model):
    onnx_model_proto_string = convert_sklearn(
        sklearn_model,
        initial_types=[('features', StringTensorType((None,)))],
        verbose=False
    ).SerializeToString()

    options = onnxruntime.SessionOptions()
    options.enable_cpu_mem_arena = False

    return onnxruntime.InferenceSession(
        onnx_model_proto_string,
        providers=["CPUExecutionProvider"],
        sess_options=options
    )


def get_one_prediction(onnx_model, input_data):
    return onnx_model.run(
        [onnx_model.get_outputs()[1].name],
        {onnx_model.get_inputs()[0].name: input_data}
    )


def get_used_megabytes():
    # https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_full_info
    # uss (Linux, macOS, Windows): aka “Unique Set Size”,
    # this is the memory which is unique to a process and which would be freed
    # if the process was terminated right now.
    return psutil.Process().memory_full_info().uss / (1024 * 1024)


def run_model_many_times(onnx_model, count, *, dummy):
    print("run_model_many_times, dummy = ", dummy)

    print("Memory in the beginning = ", get_used_megabytes())

    for i in range(count):
        input_data = numpy.array([str(uuid.uuid4()) for _ in range(20)])

        if not dummy:
            get_one_prediction(onnx_model, input_data)

        if i % 10000 == 0:
            print("Memory in the middle = ", get_used_megabytes())

    print("Memory in the end = ", get_used_megabytes())


def main():
    sklearn_model = create_trained_model()
    onnx_model = convert_sklearn_model_to_onnx_session(sklearn_model)

    count = 100000

    run_model_many_times(onnx_model, count, dummy=False)
    run_model_many_times(onnx_model, count, dummy=True)


if __name__ == '__main__':
    main()

The code above produces the following output:

run_model_many_times, dummy =  False
Memory in the beginning =  129.03125
Memory in the middle =  129.03125
Memory in the middle =  129.0625
Memory in the middle =  129.1875
Memory in the middle =  129.328125
Memory in the middle =  129.328125
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the end =  129.375
run_model_many_times, dummy =  True
Memory in the beginning =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the middle =  129.375
Memory in the end =  129.375

@yuslepukhin self-assigned this Oct 28, 2024
3 participants