Posting big files causes OOM and has a very strange memory usage #2669

Closed
CPlusPlus17 opened this issue Jan 16, 2018 · 9 comments

CPlusPlus17 commented Jan 16, 2018

Long story short

Sending a very large file with POST causes a memory leak and OOM on the server.

Expected behaviour

It does not cause a memory leak.

Actual behaviour

Eats memory in a very strange way (MP = measurement point in time while the POST is processed):

MP1: 2GB
MP2: 8GB
MP3: 6GB
MP4: 5GB
MP5: 12GB
MP6: 10GB
MP7: 15GB
MP8: Crash because OOM

Memory is going up and down within seconds until the process crashes.

Steps to reproduce

import base64

import aiohttp

session = aiohttp.ClientSession()
url = "http://test:9099/savefile.aspx"
headers = {'content-type': 'application/json'}

# Read the whole file into memory, base64-encode it, and embed it in the JSON body.
# file_address and WORKING_DIRECTORY come from the surrounding script.
body = """
    {{
        "filename": "{0:s}",
        "content": "{1:s}",
        "errorcount": "{2:s}",
        "log": "{3:s}"
    }}
    """.format("path_" + file_address,
               base64.b64encode(
                   open(WORKING_DIRECTORY + str(file_address), mode='rb').read()
               ).decode('utf-8'),
               "0",
               "nothing")
resp = await session.post(url, data=body, headers=headers)
resp = await resp.json()

If the file is around 2.4 GB, things start to break.

Your environment

Linux: Fedora 27
Python: 3.6.3
aiohttp: 2.3.8 (client)

asvetlov (Member) commented

I don't know what the MP{X} abbreviation stands for or what your test does.
Let's assume you have a 1 GB file.
As I see in the code snippet, you do the following:

  1. Read the entire file into memory (1 GB)
  2. Encode the file content into base64 form (+1.25 GB)
  3. Decode the b64-encoded bytes to a utf-8 string (+1.25 GB)
  4. Format it into the body string (+1.25 GB)
  5. Send it over the wire. Internally aiohttp converts the string into bytes (+1.25 GB) and pushes it into the buffer (again +1.25 GB)

In the case of non-ASCII data, memory consumption at least doubles. Your file is 2.5 GB long, so multiply everything by 2.5.
Do you still expect a low memory footprint?
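
A minimal sketch (not from the thread) that mirrors these copies on a small dummy buffer and prints the size of each intermediate object; the 10 MiB buffer stands in for the real file, and all names here are hypothetical:

import base64
import sys

raw = b"x" * (10 * 1024 * 1024)               # 1. the "file" read fully into memory
b64 = base64.b64encode(raw)                   # 2. base64 copy, ~4/3 the size
text = b64.decode('utf-8')                    # 3. str copy of the base64 data
body = '{{"content": "{0:s}"}}'.format(text)  # 4. formatted body, yet another copy

for name, obj in (("raw", raw), ("b64", b64), ("text", text), ("body", body)):
    print("{}: {:.1f} MiB".format(name, sys.getsizeof(obj) / 2 ** 20))

# All four objects are alive at the same time, so the peak resident size is
# roughly their sum -- before aiohttp adds its own str-to-bytes conversion
# and write buffer on top (step 5 above).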

CPlusPlus17 (Author) commented

Measurement points, along the time axis while the POST is processed. No, I don't expect a low memory footprint. But as soon as the POST is being processed, memory hops up and down until the OOM. This has nothing to do with reading the file or the decoding; up until the POST, memory usage is as expected.

asvetlov (Member) commented

The only thing I can suggest is converting the body back to bytes before calling aiohttp: await session.post(url, data=body.encode('utf-8')).

P.S.
Sending gigabytes of data as JSON is a bad idea anyway.
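
In context, the suggestion looks like the sketch below; url, headers, and body are the names from the reproduction snippet, and dropping the str copy before posting is an assumption about what might help, not something verified in the thread:

payload = body.encode('utf-8')  # do the str-to-bytes conversion once, up front
del body                        # assumption: release the str copy before posting
resp = await session.post(url, data=payload, headers=headers)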

CPlusPlus17 (Author) commented

Sadly, same behavior. Memory goes crazy and the server runs out of memory.

The API is used by a customer, and many other applications send big files to it, so they just say it's my problem when it doesn't work with Python... Maybe I can profile it to see why this is happening. I still don't know the source of this once a specific limit is reached.

fafhrd91 (Member) commented

@ManuelGysin what is the memory usage before the session.post() call, but after formatting the string?

CPlusPlus17 (Author) commented

2018-01-20 11:54:24,208 - convert.py:MainThread:381:save_file - INFO - 5596.77734375
2018-01-20 11:54:24,209 - convert.py:MainThread:382:save_file - INFO - Posting file with size 2379

The file really is 2379 MB, and memory usage before the post is 5596 MB. So we have at least a tripling of the usage within the post code path.

asvetlov (Member) commented Oct 3, 2018

aiohttp has internal buffers for sent data; memory duplication at the moment of transmission is unavoidable.

Loading huge data into memory is a bad idea anyway, and moving gigabyte-sized buffers back and forth in memory acts as a blocking call.

Streaming upload is the proper solution.
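
A minimal streaming-upload sketch (not from the thread), assuming aiohttp 3.x, where an async generator can be passed as data, and assuming the server can accept a raw body instead of the base64-in-JSON envelope; only one chunk lives in memory at a time:

import aiohttp

async def file_sender(path, chunk_size=64 * 1024):
    # Yield the file in small chunks instead of reading it whole.
    with open(path, 'rb') as f:
        chunk = f.read(chunk_size)
        while chunk:
            yield chunk
            chunk = f.read(chunk_size)

async def upload(url, path):
    async with aiohttp.ClientSession() as session:
        # aiohttp consumes the async generator lazily, so peak memory stays
        # at roughly one chunk regardless of the file size.
        async with session.post(url, data=file_sender(path)) as resp:
            return await resp.text()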

asvetlov closed this as completed Oct 3, 2018
lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

lock bot added the outdated label Oct 28, 2019
lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019