Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% CPU usage with idle websocket #1955

Closed
naggie opened this issue Jun 5, 2017 · 14 comments
Closed

100% CPU usage with idle websocket #1955

naggie opened this issue Jun 5, 2017 · 14 comments
Assignees
Milestone

Comments

@naggie
Copy link

naggie commented Jun 5, 2017

Long story short

Fairly standard websocket server/client setup, but with session, results in 100% CPU usage.

  1. Generic websocket server
  2. Generic websocket client, but with a session that does a get request (to the same endpoint) before a new request to start a websocket
  3. In a few minutes, CPU usage increases to 100%

Note that removing the get request seems to avoid the problem.

Expected behaviour

Low CPU usage when idle

Actual behaviour

Worker process suddenly goes to 100% CPU usage, confirmed with htop.

Steps to reproduce

Server

import asyncio
import aiohttp
from aiohttp import web

async def worker_websocket(request):
    ws = aiohttp.web.WebSocketResponse()
    await ws.prepare(request)

    async for msg in ws:
        pass

    return ws

app = aiohttp.web.Application()
app.router.add_get('/worker-websocket',worker_websocket)
web.run_app( app, port=int(8080))

worker

import aiohttp
from aiohttp import web
import asyncio

async def websocket_client(loop):
    url = 'http://localhost:8080/worker-websocket'

    session = aiohttp.ClientSession(loop=loop)

    # removing this prevents the problem!
    async with session.get(url) as resp:
        pass

    async with session.ws_connect(url=url) as ws:
        async for msg in ws:
            pass

loop = asyncio.get_event_loop()
client = websocket_client(loop)
loop.run_until_complete(client)

Run server + worker and wait a few minutes. Eventually the server process will eat one core. Remove the session.get and it does not happen.

Your environment

I've reproduced this, so far, on:

  • MacOS 10.12 (Sierra) + Python 3.6.1 with aiohttp 2.1.0
  • Ubuntu server 16.04 LTS + Python 3.5.2 with aiohttp 2.1.0

Using strace, all I can see is highly frequent epoll_wait calls returning 0. My attempts with pyrasite and gdb-python3 yielded nothing useful.

I'm still yet to try ipdb and try to look through the source.

The only thing I can find that is even remotely similar is https://groups.google.com/forum/#!topic/libuv/opFk74lww7k .

Otherwise my experience with aiohttp has been fantastic. Thanks for a great library.

@frillip
Copy link

frillip commented Jun 6, 2017

Confirmed on Windows too.

Windows 7 Enterprise 64bit + Python 3.5.1 amd64 + aiohttp 2.1.0

Usage is show as 25% as it's running on a quad-core machine:
image

Oddly, usage spikes suddenly?

image

@naggie naggie changed the title 100% CPU usage when otherwise idle 100% CPU usage with idle websocket Jun 6, 2017
@fafhrd91
Copy link
Member

fafhrd91 commented Jun 8, 2017

@naggie @frillip could you test master branch

@naggie
Copy link
Author

naggie commented Jun 8, 2017

I can confirm it's still an issue on master, commit 6b85bcd

@pjknkda
Copy link
Contributor

pjknkda commented Jun 10, 2017

It seems like a problem related to keep-alive. When I tested the code above, web_protocol._process_keepalive is called infinitely after 75 seconds later (which is default value of keepalive_timeout) from calling session.get.

    def _process_keepalive(self):
        if self._force_close:
            return

        # 1. "self._keepalive_time" is not updated during waiting "async for msg in ws:"
        # therefore the value of "next" is unchanged.
        next = self._keepalive_time + self._keepalive_timeout

        if len(self._request_handlers) == len(self._waiters):  # <-- 2. no waiters during "async for msg in ws:"
            now = self._time_service.loop_time
            if now + 1.0 > next:
                self.force_close(send_last_heartbeat=True)
                return

        # 3. Therefore, the value of "next" is current time, which calls "self._process_keepalive" immediately.
        self._keepalive_handle = self._loop.call_at(
            next, self._process_keepalive)

@fafhrd91
Copy link
Member

should be fixed in master

@naggie
Copy link
Author

naggie commented Jun 18, 2017

Fantastic! Thank you all for your time!

@ericdevries
Copy link

I have found that this still results in a 100% cpu usage situation after the default timeout of 75 seconds, using it in combination with python-socketio. If needed, I can provide new evidence.

aiohttp version 2.3.1

@zTrix
Copy link

zTrix commented Nov 18, 2017

Same issue happened to me, any hint on how to debug this issue?

@fafhrd91
Copy link
Member

@ericdevries @zTrix could you provide reproducible example?

@zTrix do you use python-socketio as well?

@ericdevries
Copy link

@fafhrd91 sure, I will create one tomorrow and attach it to this ticket

@fafhrd91 fafhrd91 reopened this Nov 20, 2017
@fafhrd91 fafhrd91 added the bug label Nov 20, 2017
@fafhrd91 fafhrd91 assigned fafhrd91 and unassigned fafhrd91 Nov 20, 2017
@fafhrd91 fafhrd91 added this to the 3.0 milestone Nov 20, 2017
@zTrix
Copy link

zTrix commented Nov 20, 2017

@fafhrd91 No I didn't use python-socketio.

In my case I just started an server with websocket, and diving into _process_keepalive function shows that next actually is less than _loop.time(), but len(self._request_handlers) == len(self._waiters) is False, so it's calling itself just like an infinite loop.

@asvetlov
Copy link
Member

Wild idea: disable HTTP keepalives after websocket connection establishment. They are useless, isn't it?

@asvetlov
Copy link
Member

Also looks like RequestHandler._process_keepalive misses self._keepalive_time = self._loop.time() line.

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants