feat(load_testing): Enable some automatic reconnection handling #159
@@ -15,7 +15,7 @@ class MobileAppUser(HttpUser, PhoenixChannelUser):
     wait_time = between(1, 5)
     socket_path = "/socket"

-    prob_reset_map_data = 0.3
+    prob_reset_map_data = 0.02
Decreasing this since resetting map data will be rare. The higher value was causing memory usage to spike.
870c94f to f29a4eb
load_testing/phoenix_channel.py
Outdated
        )
        leave_push.send()
        return leave_push.get_reply()

    def sleep_with_heartbeat(self, seconds):
pulled from dotcom
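The body of `sleep_with_heartbeat` is not shown in this diff, so the following is only a minimal sketch of what such a helper might look like: it splits a long sleep into chunks and sends a heartbeat between them so the socket is not dropped as idle. The `send_heartbeat` callback, the `heartbeat_interval` default, and the injectable `sleep` parameter are all assumptions made for illustration, not the code pulled from dotcom.

```python
import time
from typing import Callable


def sleep_with_heartbeat(
    seconds: float,
    send_heartbeat: Callable[[], None],
    heartbeat_interval: float = 30.0,  # assumed interval, not from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep for `seconds` in chunks, sending a heartbeat between chunks.

    Returns the number of heartbeats sent.
    """
    remaining = seconds
    sent = 0
    while remaining > 0:
        # Never sleep longer than one heartbeat interval at a time.
        chunk = min(remaining, heartbeat_interval)
        sleep(chunk)
        remaining -= chunk
        # Only send a heartbeat if there is more sleeping left to do.
        if remaining > 0:
            send_heartbeat()
            sent += 1
    return sent
```

Under Locust, the `sleep` argument would presumably be `gevent.sleep` so that other simulated users keep running while this one waits.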
load_testing/phoenix_channel.py
Outdated
        # run_forever is blocking
        # https://github.com/websocket-client/websocket-client/issues/980#issuecomment-2065628852
        daemon = threading.Thread(target=self.run_forever)
        daemon.daemon = True
        daemon.start()
I saw this comment suggesting that threading can be avoided by using rel because it is async, but I found run_forever was blocking even when using rel as the dispatcher.
I don't love this threading, and I'm pretty sure it is the reason why keyboard interrupt doesn't work when running locust with a single worker.
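For reference, the daemon-thread pattern in the diff can be sketched in isolation as below. `BlockingClient` is a hypothetical stand-in for a websocket client whose `run_forever()` blocks; it is not the websocket-client API. The sketch also keeps an explicit `close()` hook so the loop can be stopped deliberately instead of relying only on the daemon flag.

```python
import threading


class BlockingClient:
    """Hypothetical stand-in for a client whose run_forever() blocks."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.started = threading.Event()

    def run_forever(self) -> None:
        self.started.set()
        # Block until close() is called, as a socket read loop would.
        self._stop.wait()

    def close(self) -> None:
        self._stop.set()


def start_in_background(client: BlockingClient) -> threading.Thread:
    """Run the blocking loop on a daemon thread so the caller is not blocked."""
    thread = threading.Thread(target=client.run_forever, daemon=True)
    thread.start()
    return thread
```

A caller would keep the returned thread, call `client.close()` on shutdown, and then `join()` the thread with a timeout.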
Actually, I'm going to try getting rid of rel entirely. It seems like it isn't strictly necessary; see websocket-client/websocket-client#969.
If there's a way to get this all working without rel, that seems like it'd be nice, but if that doesn't pan out, this seems like it's fine.
The previous library was experiencing flaky SSL errors when sending messages and had high CPU usage while running locally, even in distributed mode. Switching libraries seems to have resolved those issues: I ran tests with 200 users joining and leaving the same sets of stops, and the only client errors I encountered were caused by backend failures.
@boringcactus could you please re-review? I ended up changing the websocket library to websockets. This seems to have helped the reliability of leaving channels: no more mysterious ssl.SSLEOFError exceptions. It also uses less local CPU. I was getting 90% CPU usage warnings with the old version even in distributed mode, but the new version can spawn 200 users outside distributed mode without hitting a CPU warning.
@@ -7,7 +7,7 @@
 from typing import Any

 import gevent
-import websocket
+import websockets.sync.client as websockets
The regular async version uses asyncio, which is not supported in locust.
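As a rough illustration of what the synchronous client allows, the sketch below builds a Phoenix channel frame and sends it with blocking calls, with no event loop involved. It assumes the Phoenix V2 JSON serializer, where a frame is the array `[join_ref, ref, topic, event, payload]`; whether this project uses that serializer is an assumption, and the topic name, URL, and helper names are hypothetical.

```python
import json
from typing import Any, Optional


def encode_phoenix_message(
    join_ref: Optional[str],
    ref: Optional[str],
    topic: str,
    event: str,
    payload: Any,
) -> str:
    """Encode a frame as [join_ref, ref, topic, event, payload] (assumed V2 format)."""
    return json.dumps([join_ref, ref, topic, event, payload])


def push_and_wait(url: str, frame: str, timeout: float = 5.0) -> Any:
    """Send one frame and block for one reply using the sync client."""
    # Imported lazily so the encoder above works without the third-party package.
    from websockets.sync.client import connect

    with connect(url) as ws:
        ws.send(frame)
        # Blocks the calling greenlet or thread; raises TimeoutError on timeout.
        return ws.recv(timeout=timeout)
```

A join for a hypothetical topic would then be `push_and_wait(url, encode_phoenix_message("1", "1", "stops:example", "phx_join", {}))`.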
Summary
Ticket: Repeat load testing at increased scale
What is this PR for?
This adds a couple of changes to the load testing script to support running it at scale.
The main change is to use a long-lived websocket connection with some baked-in retry logic. With this change, I was able to run a load test with 200 users without hitting the ssl.SSLEOFError exception I was seeing before. I did see some logs of reconnect, indicating that automatic reconnection was performed as expected.
There is some weirdness around the change to use run_forever with threading that I'll comment on specifically.
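The retry logic itself is not shown in this summary. As a minimal sketch of the general idea only, reconnection with exponential backoff could look like the following; the `connect` callable, attempt count, and delays are all hypothetical and are not taken from the PR. It catches `OSError` because `ssl.SSLEOFError` and the built-in connection errors are subclasses of it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 5,    # assumed value
    base_delay: float = 0.5,  # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the delay after each failure."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError as error:  # includes ssl.SSLEOFError and ConnectionError
            last_error = error
            # Back off before the next attempt, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Passing `gevent.sleep` as `sleep` would presumably be the natural choice inside a Locust user, so that backoff does not stall other simulated users.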