Artillery runs forever and never exit (v2) #238

Closed
endyjasmi opened this issue Jan 11, 2017 · 42 comments

Comments

@endyjasmi

When the command artillery quick -d 60 -r 256 https://www.company.com/ is executed, Artillery runs fine for the first few requests, then keeps looping with the following progress report and never exits.

Report for the previous 10s @ 2017-01-11T09:08:56.069Z
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

Following is the environment it runs in:

root@nodejs-512mb-lon1-01:~/artillery# node -v
v6.9.2
root@nodejs-512mb-lon1-01:~/artillery# artillery -V
1.5.0-22

Following is the log file generated by running DEBUG=* artillery quick -d 60 -r 256 https://www.bloomon.nl/ 2>&1 | tee debug.log:
https://gist.github.com/endyjasmi/a597adc6a0fc5e1c874d7abbe9a93262

Thanks in advance for the help.

@hassy
Member

hassy commented Jan 11, 2017

Thanks for the report @endyjasmi. I'm unable to reproduce the issue here. The following command completes fine for me:

artillery quick -d 60 -r 256 http://localhost:8080/

localhost:8080 is an nginx instance. The tests complete both when left to run normally and when I kill nginx halfway through a run to simulate the target failing under load.

What does the normal Artillery output look like on your system? Do you see reported latency metrics for the full 60 seconds? Are there any errors reported? What does pgrep -lfa node output once Artillery is printing empty reports?

@endyjasmi
Author

@hassy From what I understand, this happens when the number of connections from Artillery to the server is high.

@hassy
Member

hassy commented Jan 13, 2017

@endyjasmi Can you post a log? (Just the normal output from Artillery as it runs the test). Can you also check the CPU usage of node processes and check that there are two of them still running?

@portenez

portenez commented Feb 15, 2017

I see the same problem.

Report for the previous 10s @ 2017-02-15T16:24:03.192Z
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  Concurrent users:    65
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

It ran against an environment that never went down.

The load generator machine did experience high cpu usage, but then it went down to normal levels.

@hassy
Member

hassy commented Feb 15, 2017

@portenez thanks, can you post the full log? Did you ever see metrics being reported while Artillery was running or always just NaN?

@portenez

We just figured it out. It was an exception being thrown during custom js code run via beforeRequest. We fixed the exception, and the test was able to complete.

So, I guess there's still a bug: an exception in one of the preProcessors will cause the whole test to hang.

@igorclark

Hi folks, I'm seeing this too, but when I'm using a prepared Websocket test. It gets to the end of the test, everything's fine, cluster being tested is fine, and then Artillery just keeps doing this forever:

Report for the previous 10s @ 2017-02-16T17:36:42.428Z
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

@hassy
Member

hassy commented Feb 18, 2017

@portenez thanks, that's something to fix. @igorclark does it happen every time or only sometimes?

A reproducible test case would be amazing, but until then the only option I have is to review the code and try to spot any potential causes.

@igorclark

@hassy don't know about every time, but certainly 9 times out of 10.

Would love to provide a full test case including back-end, but it's not a public project. Here's the majority of the test plan though - don't know if it'll be much use but hope so :-)

config:
  target: "ws://<host-name>/"
  phases:
    -
      duration: 60
      arrivalRate: 100
  ws:
    # Ignore SSL certificate errors
    # - useful in *development* with self-signed certs
    rejectUnauthorized: false
scenarios:
  -
    engine: "ws"
    flow:
      -
        loop:
          -
            send: '{<json-message-1>}'
          -
            send: '{<json-message-2>}'
          -
            send: '{<json-message-3>}'
          -
            send: '{<json-message-4>}'
        count: 100

@hassy
Member

hassy commented Feb 21, 2017

@igorclark Thanks for that. Just to confirm - you don't see any error messages printed to the console at any point during a run that ends up spinning forever?

If you don't mind, next time that happens, does the following command print anything:

pgrep -lfa node | grep worker.js

@igorclark

Hey @hassy - no, no Artillery error messages. There are sometimes a couple of "ECONNRESET" or "connection closed by other party" when the server is getting really hammered, but that's normal, no? And when it ends up just spinning, it doesn't have any other errors, just the report every 10s.

I'll do another run tomorrow and run the pgrep command to get you some output.

@tiaod

tiaod commented Mar 9, 2017

same issue
image
image
I set a very high arrivalRate and it seems to run forever and never stop.

hello.yml:

config:
  target: 'http://www.example.com'
  http:
    pool: 10
  phases:
    - duration: 60
      arrivalRate: 2000
  defaults:
    headers:
      Connection: close

scenarios:
  - flow:
    - get:
        url: "/"

@tiaod

tiaod commented Mar 9, 2017

Ten minutes later, it's still running,
image

@hassy
Member

hassy commented Mar 10, 2017

Thanks @tiaod. Seems like high arrival rates have something to do with it but I was unable to reproduce previously. I'll give it another go though.

@hassy
Member

hassy commented Mar 10, 2017

@tiaod are you wrapping Artillery in another script? Your issue may be related to #264 then.

@tiaod

tiaod commented Mar 11, 2017

@hassy Oh, I forgot to mention that. You are right, I use child_process spawn to run Artillery. But with a lower arrivalRate (e.g. 1000), it stops normally.

@peara

peara commented Mar 27, 2017

Hi @hassy
I have the same issue when testing a Socket.io server with a high arrivalRate.
From what I see, there are some connections which never get closed.
I think when the server is unable to handle that many requests, Artillery will just wait forever.
The server also seems to ignore these connections while Artillery just waits and sends nothing.
Is there a mechanism to let Artillery close these connections itself instead of waiting for a response from the server?
I would also like these scenarios to be counted as failures.

The same thing happens if I don't start the server. Artillery will keep waiting.

@hassy
Member

hassy commented Mar 27, 2017

@peara Thanks for the info. Artillery does try to time out connections - with an explicit timeout setting for HTTP and the default value used by the underlying library for Socket.io - something to look into to try to solve this. Cheers

@cwthompson

Unsure if relevant but thought it worth posting.

Recently I came across this issue as well, and it seemed to be caused by the target and the scenario's url.

example:

config:
  target: 'http://myurlexample.com/'
  http:
    pool: 10
  phases:
    - duration: 60
      arrivalRate: 10

scenarios:
  - flow:
    - get:
        url: "preview/1/"

resulted in: the NaN issue as described by @igorclark

Report @ 11:40:26(+0100) 2017-09-28
  Scenarios launched:  100
  Scenarios completed: 0
  Requests completed:  0
  Concurrent users:    289
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

But changing the target to myurlexample.com and the scenario's url to /preview/1/ removed the NaNs from the results.

Summary report @ 11:42:41(+0100) 2017-09-28
  Scenarios launched:  600
  Scenarios completed: 600
  Requests completed:  600
  Concurrent users:    60
  RPS sent: 9.93
  Request latency:
    min: 95.5
    max: 228.7
    median: 108.5
    p95: 146.8
    p99: 182.9
  Scenario duration:
    min: 105
    max: 303
    median: 132
    p95: 191.5
    p99: 239.5
  Scenario counts:
    0: 600 (100%)
  Codes:
    200: 600

@Limess

Limess commented Aug 7, 2018

We ran into this when using an invalid request URL (a literal url). Issue #512 describes the details.

This would explain why prefixing the path worked for @cwthompson, as any URL which is not prefixed by / is treated as an absolute URL.
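
A minimal sketch of a pre-flight check built on that rule. It assumes, as described in this comment, that a url not starting with / is used as an absolute URL rather than resolved against the target. `findSuspiciousUrls` is a hypothetical helper, not an Artillery API.

```javascript
'use strict';

// Hypothetical pre-flight check (not part of Artillery). Assumption taken from
// the comment above: a url that does not start with "/" is used as an
// absolute URL, so "preview/1/" is not resolved against config.target.
function findSuspiciousUrls(urls) {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i;
  return urls.filter((url) =>
    !url.startsWith('/') &&   // not a path relative to the target
    !hasScheme.test(url) &&   // not a full URL such as http://host/path
    !url.includes('{{')       // templated values cannot be checked statically
  );
}

console.log(findSuspiciousUrls([
  '/preview/1/',
  'preview/1/',
  'http://myurlexample.com/preview/1/'
]));
// prints [ 'preview/1/' ]
```

Running something like this over the url values of a test script before a run would have flagged the `preview/1/` case from the earlier comment.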

@JoeScho
Contributor

JoeScho commented May 20, 2019

Hey @hassy 👋

I'm experiencing the same symptoms.

Artillery: 1.6.0-28
Node.js: v12.2.0
OS: darwin/x64

My config looks something like:

config:
  target: "https://example.com"
  phases:
    - duration: 60
      arrivalRate: 20
  defaults:
    headers:
      x-api-key: 'XXXXXXX'
scenarios:
  - flow:
      - get:
          url: "/example"
          qs:
            foo: "bar"

The issue is intermittent (occurring ~2/3rds of the time). Below are two runs, one run immediately after the other with no config changes.

Run no. 1 (error)

$ artillery run example.yml

Started phase 0, duration: 60s @ 16:57:05(+0100) 2019-05-20
Report @ 16:57:15(+0100) 2019-05-20
Elapsed time: 10 seconds
  Scenarios launched:  199
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: 20.1
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

Run no. 2 (success)

$ artillery run example.yml

Started phase 0, duration: 60s @ 16:57:17(+0100) 2019-05-20
Report @ 16:57:27(+0100) 2019-05-20
Elapsed time: 10 seconds
  Scenarios launched:  199
  Scenarios completed: 86
  Requests completed:  86
  RPS sent: 20.12
  Request latency:
    min: 124.4
    max: 202.6
    median: 137.4
    p95: 182.5
    p99: 201.6
  Codes:
    200: 86

If I get to the bottom of it I'll post here. If you have any ideas about what I might be doing wrong, or if I can provide any more information please let me know!

Thanks

@kbjr

kbjr commented Dec 15, 2019

I seem to be seeing the same issue, but have a bit more info that might be useful. First, see this screenshot, showing the issue with the hung test:

Screenshot_1

The log output on the service has stopped, all the received requests have completed successfully. The test runner itself is in the hung state and sending no more requests (the test should have ended at 1 minute).

The interesting part is that, when I kill the test runner, it seems like one last small batch of requests is fired off just before artillery exits:

Screenshot_2

Notice the last two requests in the server log, that are received immediately upon killing artillery. You can also see that these requests are then immediately aborted (the finished=false implying that the connection was closed on these requests before a response was sent, likely because artillery exited).

My guess would be that artillery is hung because it's waiting for these requests, but the requests were never actually sent. And then killing the process causes them to somehow get flushed out and finally sent.

@vojtech-cerveny

This is such an annoying issue 😭

I use Artillery for stress tests and I want to find the limits of the server. Artillery found them, but I am not able to get any reasonable output from it due to this issue.

My setup: I run this twice in parallel in Docker containers to avoid problems with CPU overload.

config:
  target: "ws://raspberry:3000/ws?templateId=1&username=JakeTheDog"
  phases:
    - duration: 120
      arrivalRate: 30
      name: "Stress creating connection"
  processor: "./utils/elasticsearch.js"

scenarios:
  - engine: "ws"
    flow:
      - send: "ping pong"
      - think: 1

The problem begins when the server starts having data processing problems, begins to time out, and eventually crashes. After that, Artillery starts showing:

Report @ 14:05:46(+0000) 2020-01-17
Elapsed time: 11 minutes, 10 seconds
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

Is there any solution for that? It is so annoying because I send results from Artillery to Elasticsearch, and now it doesn't send anything.

@vojtech-cerveny

@hassy - Can you provide some information about this issue? Without it, it doesn't make any sense to use it 🤷‍♂️

So please tell us whether you care about this, or whether it will stay open and we should look for another tool for stress testing.

@hassy
Member

hassy commented Jan 27, 2020

It's an extremely difficult issue to reproduce (and to create a reproducible test case for). In your case specifically @vojtech-cerveny, there's nothing unusual about the test script itself, so the issue is somewhere else, but there's no way to try to isolate what it might be unless you can provide more information.

@whitebyte

whitebyte commented Feb 13, 2020

I can confirm that the issue is consistently reproducible with high arrival rate.

Artillery: 1.6.0-29
Artillery Pro: not installed (https://artillery.io/pro)
Node.js: v12.14.1
OS: linux/x64

The higher the arrival rate, the longer it takes for Artillery to finish and print a report. E.g. with timeout = 30s it finishes in 30s with arrival rate = 10, but it takes 60s with arrival rate = 500.

I'm running it on a pretty beefy machine that has a lot of free CPU, yet I see CPU usage warnings.

@hassy
Member

hassy commented Feb 13, 2020

@whitebyte If Artillery still completes successfully, that's not an instance of the issue. It would make sense that the test takes a longer time to complete if you're creating more virtual users (50 times more if you're going from 10 to 500/second).

CPU warnings-wise - that also still makes sense. Artillery uses one CPU by default, so you're likely maxing a CPU, especially with a high arrival rate (each arrival is a whole new TCP connection by default).

@whitebyte

I'm using only one phase with a constant arrival rate. Above some arrival rate threshold, Artillery starts running indefinitely, eventually printing only NaNs.

@whitebyte

Turns out that elapsed time is ignored during the run and only # of scenarios launched is taken into account. It looks like Artillery calculates the required number of scenarios to be launched before start, and then just sticks to this number, ignoring the actual time passed. Which is OK, but should be documented somewhere.
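
As a rough illustration of that arithmetic, here is a sketch that estimates the planned scenario count from a list of phases. It assumes the behaviour described above (the total is fixed up front) and only handles constant arrivalRate and arrivalCount phases. `expectedScenarios` is a hypothetical helper, not an Artillery function.

```javascript
'use strict';

// Hypothetical helper: estimates how many scenarios a list of phases asks for,
// assuming the behaviour described above (the total is fixed up front).
// Only constant-rate and arrivalCount phases are handled; rampTo phases are
// skipped because their exact count depends on Artillery's internals.
function expectedScenarios(phases) {
  return phases.reduce((total, phase) => {
    if (typeof phase.arrivalCount === 'number') {
      return total + phase.arrivalCount;
    }
    if (typeof phase.arrivalRate === 'number' && phase.rampTo === undefined) {
      return total + phase.duration * phase.arrivalRate;
    }
    return total; // ramp or pause phase: not estimated here
  }, 0);
}

console.log(expectedScenarios([{ duration: 60, arrivalRate: 10 }]));     // 600
console.log(expectedScenarios([{ duration: 120, arrivalCount: 2000 }])); // 2000
```

The first call uses the phase from an earlier comment in this thread (duration: 60, arrivalRate: 10), and 600 matches the "Scenarios launched: 600" in that summary report. Actual counts can differ slightly from the estimate.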

@ihemantkumar

ihemantkumar commented Jul 7, 2020

Hi, I am running artillery too for load testing.

Artillery: 1.6.1
Node.js: v10.15.3
OS: linux/x64

I am getting the same issue. Was anyone able to resolve it? If yes, please share.
I am attaching the screens from my trial run. Here's the config I used.

config:
    target: "someURL"
    ensure:
      maxErrorRate: 1
    phases:
      - duration: 10
        arrivalRate: 5
        rampTo: 10
        name: "Warming up"
      - duration: 20
        arrivalRate: 10
        rampTo: 50
        name: "Max load"
scenarios:
  - engine: "ws"
    flow:
      - send: "hello"
      - think: 3
      - send: "how are you?"

[screenshot 1]
[screenshot 2]

@rained23

rained23 commented Sep 4, 2020

I encountered the same issue too.
When my server crashes, Artillery runs forever. I have a timeout set.
I expect Artillery to just count such a scenario as failed so it can finish the load test.
I don't know why it stops launching scenarios if the whole test is based on a precalculated total number of scenarios.

My config

config:
  http:
    pool: 30
    timeout: 10
  phases:
    - duration: 60
      arrivalRate: 1
      name: Warm up
    - duration: 120
      arrivalRate: 1
      rampTo: 30
      name: Ramp up load
    - duration: 300
      arrivalRate: 30
      name: Sustain load

image

@alexanderankin

still happening:
image

@hassy
Member

hassy commented Mar 3, 2021

Which version of Artillery is that @alexanderankin? (artillery -V) What type of service is that? (HTTP, Socket.io, WS, something else) What does the earlier part of the report look like, e.g. are there any errors?

@alexanderankin

@hassy I am using errors to stop my scenario. It's using HTTP, version 1.6.1.

@mirao

mirao commented Apr 3, 2021

I'm having the same issue. My test runs once per day in a Docker container (based on image node:12). Until now (for several months) it worked as expected, but in the last week the issue happened 3 times.

Here is a log: https://gist.github.com/mirao/780e34d7443a327cbac02306c89767b0. You can see that the test has already been running for more than 6 hours and hasn't finished yet. Normally it finishes in a few minutes.
From the log ("Requests completed") it looks like only 5994 requests (of the expected 6000) were sent by the Artillery client.
I cannot see any issue, error, or outage on the server side or in the client (Artillery).

TEACHER_TOKEN=some_teacher_token STUDENT_TOKEN=some_student_token npx artillery run -e dev --output output/artillery.json load-test.yml

load-test.yml:

config:
  socketio:
    transports: ["websocket"]
  phases:
    - duration: 120 # Run scenario for x seconds
      arrivalCount: 2000 # Create total of x clients
  ensure:
    maxErrorRate: 0
  environments:
    dev:
      target: "wss://my_socketio_server_in_kubernetes"
scenarios:
  - engine: "socketio"
    name: "Get teacher's sections"
    flow:
      - emit:
          channel: "authenticate"
          data:
            {
              "strategy": "local",
              "accessToken": "{{ $processEnvironment.TEACHER_TOKEN }}",
            }
          acknowledge:
            match:
              json: "$.1.authenticated"
              value: true
      - think: 1
      - emit:
          channel: "section::find"
          data: {}
      - think: 20
      - emit:
          channel: "section::find"
          data: {}
      - think: 120
  - engine: "socketio"
    name: "Get student's sections"
    flow:
      - emit:
          channel: "authenticate"
          data:
            {
              "strategy": "local",
              "accessToken": "{{ $processEnvironment.STUDENT_TOKEN }}",
            }
          acknowledge:
            match:
              json: "$.1.authenticated"
              value: true
      - think: 1
      - emit:
          channel: "section::find"
          data: {}
      - think: 30
      - emit:
          channel: "section::find"
          data: {}
      - think: 120

$ npx artillery -V

------------ Version Info ------------
Artillery: 1.6.1
Artillery Pro: not installed (https://artillery.io/pro)
Node.js: v12.20.0
OS: linux/x64
--------------------------------------

@like-thrill

------------ Version Info ------------
Artillery: 1.7.6
Artillery Pro: not installed (https://artillery.io/pro)
Node.js: v16.10.0
OS: darwin/x64

This happens when running an Artillery script for more than 2 hours. It runs fine for the given duration, but after the given time completes it goes into an infinite loop.

Example log output after the given duration completes:
Request latency:
min: NaN
max: NaN
median: NaN
p95: NaN
p99: NaN

If we run the same script for around 1 hour, it runs fine and the JSON report file is created.

@hassy FYI...

@jasperblues

For me just using an afterResponse processor script will cause Artillery to hang.

@hassy
Member

hassy commented Jul 28, 2022

@jasperblues is your afterResponse hook calling the next() callback to return control to Artillery?

https://www.artillery.io/docs/guides/guides/http-reference#afterresponse-hooks

@jasperblues

Ooops! Thanks @hassy . . . i guess that's a common n00b error.
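
For reference, a minimal sketch of an afterResponse hook that always returns control through `next()`. The hook name `logServerErrors` and the `lastServerError` variable are illustrative only; the parameter list follows the hook signature from the docs linked above.

```javascript
'use strict';

// Sketch of a processor file with an afterResponse hook. The hook name and
// the lastServerError variable are illustrative, not part of Artillery.
function logServerErrors(requestParams, response, context, ee, next) {
  if (response.statusCode >= 500) {
    context.vars.lastServerError = response.statusCode;
  }
  // Hand control back to Artillery on every code path. Without this call
  // the virtual user never moves on and the run can appear to hang.
  return next();
}

module.exports = { logServerErrors };
```

The thing to check in an existing hook is that every branch, including early returns and error paths, ends in a call to `next()`.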

@adispennette

Hello, I am seeing a very similar issue: when I intentionally put a bad path in the config for the processor, Artillery never exits.
I set up a working test, then pass in an override for the processor that points to a path that does not exist, and it runs forever.

--overrides '{"config": { "processor": "~/functional-test/artillery-error-test/src/test-performance/processor.js" } }'

VERSION INFO:

Artillery Core: 2.0.0-22
Artillery Pro:  not installed (https://artillery.io/product)

Node.js: v18.6.0
OS:      darwin

@jgu12

jgu12 commented Sep 7, 2022

Hello, I am seeing a very similar issue: when I intentionally put a bad path in the config for the processor, Artillery never exits. I set up a working test, then pass in an override for the processor that points to a path that does not exist, and it runs forever.

--overrides '{"config": { "processor": "~/functional-test/artillery-error-test/src/test-performance/processor.js" } }'

VERSION INFO:

Artillery Core: 2.0.0-22
Artillery Pro:  not installed (https://artillery.io/product)

Node.js: v18.6.0
OS:      darwin

I'm seeing the same behavior on Artillery Core: 2.0.0-23: a test with an invalid processor path errors out but never exits.
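
Until that is fixed, one possible workaround is to check the processor path before launching the test. The sketch below is a hypothetical pre-flight helper, not part of Artillery. It assumes the path is either absolute, relative to a base directory, or starts with ~/. Note that the shell does not expand ~ inside a quoted JSON string like the override above, which is why the helper expands it itself.

```javascript
'use strict';

const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical pre-flight check (not part of Artillery): resolves a processor
// path and reports whether the file exists before a test is started.
function resolveProcessorPath(processorPath, baseDir = process.cwd()) {
  // "~" is only expanded by the shell when unquoted, so expand it here.
  const expanded = processorPath.startsWith('~/')
    ? path.join(os.homedir(), processorPath.slice(2))
    : processorPath;
  const absolute = path.resolve(baseDir, expanded);
  return { absolute, exists: fs.existsSync(absolute) };
}

module.exports = { resolveProcessorPath };
```

A wrapper script could call `resolveProcessorPath()` with the value it is about to pass in `--overrides` and exit with a non-zero code when `exists` is false, instead of starting a run that never ends.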

@honarkhah

Dropping the latest link that resolves the issue here, to help people find it more easily, as this is the first result on search!

#1799

@hassy hassy closed this as completed Jul 10, 2023