This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Node stops responding to RPC calls #8480

Closed
peterbitfly opened this issue Apr 25, 2018 · 13 comments
Labels
M6-rpcapi 📣 RPC API. Z3-stale 🍃 Issue is in principle valid, but it is not relevant anymore or can not reproduced.

@peterbitfly

I'm running:

  • Which Parity version?: v1.10.2
  • Which operating system?: Linux
  • How installed?: via binaries
  • Are you fully synchronized?: yes
  • Which network are you connected to?: ethereum
  • Did you try to restart the node?: N/A

Since upgrading our Parity nodes to 1.9.6 / 1.10.1 / 1.10.2, we see cases where the nodes stop responding to any kind of RPC call. When querying a node in this state via curl, we just get a connection refused error back.

It happens more frequently on nodes that have a high volume of RPC calls. We were able to reproduce the issue somewhat reliably using the following stress test script (the RPC errors usually start to occur after ~1-2 hours):

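const Web3 = require("web3");
let web3 = new Web3("http://localhost:8545");

let i = 0; // number of completed requests

// Issue one isSyncing call; log errors and every 1000th completion.
function stress() {
  web3.eth.isSyncing((err, result) => {
    if (err) {
      console.log(err);
    }
    i++;
    if (i % 1000 === 0) {
      console.log(i);
    }
  });
}

// Fire a new request every 10 ms (about 100 requests per second).
setInterval(stress, 10);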

I suspect that the issue might be related to the jsonrpc module upgrade in #8181.

@peterbitfly
Author

Small update: I am unable to replicate the issue using Parity v1.9.5 and the script posted above.

@tomusdrw tomusdrw self-assigned this Apr 25, 2018
@tomusdrw
Collaborator

@ppratscher With 1.9.5, do you see any of the requests fail?

@peterbitfly
Author

No, not a single request failed during the 3+ hours the test script was running with v1.9.5.

@peterbitfly
Author

peterbitfly commented Apr 25, 2018

Edit: With a more aggressive version of the test script, I was able to get requests to fail with v1.9.5 as well, after a few minutes:

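const Web3 = require("web3");
let web3 = new Web3("http://localhost:8545");

let i = 0; // number of completed requests

// Issue one isSyncing call; log errors and every 1000th completion.
function stress() {
  web3.eth.isSyncing((err, result) => {
    if (err) {
      console.log(err);
    }
    i++;
    if (i % 1000 === 0) {
      console.log(i);
    }
  });
}

// Two timers, each firing every 1 ms, regardless of outstanding requests.
setInterval(stress, 1);
setInterval(stress, 1);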

@tomusdrw
Collaborator

I'm trying to reproduce it now. Would you mind testing if it also happens for you with --mode=offline?

@peterbitfly
Author

Yes, it also happens when running the node with --mode=offline.

@tomusdrw
Collaborator

tomusdrw commented Apr 25, 2018

@ppratscher It's mostly a web3 issue, not ours. Let me explain:

  1. Web3 uses XMLHttpRequest to create each new request to the server.
  2. For node.js they use a polyfill: https://github.com/driverdan/node-XMLHttpRequest which creates a new connection for each request. What's more, those connections have keep-alive set to true (so the server does not close them right away).

So the issue is that the benchmark you provided uses up pretty much all the TCP connections that you can make: we try to process them as fast as possible, but they still queue up. Check the output of dmesg | tail for messages like this:

[142027.415369] TCP: request_sock_TCP: Possible SYN flooding on port 8545. Sending cookies.  Check SNMP counters.

The queue of half-open (SYN) connections can be increased using:

$ sysctl -w net.ipv4.tcp_max_syn_backlog=2048

and the accept queue limit for listening sockets with:

$ sysctl net.core.somaxconn=1024
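
Besides raising the kernel queues, the client itself can avoid piling up connections by capping how many requests are in flight at once, instead of firing on a timer regardless of outstanding calls. The following is only a minimal sketch of that idea; `runBounded` and its parameters are hypothetical names introduced for illustration and are not part of web3 or Parity.

```javascript
// Hypothetical helper: run `total` calls of an async `task`, keeping at most
// `limit` of them in flight at any time.
async function runBounded(task, total, limit) {
  let started = 0;  // calls handed out so far
  let failed = 0;   // calls whose promise rejected
  let inFlight = 0; // calls currently awaiting a reply
  let peak = 0;     // highest value `inFlight` ever reached

  // Each worker issues one call at a time until the budget is used up.
  async function worker() {
    while (started < total) {
      started++;
      inFlight++;
      peak = Math.max(peak, inFlight);
      try {
        await task();
      } catch (err) {
        failed++;
      } finally {
        inFlight--;
      }
    }
  }

  const workers = [];
  for (let n = 0; n < Math.min(limit, total); n++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return { completed: total - failed, failed, peak };
}
```

Assuming your web3 version returns a promise when no callback is passed, a bounded variant of the stress test could look like `runBounded(() => web3.eth.isSyncing(), 100000, 50).then(console.log)`. The limit of 50 is an arbitrary illustration, not a recommended value.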

Other, more long-term solutions would be:

  1. Use a different provider for web3 (for instance, implement one that re-uses existing TCP connections).
  2. Connect over IPC/WS instead of HTTP; this should also increase the number of requests per second you can make (I was using parity with 20k req/s over IPC, while with HTTP it's at most 200 req/s).
  3. We could add a --jsonrpc-no-keep-alive option to Parity to clean up connections faster.
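
To illustrate the first option, here is a rough sketch of a client that re-uses TCP connections by sending JSON-RPC requests through a single shared Node.js `http.Agent` with keep-alive enabled and a cap on open sockets. It bypasses web3 entirely and is not web3's provider API; `rpcCall`, `buildRequest`, the default URL and the `maxSockets` value of 8 are assumptions made for the example.

```javascript
const http = require("http");

// One shared agent: idle sockets are kept open and re-used, and at most
// `maxSockets` connections are opened to the node at the same time.
const agent = new http.Agent({ keepAlive: true, maxSockets: 8 });

// Build a JSON-RPC 2.0 request body.
function buildRequest(method, params, id) {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params });
}

let nextId = 1;

// Send one JSON-RPC call over HTTP and resolve with its `result` field.
function rpcCall(method, params = [], url = "http://localhost:8545") {
  const body = buildRequest(method, params, nextId++);
  return new Promise((resolve, reject) => {
    const req = http.request(
      url,
      {
        method: "POST",
        agent, // route every request through the shared agent
        headers: {
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(body),
        },
      },
      (res) => {
        let data = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (data += chunk));
        res.on("end", () => {
          try {
            const reply = JSON.parse(data);
            if (reply.error) reject(new Error(reply.error.message));
            else resolve(reply.result);
          } catch (err) {
            reject(err);
          }
        });
      }
    );
    req.on("error", reject);
    req.end(body);
  });
}
```

For example, `rpcCall("eth_syncing").then(console.log)` issues the same RPC method that `web3.eth.isSyncing` uses. Requests beyond the socket cap wait inside the agent rather than opening new connections to the node.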

@folsen
Contributor

folsen commented May 21, 2018

Closing this issue since it is becoming stale and @tomusdrw has answered to the best of our abilities right now. @ppratscher if the answer isn't satisfactory please reopen.

@tomusdrw Please open a separate issue for --jsonrpc-no-keep-alive if you believe we should add that.

@folsen folsen closed this as completed May 21, 2018
@5chdn 5chdn added this to the 1.12 milestone Jun 7, 2018
@5chdn 5chdn added M6-rpcapi 📣 RPC API. Z3-stale 🍃 Issue is in principle valid, but it is not relevant anymore or can not reproduced. labels Jun 13, 2018
@stone212

@tomusdrw Can you confirm that this ticket still describes the behavior of Parity 2.0.6? Specifically, does Parity now limit TCP connections to :8545 to 1024? I am having the same problem that @ppratscher reported, and sysctl net.core.somaxconn=1024 and sysctl -w net.ipv4.tcp_max_syn_backlog=2048 did not change anything (even after reloading sysctl, verifying in /proc, etc.).

The output of netstat -anp | grep :8545 | grep ESTABLISHED | wc -l is still 1024.
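
For repeated monitoring from a Node.js script, a similar count can be read from /proc/net/tcp on Linux. This is a sketch only: `countEstablished` and `countEstablishedOnHost` are hypothetical helper names, and unlike the netstat pipeline above it matches the local port only and covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6).

```javascript
const fs = require("fs");

// Count sockets in ESTABLISHED state (st column "01") whose local port
// matches `port`, given the text of /proc/net/tcp.
function countEstablished(procNetTcp, port) {
  // Ports appear as four upper-case hex digits, e.g. 8545 -> "2161".
  const wanted = port.toString(16).toUpperCase().padStart(4, "0");
  return procNetTcp
    .split("\n")
    .slice(1) // skip the header row
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols.length > 3)
    // cols[1] is local_address ("ADDR:PORT"), cols[3] is the state
    .filter((cols) => cols[1].split(":")[1] === wanted && cols[3] === "01")
    .length;
}

// Convenience wrapper that reads the table from the running system.
function countEstablishedOnHost(port, file = "/proc/net/tcp") {
  return countEstablished(fs.readFileSync(file, "utf8"), port);
}
```

Calling `countEstablishedOnHost(8545)` on the machine running the node returns the number of established server-side connections at that moment.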

If you can confirm that Parity does not impose a limit on RPC connections, then I will look elsewhere for my error. In that case, --jsonrpc-no-keep-alive would be a nice debugging tool.

@tomusdrw
Collaborator

@stone212 There is no such limitation in Parity's HTTP RPC as of version 1.8+, I believe. As far as I can tell, the issue described here was caused only by system configuration. Can you post more detailed logs of the errors you are running into? Also, maybe check dmesg for suspicious entries.

@stone212

@tomusdrw Thank you for confirming. It is looking like our problem is completely unrelated. This was just a symptom. The problem is that Parity stops listening on RPC and there are many possible reasons, starting with systemd, bad toml options, etc. If I find an actual Parity issue I'll open a ticket and post details. Thanks again.

@tomusdrw
Collaborator

@stone212 Maybe related to #9102?

@uluhonolulu

+1 for implementing --jsonrpc-no-keep-alive, since we're also flooded with web3's keep-alive connections that just won't close.


6 participants