parity crashes with too many open files #8813
Please upgrade, as this has been solved. Duplicate of #8123.
@Tbaut (cc @5chdn) - regarding this being solved, apparently not. Just noticed this: (from about 20 min ago, coincidentally only 10 min before downgrading to 1.10.6 as per #8818) (this occurred in v1.11.3) (note: using default of 1024 (via
Can you tell us more about what you are doing with these nodes - querying the blockchain over WS, according to your config? How heavy is the load?
@Tbaut sure. The purpose of the nodes is arbitrary public historical access. Technically we only need this from certain dates and on certain (somewhat unpredictable) contracts, but I don't think there's any option besides running a full archive node atm. This is to support our voting platform (most code is public atm at @secure-vote). We'd use Infura or something like that, except they don't provide historical access.

We run 3 of these nodes (one in the US, one in the EU, and one in AUS). The machines are all i3.2xlarge (I think - they have 60 GB of RAM and 1.8 TB SSDs). These nodes don't need to deal with any local transactions, though the ability to broadcast transactions is useful (just that we don't want them being local).

Some of the issues we were having recently:

We'll be using websockets soon - I just need to configure everything in AWS with load balancers. I've also taken some steps on the AWS side to prevent any direct connections to the nodes' RPC port (which is, locally, 38545) - only the load balancer and localhost can connect directly now. The load is not heavy from our side yet, but was very heavy when the nodes were being abused.

Let me know if there's any more specific info that would help you. Very much appreciate the Parity client - it's been much better for us than Geth (which, even on 1.8 - or whatever the latest version is - is much slower in archive mode). Also, as part of the above mitigation the nodes are now on 1.10.6 (mentioned in #8818).
Since upgrading to 1.11.3 I have had similar error-out problems, happening on two 1.11.3 production nodes - both serving mining functions. Peers are set to 300 / 500; the peer ceiling was lowered from 1500 to 500, as anything over 500 peers made this problem happen much faster.

2018-06-08 09:22:22 WARN jsonrpc_http_server Incoming streams error, closing sever: Os { code: 24, kind: Other, message: "Too many open files" }

This was never a problem prior to the 1.11 build. I suspect this has to do with the parallel transaction processing? Rolling back solves this problem for me. I have temporarily increased the open file limit on this systemd process to see if that helps - that seems more of a band-aid though, as there was never an open file limit issue before.
Increasing the open file limit from 4,092 to 500,000 has solved this for me for the moment. An overkill adjustment? Maybe. I adjusted the limit as a parameter in the systemd service file directly; it did not take effect with the ulimit console command.
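For anyone hitting the same limit, a minimal sketch of what that looks like as a systemd drop-in (the unit name parity.service and the value 500000 are assumptions - adjust to your setup, then run systemctl daemon-reload and restart the service):

```ini
# /etc/systemd/system/parity.service.d/limits.conf  (assumed unit name)
# Raise the per-process open file limit for the Parity service.
[Service]
LimitNOFILE=500000
```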
Having the same issue with nightly across different machines. |
@XertroV @Njcrypto can you confirm this is fixed (or not) with 1.11.4? |
@5chdn it's not going to be that easy for me to check now. I think the original reason we were getting this error is the "abuse" we were having via publicly accessible eth_sendRawTransaction (relating to #8820). I've made two (workaround) mitigations since then: upping the ulimit and cutting off an "easy way" they were sending RPC messages. I'd have to undo those and try to find some spammers :P. If anyone writes a script (or knows a way) that can spam eth_sendRawTransaction, then I can run that against 1.11.4. Even if spam can crash the node, it should be able to be worked around with the new
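For illustration only, a rough sketch of what such a load test could look like, assuming a node with HTTP JSON-RPC on 127.0.0.1:38545 and a pre-signed raw transaction to hand (RPC_URL and RAW_TX are placeholders, not anything from this thread):

```python
import json
import urllib.request

RPC_URL = "http://127.0.0.1:38545"  # assumed local HTTP JSON-RPC endpoint
RAW_TX = "0x..."                    # placeholder: a pre-signed raw transaction

def send_raw_tx(i):
    """Send a single eth_sendRawTransaction JSON-RPC request."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_sendRawTransaction",
        "params": [RAW_TX],
        "id": i,
    }).encode()
    req = urllib.request.Request(
        RPC_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Hammer the endpoint; each request opens a fresh connection,
    # which is what drives the open file descriptor count up.
    for i in range(100_000):
        try:
            send_raw_tx(i)
        except Exception as exc:
            print(i, exc)
```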
It has solved the open file issue for me; #8974 keeps it from being a production client, though. More on that in that thread.
Thanks |
Ever since upgrading to 1.11.5, Parity seems to use a large number of open files, and I have run into this issue even with an open file limit of 64000. An old v1.9 node that has been running for more than 2 weeks only has 635 open files (output of
A new 1.11.5 node running for only a day already has over 30000 open files; 30292 of them are of the type
The node has medium RPC load (a few requests per second) and is used for mining.
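If it helps to narrow this down, here is a rough sketch for breaking a node's open descriptors down by type on a Linux host by reading /proc/&lt;pid&gt;/fd (lsof reports the same information; the PID argument is whatever Parity is running as):

```python
import os
import sys
from collections import Counter

def fd_types(pid):
    """Count a process's open file descriptors by the kind of object
    they point at, by reading the symlinks under /proc/<pid>/fd."""
    counts = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed while iterating
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("anon_inode:"):
            counts["anon_inode"] += 1
        else:
            counts["file"] += 1
    return counts

if __name__ == "__main__":
    pid = int(sys.argv[1])  # pass the Parity PID on the command line
    for kind, n in fd_types(pid).most_common():
        print(kind, n)
```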
My four 1.11.5 nodes have 875, 872, 748, 621 open files. All nodes have been running 1.11.5 with > 5 days uptime. All nodes are mining. I was a victim of the 1.11.4 open file issue, and it has been solved for me under my specific configuration. |
Can you triple-check your version string? @ppratscher
Yeah, it could be the case that the binary on the affected node was updated but has not been restarted. We will continue to monitor the situation and reply in case it happens again. |
Node regularly crashes due to "too many open files"
Going to bump up ulimit and upgrade nodes - will let you know if this issue persists.
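(In case it's useful to anyone else checking the same thing, a small sketch for confirming which open-files limit a running process actually ended up with, by reading /proc/&lt;pid&gt;/limits on Linux - the PID is whatever Parity is running as:)

```python
import sys

def nofile_limit(pid):
    """Return the (soft, hard) open-files limit of a running process
    as reported by /proc/<pid>/limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                parts = line.split()
                return parts[3], parts[4]
    return None

if __name__ == "__main__":
    print(nofile_limit(int(sys.argv[1])))
```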
(Reporting because the logs tell me to)
Note: although we're still having issues with the 1.10.4 node, I can't find the same crash occurring there.
One thing you'll note: (here's a sample of two lines in the logs
The node does nothing for about 75 minutes after the crash occurs. During this time the node is running at 100% CPU on one core and using about 1-2 GB of RAM.
This is meant to be a production node! (Like we're using it - as a business - in production) (Side note: I've had a worse time with geth, so don't feel too bad)
It's also a full archive node.
Some machine stats:
(this is from a machine running 1.10.4, but the machines are the same - it's also sync'd)
(from the node that was crashing) - basically identical to other prod nodes
config:
(also note: auto-update doesn't seem to be working... though a few months ago I deleted the download cache folder because we were having issues on the beta track)
log samples of crash