Raise docker default ulimit for nofile to 65535 #278
Comments
@max-rocket-internet it is supposed to be 65535, and originally was, but a series of PR mishaps left several images with changes that reduced it to 4096 or 8192. The fun started in #186, where someone thought the setting was lower and added a PR to ‘raise’ it to 8192. This actually reduced it from 65535 to 8192, which immediately caused problems (#193). People tried to revert that change in #206, but that didn’t work. Meanwhile a fix in #205 got closed in favor of #206. But #206 didn’t work because the latest commits weren’t being included in the AMI builds. So fresh builds in #233 tried to restore the #206 reversion of #186, while the ongoing issue was tracked in #234. In theory the current latest AMIs should be back to 65535. Any fixed versions were dated 31 March or later, as the problem still wasn’t fixed on 29 March. And even after that I heard GPU AMIs still had the issue. #233 (comment) |
Hehe, thanks @whereisaaron for the comprehensive history write up of this issue! 👍 |
Haha thanks for the run down, @whereisaaron I've seen some of the previous issues around Elasticsearch but...
...is not true. We are running AMI version
But there is no released version later than the AMI we are using!? |
I can also confirm that this is not fixed in v20190327.
|
That's a very good question @max-rocket-internet! Seems like someone left and turned out the lights. No AMIs published for a while. |
OK I've changed the title of this issue to reflect a request to change the limit. Now, I'm curious how is the How about I make a PR to add a |
@mogren @micahhausler should I make a PR as I mentioned above? Or do you have something else in mind? |
We have been running into this error on EKS nodes:
The fix for us was to actually apply these changes in our userdata scripts where we bootstrap our EKS nodes in Terraform:
This overrides the |
Is this still an issue? The latest AMI version, currently The default systemd unit file for dockerd in
[Service]
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Validated by looking at
I plan to resolve this issue around July 10 if there is no confirmation that this is still an issue. @echoboomer the inotify limit seems like it's independent of this issue. Would you mind opening that in a new issue so we can track a fix for it outside of this one? Thanks. |
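For anyone wanting to check what a running process actually received, a minimal sketch follows. It assumes a Linux host and parses the text of `/proc/<pid>/limits`; the function name `parse_max_open_files` is made up for illustration, and this is not necessarily how the validation above was done.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from the text of /proc/<pid>/limits.

    Values are returned as strings because the kernel prints "unlimited"
    rather than a number when no limit is set.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Columns after the label are: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max open files' row found")
```

To check dockerd, pass the function the contents of `/proc/<pid>/limits` for the daemon's PID. Inside a container, reading `/proc/self/limits` shows what the container's own process got.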
Looks resolved to me:
Let's hope this issue doesn't come back again 🙏 |
Would it be possible to raise it again to 82920 for TiKV? We're trying to run the TiDB database stack, but it requires higher ulimits. See pingcap/tidb-operator#299 for what I'm talking about - considering I've never run into ulimits like this in any other K8s providers, it shouldn't be set so low to prevent us from running applications. |
@max-rocket-internet sorry for reviving this discussion, but I am confused:
From reading the discussion history here those 2 different values were also mixed up I think. So this is not yet fixed, is it? |
@thjaeckle ulimit is user-based. Was the output of "ulimit -a" above run as ec2-user? |
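Since the limit seen in a host shell and the limit seen inside a container are easy to mix up, here is a small sketch that reports the soft and hard nofile limits of whichever process runs it, using Python's standard `resource` module (Unix only). The helper name `nofile_limits` is hypothetical.

```python
import resource


def nofile_limits():
    """Return (soft, hard) open-file limits of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def fmt(value):
        # RLIM_INFINITY is the sentinel for "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return fmt(soft), fmt(hard)
```

Running it once in a host login shell and once inside a container should make any difference between the two values visible.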
I'm hitting this too trying to deploy tikv. |
Hi
|
We're seeing this issue again in the EKS-optimized GPU images.
|
@ajcann Update: container level setting on GPU AMI is still 2048:8192. basic AMI ulimit is 1048576 (we didn't specify ulimit, it inherits DAEMON_MAXFILES instead) |
The Linux kernel puts nofile under cgroup control, and they are kind of independent. We don't necessarily need to change the host ulimit. |
In the latest AMI version, v20190327, in the file /etc/sysconfig/docker the file ulimit is set to 4096:
We've already hit this limit with some java applications and have raised the limit to 65535 in user-data:
Question: Isn't 4096 a little conservative for an EKS node? Is there anything wrong with just setting this to 65535 by default in the AMI?
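The exact user-data change is not shown in the thread. As a rough sketch of the idea, assuming the file passes dockerd's `--default-ulimit nofile=<soft>:<hard>` flag on its options line (an assumption about the file's contents), the value could be rewritten like this. The function name `set_default_nofile` is hypothetical, and user-data would more commonly do the same thing with a one-line shell substitution.

```python
import re


def set_default_nofile(config_text, soft, hard):
    """Return config_text with each '--default-ulimit nofile=...' value replaced.

    Text that does not contain the flag is returned unchanged.
    """
    pattern = re.compile(r"(--default-ulimit[ =]nofile=)\d+(?::\d+)?")
    return pattern.sub(lambda m: f"{m.group(1)}{soft}:{hard}", config_text)
```

After writing the result back to the file, the Docker daemon would need a restart for the new default to apply, and only newly started containers would pick it up.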