Support specifying SecurityContext for Pods and enable tcp keepalive for AWS #915
Conversation
…elet in terraform Signed-off-by: Aylei <rayingecho@gmail.com>
- name: net.ipv4.tcp_keepalive_time
  value: "300"
- name: net.ipv4.tcp_keepalive_intvl
  value: "300"
Send a keepalive packet every 300s to survive the fixed 350s idle timeout of the AWS NLB.
`net.ipv4.tcp_keepalive_intvl` defaults to 75 seconds, do you think it's necessary to increase it?
I have no real preference, actually; I just want to make sure the keepalive probe interval is less than 350s regardless of the kernel's compile-time defaults.
I prefer not to change it. To prevent the connection from being closed by a load balancer with a shorter timeout, setting `net.ipv4.tcp_keepalive_time` is enough. `net.ipv4.tcp_keepalive_intvl` determines when an unresponsive connection will be aborted; increasing it increases how long the connection is kept on the server side.
Good point! But `net.ipv4.tcp_keepalive_intvl` determines the interval of the subsequent probes, so it should be less than 350s too. I would like to set it to 75s explicitly (the well-known default). What do you think?
it's ok
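For reference, the settings agreed on above would look roughly like this when expressed as Pod-level sysctls (a sketch of the intended end state, not the exact diff in this PR):

```yaml
# Sketch: keepalive sysctls in a Pod securityContext (Kubernetes 1.12+ shape).
securityContext:
  sysctls:
  - name: net.ipv4.tcp_keepalive_time    # start probing after 300s idle, below the 350s NLB timeout
    value: "300"
  - name: net.ipv4.tcp_keepalive_intvl   # subsequent probes every 75s, the well-known kernel default
    value: "75"
```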
@@ -359,7 +359,7 @@ func (tkmm *tikvMemberManager) getNewSetForTidbCluster(tc *v1alpha1.TidbCluster)
    SchedulerName: tc.Spec.SchedulerName,
    Affinity:      tc.Spec.TiKV.Affinity,
    NodeSelector:  tc.Spec.TiKV.NodeSelector,
    HostNetwork:   tc.Spec.PD.HostNetwork,
This was a typo, I suppose, wasn't it?
@cofyc
yes, thanks!
Co-Authored-By: weekface <weekface@gmail.com>
/run-e2e-in-kind
Signed-off-by: Aylei <rayingecho@gmail.com>
/run-e2e-in-kind
[
  "--allowed-unsafe-sysctls=\\\"net.*\\\"",
is PodSecurityPolicy required? https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#podsecuritypolicy
No, by default all the sysctls are allowed by PodSecurityPolicy
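For clusters that do enforce a restrictive PodSecurityPolicy, unsafe sysctls would also need to be allowed there; a minimal policy might look roughly like this (a sketch, not part of this PR):

```yaml
# Sketch: a PodSecurityPolicy (policy/v1beta1) allowing the net.* sysctls.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: allow-net-sysctls
spec:
  allowedUnsafeSysctls:
  - "net.*"
  # Remaining required PSP fields, kept permissive for brevity:
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```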
sysctls:
- name: net.ipv4.tcp_keepalive_time
  value: "300"
- name: net.ipv4.tcp_keepalive_intvl
Is it OK to add "net.core.somaxconn" here? It's 128 in the container now.
Should we put these as default configs in the values.yaml of the tidb-cluster chart?
No, these configurations are specific to AWS.
For `net.core.somaxconn`, I think it's a different problem and can be addressed in a separate issue.
BTW, what's the proper value of `net.core.somaxconn`?
`net.core.somaxconn` is a general issue, so I think we can set this in the tidb-cluster chart's values.yaml.
Not possible for now: it's marked unsafe because of a kernel memory accounting issue, so it must be whitelisted via a kubelet flag; otherwise the Pod will fail to start.
Oh, I misunderstood here. It's OK to add it in the deploy/aws values.yaml file (what I meant is that it cannot be set in the charts/tidb-cluster default values.yaml file).
Most of the kernel parameters in the prerequisites document are namespaced. It seems we should configure the safe ones for users by default and add documentation on how to configure these parameters via the pod security context. Tracked in: #924
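To illustrate the split (as documented by Kubernetes, not something introduced by this PR): safe, namespaced sysctls such as `net.ipv4.ip_local_port_range` can already be set from the pod security context without any kubelet whitelist, for example:

```yaml
# Sketch: a safe namespaced sysctl, settable without --allowed-unsafe-sysctls.
securityContext:
  sysctls:
  - name: net.ipv4.ip_local_port_range
    value: "1024 65535"
```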
/run-e2e-in-kind
LGTM (except deploy/aws, which I'm not familiar with).
@tennix @DanielZhangQD PTAL again
LGTM
LGTM
/run-e2e-in-kind
…nd enable net.* (#954) * Support configuring sysctls for Pods and enable net.* sysctls for kubelet in terraform Signed-off-by: Aylei <rayingecho@gmail.com> * Apply suggestions from code review Co-Authored-By: weekface <weekface@gmail.com> * Address review comments Signed-off-by: Aylei <rayingecho@gmail.com>
…upstream-release-1.0
…upstream-release-1.0
…nd enable net.* (#1175) * Apply suggestions from code review Co-Authored-By: weekface <weekface@gmail.com> * Address review comments Signed-off-by: Aylei <rayingecho@gmail.com>
Signed-off-by: Aylei rayingecho@gmail.com
What problem does this PR solve?
close #880
close #795
What is changed and how does it work?
A new field, `podSecurityContext`, is introduced for TiKV/TiDB/PD's spec, which can specify sysctls for Pods. Only the securityContext of TiDB is used for now, but users can freely customize these fields as needed.

In terraform, enable configuration of `net.*` sysctls in the kubelet args, and set proper defaults for AWS.

Check List
Tests
Tested against an AWS NLB with a 350s idle timeout:
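(A rough sketch of this kind of check, not necessarily the author's exact procedure: hold a session through the NLB idle for longer than 350s and confirm it survives.)

```sh
# Hypothetical idle-connection check through the NLB (placeholder endpoint);
# the connection carries no traffic for ~400s > 350s, so it only survives
# if the server-side keepalive probes keep the NLB flow alive.
mysql -h <nlb-endpoint> -P 4000 -u root -e "SELECT SLEEP(400); SELECT 1;"
```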
Verify the sysctls are properly set:
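One way to check them from inside the running Pod, with placeholder names (not necessarily the commands used by the author):

```sh
# Hypothetical verification of the applied sysctls:
kubectl exec -n <namespace> <tidb-pod-name> -- \
  sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl
# Expected output (assuming the values discussed in this PR):
#   net.ipv4.tcp_keepalive_time = 300
#   net.ipv4.tcp_keepalive_intvl = 75
```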
Code changes
Related changes
Does this PR introduce a user-facing change?: