-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to use own certificate for metrics endpoint #10504
Conversation
e3c0bb2
to
112b962
Compare
Codecov Report
@@ Coverage Diff @@
## master #10504 +/- ##
==========================================
- Coverage 71.56% 71.52% -0.05%
==========================================
Files 393 393
Lines 36503 36525 +22
==========================================
Hits 26123 26123
- Misses 8546 8567 +21
- Partials 1834 1835 +1
Continue to review full report at Codecov.
|
/cc @brancz @jim-minter |
I lack expertise in the etcd code base to review this, but 👍 on the feature in general! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR!
tests/e2e/metrics_test.go
Outdated
@@ -57,3 +61,44 @@ func metricsTest(cx ctlCtx) { | |||
cx.t.Fatalf("failed get with curl (%v)", err) | |||
} | |||
} | |||
|
|||
func metricsTestCertAuth(cx ctlCtx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to unify this function with metricsTest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jingyih Everything is possible. Now the test looks better?
@@ -763,6 +763,9 @@ func (e *Etcd) serveMetrics() (err error) { | |||
|
|||
for _, murl := range e.cfg.ListenMetricsUrls { | |||
tlsInfo := &e.cfg.ClientTLSInfo | |||
if !e.cfg.MetricsTLSInfo.Empty() { | |||
tlsInfo = &e.cfg.MetricsTLSInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we override tlsInfo, should we add a warning to log?
When MetricsTLSInfo is provided, client who has access to other APIs will not be able to access '/metrics' and '/health'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When MetricsTLSInfo is provided, client who has access to other APIs will not be able to access '/metrics' and '/health'?
Yes. Theoretically, we can make both work (add them to tls.Config.RootCAs
and to tls.Config.ClientCAs
), but it will be much more difficult solution because it requires to extend transport.TLSInfo
.
If we override tlsInfo, should we add a warning to log?
This is not a hidden behavior (the admin must pass the options explicitly). I am not sure what message do I need to write to log ?
You specified other certificates for metrics. Access by client certificates will not be possible
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a hidden behavior (the admin must pass the options explicitly). I am not sure what message do I need to write to log ?
I was thinking about the scenario where etcd clients with clientTLS (not metricsTLS) might try to access the metrics and health endpoint. Adding something like 'ignoring client certificates since metrics certificates given' would help them understand why the connection is rejected. I am not an expert in this area, please feel free to let me know if you think this is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jingyih That's reasonable. I added the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better or worse, warning messages are often noted by administrators and automated systems which analyze log output for anomalies. The proposed metrics TLS configuration when used in conjunction with client TLS configuration is a valid state. So in that regard it's not clear to me a preemptive warning makes sense. It seems possible that when using both TLS configurations (again, a valid state), this warning might be interpreted as a false negative alert of some kind.
So, I would suggest we consider changing this to INFO level or omitting it entirely.
112b962
to
fa73854
Compare
/cc @wenjiaswe |
@jingyih looks good now ? |
@legionus Sorry for the delayed response. I will take a look later this week (hopefully on Wednesday). |
@@ -155,6 +156,8 @@ func cURLPrefixArgs(clus *etcdProcessCluster, method string, req cURLReq) []stri | |||
cmdArgs = append(cmdArgs, "--cacert", caPath, "--cert", certPath3, "--key", privateKeyPath3) | |||
} | |||
} | |||
} else if req.useCertAuth { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having a hard time following the logic on how the command is generated. Does the if (line 145) make sense to you? For example, I tried to test TestV3MetricsSecure
, but the actual curl command in the test was curl -L http://localhost:20000/metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular test I mentioned above was not affected by the changes in your PR. But I want to make sure the tests are correct and reader-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having a hard time following the logic on how the command is generated.
@jingyih Me too. I had to add one more block to be able to use another certificate. It seems to me wrong that they are hardcoded in this function. On the other hand, they are pregenerated and their purpose is predefined. So maybe this makes sense.
Does the if (line 145) make sense to you?
No, it does not make sense to me. This code seems broken. It looks like someone tried to add the logic for secure connections, but failed.
For example, I tried to test
TestV3MetricsSecure
, but the actual curl command in the test wascurl -L http://localhost:20000/metrics
.
This is because this test does not test a secure connection :) The TestV3MetricsSecure
and TestV3MetricsInsecure
create custom cfg
, but don't use it:
etcd/tests/e2e/metrics_test.go
Lines 25 to 37 in 024f3df
func TestV3MetricsSecure(t *testing.T) { | |
cfg := configTLS | |
cfg.clusterSize = 1 | |
cfg.metricsURLScheme = "https" | |
testCtl(t, metricsTest) | |
} | |
func TestV3MetricsInsecure(t *testing.T) { | |
cfg := configTLS | |
cfg.clusterSize = 1 | |
cfg.metricsURLScheme = "http" | |
testCtl(t, metricsTest) | |
} |
To use custom config they must use withCfg()
. It means that these tests use the default config:
Lines 133 to 141 in 024f3df
func testCtl(t *testing.T, testFunc func(ctlCtx), opts ...ctlOption) { | |
defer testutil.AfterTest(t) | |
ret := ctlCtx{ | |
t: t, | |
cfg: configAutoTLS, | |
dialTimeout: 7 * time.Second, | |
} | |
ret.applyOpts(opts) |
As you can see there is no metricsURLScheme
definition here:
etcd/tests/e2e/cluster_test.go
Lines 42 to 47 in 024f3df
configAutoTLS = etcdProcessClusterConfig{ | |
clusterSize: 3, | |
isPeerTLS: true, | |
isPeerAutoTLS: true, | |
initialToken: "new", | |
} |
It means the cx.cfg.metricsURLScheme
will be empty as well:
etcd/tests/e2e/metrics_test.go
Line 43 in 024f3df
if err := cURLGet(cx.epc, cURLReq{endpoint: "/metrics", expected: `etcd_debugging_mvcc_keys_total 1`, metricsURLScheme: cx.cfg.metricsURLScheme}); err != nil { |
Your PR looks good to me. One option is that we get it merged first, and fix the tests in a separate PR. @hexfusion Any comment? |
@jingyih @hexfusion ping |
@legionus thanks for your patience. I will try to dig into this soon. |
/cc @hexfusion can you please take a look? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I think this is still something that’s very desirable to have. Anyone continuing the work? :) |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
This would continue to be a valuable addition I think. |
@legionus do you intend to complete this? |
@ironcladlou I have already changed the scope of work, but I can complete this PR if someone finds this PR useful. |
Can you explain what you mean re: the scope change? I'm trying to decide how to proceed with #11993 and want to make sure my solution integrates with this work if the intent is to proceed as-is. I can't say whether there's demand for this one, as I haven't been involved. |
@ironcladlou I no longer develop openshift and etcd. Now I am involved in the development of completely different projects. |
Ah ha! Thanks! Okay, so as far as I can tell from the history of this PR all that's left is rebasing, but I can't speak to whether the demand for the change still exists. @brancz seems to say yes, but I'm not personally aware of anyone with use cases for it, so I'm neutral. Who can make the decision about whether this should proceed or not? |
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
…ndpoints Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
@tangcong, @ironcladlou Yes, I can. Done. |
@@ -211,6 +223,12 @@ func startGRPCProxy(cmd *cobra.Command, args []string) { | |||
go func() { errc <- srvhttp.Serve(httpl) }() | |||
go func() { errc <- m.Serve() }() | |||
if len(grpcProxyMetricsListenAddr) > 0 { | |||
if grpcProxyMetricsListenCert != "" && grpcProxyMetricsListenKey != "" { | |||
tlsinfo = newTLS(grpcProxyMetricsListenCA, grpcProxyMetricsListenCert, grpcProxyMetricsListenKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr #12114 adds health handler for grpcproxy self, if grpcProxyMetricsListenAddr uses own certificate for metrics endpoint, we should also update healthcheck client certs, otherwise, it is failed to access '/proxy/health' handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ironcladlou said For the health listener, honor global the global TLS configuration by default (for compatibility with today's behavior), but provide a new flag to allow control over health endpoint client certificate auth (e.g. --health-client-cert-auth=false).
in issue #11993.
i agrees with it. can we add a flag to access /health,/proxy/health without cert auth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr #12114 adds health handler for grpcproxy self, if grpcProxyMetricsListenAddr uses own certificate for metrics endpoint, we should also update healthcheck client certs, otherwise, it is failed to access '/proxy/health' handle.
I already mentioned this before. Therefore, when using separate certificates for metrics, a warning is displayed.
i agrees with it. can we add a flag to access /health,/proxy/health without cert auth?
Do you want the server to serve some endpoints without certificates? Theoretically, this is possible, but will require rewriting the server. To do this, you will need to make authorization optional and do it not at the TLS level. This is out of scope of this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. thanks. I will try to add --health-client-cert-auth flag in another PR. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tangcong Can this PR be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can you add changelog and doc in another pr? thanks. @hexfusion can you take a final look? thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tangcong sorry for delay yes I will take a look.
/assign |
@hexfusion @tangcong, @ironcladlou Hi! Is there any news? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
As a admin I want to give the monitoring system access only to metrics without access to the rest of the API.