Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connect: connect CA Roots in secondary datacenters should use a SigningKeyID derived from their local intermediate #6513

Merged
merged 2 commits into from
Sep 26, 2019

Conversation

rboyer
Copy link
Member

@rboyer rboyer commented Sep 20, 2019

Fixes #6507

This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.

  • correct previously stored CARoot objects?
  • tests

@rboyer rboyer added this to the 1.6.x milestone Sep 20, 2019
@rboyer rboyer requested a review from a team September 20, 2019 15:36
@rboyer rboyer self-assigned this Sep 20, 2019
@rboyer rboyer added the theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies label Sep 20, 2019
Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Might have been a little preemptive with the review as you have a couple todos for tests and some other things.

The changes look great and make total sense. Is there a reasonable way to unit test this? Something maybe that validates the root rotation logic no longer fires on intermediate certificates.

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this thought inline?

agent/cache-types/connect_ca_leaf.go Outdated Show resolved Hide resolved
// the root changed.
// authorityKeyId is the key ID of the CA (intermediate) root that signed
// the current cert. This is just to save parsing the whole cert everytime
// we have to check if the root changed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// we have to check if the root changed.
// we have to check if the signing key changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 actually I'm not sure this is quite right.

The only reason we care about rotating the certificate in this case is if the actual root changes. If for some reason a new intermediate was generated but for the same root, we don't need to rotate certs signed by the old intermediate because the will still validate against the same root cert.

We only care about rotating if the actual root key is different.

So this was kinda more correct before, but I think we misguidedly built the logic to look at the (intermediate) signing key assuming that was the right thing to do for intermediates and so broke this assumption.

So I think this PR is more correct than before it because it at least behaves well and has the same set of assumptions in the logic and the data we are propagating.

But actually in some sense it's not necessary to rotate if only the intermediate is changing and it would be more correct to only compare against the actual root key even when an intermediate is being used.

I think that boils down to SigningKeyID being misnamed in the CARoot since the thing we care about it more like RootKeyID.

What do you think? This is better than no change for sure but should we go back and make this all correct and optimal instead? Or am I missing a reason you would need to rotate if intermediate changed too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To play devil's advocate... the only time intermediates should change in Connect is if roots do too currently which possibly makes this argument moot on an optimality point of view...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to rotate the cert even when the intermediate changes. This could use verification but I am pretty sure we have to push both the root and intermediate cert into Envoy for verification. It has to have the whole chain and if the leaf cert was signed by an unknown intermediate (an old one that we have since regenerated) then I don't think the certs will validate and connections will be dropped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So unless there's something non obvious here, I can't see a field on a leaf cert that expresses a link back to the primary root, only a link to the intermediate that signed it.

We could walk the linked list of intermediates back to the primary root in the leaf caching code and compare the leaf.SigningKeyID->intermediate, intermediate->root, root=?=prevRoot but that would require unpacking the whole chain of certs every time the leaf cache Fetch is called.

But as you said given that intermediates rotate only when the primary root rotates (or possibly in rare other circumstances), we can just rely upon waiting until the intermediates rotate first because they detect that the primary root rotated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkeeler the intermediate is delivered by the client Envoy along with the leaf. As long as the intermediate is still valid (i.e. not revoked and we don't have revocation checking yet..) then it's fine as long as it was signed by the same root as the current intermediate.

That said, this will be more important if we allow revocation of intermediates in future.

So unless there's something non obvious here, I can't see a field on a leaf cert that expresses a link back to the primary root, only a link to the intermediate that signed it.

Yep that is the root (pardon the pun) of the confusion here - it's my mistake as I thought the signing key was the important thing but then only verified against the root. We could instead have store the key ID of the root in that struct and just compared that to the current root instead.

That said on the basis of the future-proofing against intermediate revocation and my indication that it makes not actual difference currently I think this is good as it is.

@rboyer rboyer requested review from banks and mkeeler September 23, 2019 15:26
@rboyer
Copy link
Member Author

rboyer commented Sep 23, 2019

I've updated this to include tests and cleanup logic to fix previously persisted CARoot objects in the state store during secondary init after upgrade.

@rboyer rboyer force-pushed the stop-hex-encoding-connect-tls-fields branch from 0665f40 to 02dcbb1 Compare September 23, 2019 16:28
@rboyer rboyer force-pushed the secondary-leaf-churn branch from a33f90b to 00453e3 Compare September 23, 2019 16:28
@rboyer rboyer force-pushed the stop-hex-encoding-connect-tls-fields branch from 02dcbb1 to e2a1c3f Compare September 23, 2019 17:06
@rboyer rboyer force-pushed the secondary-leaf-churn branch from 00453e3 to 6612a78 Compare September 23, 2019 17:06
…ngKeyID derived from their local intermediate

This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.
@rboyer rboyer force-pushed the secondary-leaf-churn branch from 5be8002 to e9b3e66 Compare September 23, 2019 17:53
@rboyer rboyer changed the base branch from stop-hex-encoding-connect-tls-fields to master September 23, 2019 17:53
Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@rboyer rboyer merged commit c4b92d5 into master Sep 26, 2019
rboyer added a commit that referenced this pull request Sep 26, 2019
…ngKeyID derived from their local intermediate (#6513)

This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.
@rboyer rboyer deleted the secondary-leaf-churn branch September 26, 2019 16:55
rboyer added a commit that referenced this pull request Dec 17, 2020
…ingKeyID derived from their local intermediate

This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513.
rboyer added a commit that referenced this pull request Jan 7, 2021
…ingKeyID derived from their local intermediate

This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513.
rboyer added a commit that referenced this pull request Jan 25, 2021
…ingKeyID derived from their local intermediate

This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513.
rboyer added a commit that referenced this pull request Feb 8, 2021
…ingKeyID derived from their local intermediate

This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513.
rboyer added a commit that referenced this pull request Feb 8, 2021
…ingKeyID derived from their local intermediate (#9428)

This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Envoy xds.clusters generation issue with federated Consul DCs
3 participants