Machine ID: Problems with the `token` join method #26885
Labels: c-ip, feature-request, machine-id
Comments
strideynet added the feature-request and machine-id labels on May 25, 2023
So in terms of my personal thoughts on this, I'm attracted to the first option "Allow multiple bot instances to join to one bot user". This is for a number of reasons:
I do have a few doubts on it though:
Closing this in favour of the following tickets which have planned an explicit solution for this:
The current `token` join method, which is mostly used by customers with environments not served by delegated joining, has a number of problems that make it awkward to use. These problems are significantly worse in ephemeral environments like CI/CD. Combined, these problems mean customers are often implementing Machine ID in less than ideal ways.
Background
Before a bot can interact with the Teleport cluster, it must perform an initial authentication. We refer to this initial authentication as "joining" or "registering". The way this initial authentication is completed is known as the "join method". There are two main classes of join method today for bots:
`token` join method
The first method introduced was `token`, where a `ProvisionToken` resource is created with `spec.join_method == "token"`. In this case, the only thing the bot needs to join is the name of the `ProvisionToken` resource - this makes the name of this resource a secret.
This presented a significant security risk: if this secret was leaked, a bad actor could connect their own instance of `tbot` anywhere and would have access to all the resources the bot had access to. To mitigate this risk, when the `token` join method is used, the `ProvisionToken` is consumed and cannot be used again.
As the bot is issued short-lived certificates, this presents a challenge in how the bot is able to renew its identity. To work around this, certificates generated when joining with the `token` join method are marked as renewable. This allows that certificate to be used in a process known as renewal to request a certificate with an expiry date further in the future. In an attempt to mitigate the danger of a stolen certificate, we use a "generation counter". This stores a generation count on the bot user and within the issued certificates, and the count is incremented on each renewal. If a mismatch is detected when the bot attempts to renew, it's possible that the certificate has been stolen and a bad actor is trying to renew it, so the bot is locked out.
Delegated join methods
Later, delegated join methods were introduced. Delegated joining makes use of an identity issued to that workload by another identity provider (e.g. the platform it is running on) in order to join. Delegated joining (e.g. IAM, GitHub etc.) currently offers the following benefits:
[...] `tbot` can use the same `ProvisionToken` to join and be linked to the same bot user. This makes them ideal for ephemeral environments like CI/CD.
Unfortunately, many workloads run on platforms where delegated joining is not possible (this is especially true in on-premises environments) - or on platforms where we have not yet added support for the third-party identity provider.
Problems with `token` join method
The current implementation of the `token` join method, as described in the background, presents a number of problems.
Single bot instance per bot user
Some customers have many bot instances that need the same privileges as one another. Currently, they need to create a bot user and join token for each of these instances. If they need to grant access to another role to their bots, they have to do this for each bot user. At scale, this becomes difficult to manage.
This becomes even more painful in ephemeral environments as once an instance spins down, the bot user created for that instance will not be deleted.
Workarounds
In some cases like this, customers have been running a single bot instance which pushes the output credentials into a single secret store that all consumers of a Teleport identity can pull from. This creates a single point of failure (SPOF) on the single bot instance as well as on the secret store. It also discourages users from using more tightly scoped certificates.
Some more advanced customers may find themselves building an automation using the Teleport API to manage the creation of bot users and join tokens, and then providing this join token to the bot instance on creation.
Single use join tokens
Currently, each join token can only be used once. This is inconvenient in environments where many bot instances need to be stood up, or where the bot instances are ephemeral and stood up in response to some trigger (e.g. CI/CD).
Workarounds
The same workarounds presented in "Single bot instance per bot user" can be applied here.
Potential solutions
Allow multiple bot instances to join to one bot user
Upon joining, the certificate would also be encoded with a bot instance ID extension as well as the generation counter extension. This generation counter would be specific to that bot instance ID. This ID could be a randomly generated UUID upon joining.
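The per-instance generation counter described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not Teleport's implementation: the `BotUser` type, its fields, and the `Join` and `Renew` methods are hypothetical names, and fixed strings stand in for the randomly generated UUIDs the proposal mentions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrGenerationMismatch signals that the generation presented in a certificate
// does not match the stored one, which may indicate a stolen certificate.
var ErrGenerationMismatch = errors.New("generation mismatch: bot instance locked")

// ErrUnknownInstance is returned when an instance ID was never joined.
var ErrUnknownInstance = errors.New("unknown bot instance")

// BotUser is a hypothetical stand-in for the bot user resource.
type BotUser struct {
	generations map[string]uint64 // bot instance ID -> current generation
	locked      map[string]bool   // instances that failed a generation check
}

// NewBotUser returns a bot user with no joined instances.
func NewBotUser() *BotUser {
	return &BotUser{
		generations: make(map[string]uint64),
		locked:      make(map[string]bool),
	}
}

// Join registers a new bot instance and returns the generation (1) that
// would be encoded in its first certificate.
func (b *BotUser) Join(instanceID string) uint64 {
	b.generations[instanceID] = 1
	return 1
}

// Renew compares the generation from the presented certificate with the
// stored count for that instance only. On a match the count is incremented;
// on a mismatch that one instance is locked and the others are unaffected.
func (b *BotUser) Renew(instanceID string, certGeneration uint64) (uint64, error) {
	current, ok := b.generations[instanceID]
	if !ok {
		return 0, ErrUnknownInstance
	}
	if b.locked[instanceID] || current != certGeneration {
		b.locked[instanceID] = true
		return 0, ErrGenerationMismatch
	}
	b.generations[instanceID] = current + 1
	return current + 1, nil
}

func main() {
	bot := NewBotUser()
	genA := bot.Join("instance-a")
	genB := bot.Join("instance-b")

	// Normal renewal for instance-a.
	next, err := bot.Renew("instance-a", genA)
	fmt.Println("instance-a renewed:", next, err)

	// Replaying the old generation locks instance-a only.
	_, err = bot.Renew("instance-a", genA)
	fmt.Println("instance-a replay:", err)

	// instance-b keeps its own counter and can still renew.
	next, err = bot.Renew("instance-b", genB)
	fmt.Println("instance-b renewed:", next, err)
}
```

Keeping the counter keyed by instance ID means a mismatch on one instance does not lock out the other instances sharing the same bot user.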
The generation attribute on the bot user would be replaced with a map of `bot-instance-id` to generation count, or we would introduce an entirely new resource, the "bot instance", to hold this generation counter. This could be extended in future to include other information about that bot instance (e.g. the time it last renewed, acting as a form of heartbeat).
We could also choose to remove, or make optional, the behaviour of the `ProvisionToken` being consumed by the join. My gut feeling is that for backwards compatibility reasons we'd want to make this optional, perhaps with an additional field called `allowMultipleBotJoins`.
Key characteristics:
[...] `ProvisionToken` would not affect already joined bots - this is not consistent with how delegated joining bots behave. This means we can continue to support the immediate disposal of these long-lived secrets upon use.
Additional complexities:
Introduce non-renewable re-usable `token` join method
Introduce a new field to control this behaviour, or a new join method name. When set:
`ProvisionToken` is not consumed on use
[...] `ProvisionToken` to join again, in a similar fashion to how delegated joining works.
Key characteristics:
[...] `ProvisionToken` would lead to bots being unable to renew - this is consistent with how delegated joining bots behave.
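To make the difference between the current behaviour and the non-renewable, re-usable variant concrete, here is a rough Go sketch. Everything in it is an assumption made for illustration: the `Reusable` field, the `TokenStore` and `JoinResult` types, and the reading that a re-usable token yields non-renewable certificates so the bot joins again with the same token instead of renewing. The issue itself only proposes "a new field to control this behaviour, or a new join method name".

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnknownToken is returned when the token does not exist, either because
// it never did, it was consumed, or it was deleted.
var ErrUnknownToken = errors.New("unknown or already consumed token")

// ProvisionToken is a simplified, hypothetical model of the resource.
type ProvisionToken struct {
	Name     string
	Reusable bool // hypothetical field; not part of the real resource
}

// JoinResult describes the certificate issued by a join in this sketch.
type JoinResult struct {
	Renewable bool
}

// TokenStore holds the tokens known to the (simulated) cluster.
type TokenStore struct {
	tokens map[string]ProvisionToken
}

// Join models both behaviours:
//   - a single-use token is consumed and yields a renewable certificate;
//   - a re-usable token is kept and yields a non-renewable certificate,
//     so the bot must present the token again when the certificate expires.
func (s *TokenStore) Join(name string) (JoinResult, error) {
	tok, ok := s.tokens[name]
	if !ok {
		return JoinResult{}, ErrUnknownToken
	}
	if tok.Reusable {
		return JoinResult{Renewable: false}, nil
	}
	delete(s.tokens, name) // single-use: the secret is disposed of on use
	return JoinResult{Renewable: true}, nil
}

func main() {
	store := &TokenStore{tokens: map[string]ProvisionToken{
		"single-use": {Name: "single-use"},
		"reusable":   {Name: "reusable", Reusable: true},
	}}

	res, err := store.Join("single-use")
	fmt.Println("single-use first join:", res.Renewable, err)
	_, err = store.Join("single-use")
	fmt.Println("single-use second join:", err)

	for i := 0; i < 2; i++ {
		res, err = store.Join("reusable")
		fmt.Println("reusable join:", res.Renewable, err)
	}

	// Deleting the re-usable token stops bots from obtaining new certificates.
	delete(store.tokens, "reusable")
	_, err = store.Join("reusable")
	fmt.Println("reusable after delete:", err)
}
```

In this model, deleting the re-usable token cuts off every bot that depends on it once its current certificate expires, which matches the characteristic noted above for delegated joining.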