
Allow setting of SecretID and AccessorID on ACL tokens on Consul 1.4+ #4977

Closed
jdelic opened this issue Nov 19, 2018 · 18 comments · Fixed by #5307
Assignees
Labels
theme/acls ACL and token generation type/enhancement Proposed improvement or new feature
Milestone

Comments

jdelic commented Nov 19, 2018

Feature Description

Pre-v1.4, the token value of an ACL token could be specified by the client. This allowed for a simple workflow: my SaltStack masters would (securely) generate token UUIDs for every minion as it came up and later configure its ACLs. Those UUIDs were easy to push to the minion even when the Consul cluster was not yet available.

Post-v1.4, the master needs to create a token for the client, saving its SecretID and AccessorID. The SecretID must then be pushed to the minion.

This creates a chicken/egg problem where Consul must exist before new tokens can be created. Bringing up the Consul cluster must now be done in two stages, as tokens can't be pregenerated.

Use Case(s)

I would like to determine SecretID and possibly AccessorID values, thereby enabling completely automated cluster setups. Whether I read 128 bits from /dev/urandom to create a hard-to-guess UUID or Consul does it shouldn't matter security-wise. It would also be nice to be able to create other "well-known" SecretIDs and AccessorIDs like anonymous for my local cluster.
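The pre-generation step this use case describes is tiny. As a sketch (`uuidgen` availability varies by platform, hence the fallback):

```shell
# Pre-generate a token SecretID outside of Consul: 128 random bits
# formatted as a UUID, the same shape Consul generates itself.
# Uses the Linux kernel's UUID source, falling back to uuidgen(1).
secret_id=$(cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen)
echo "$secret_id"
```

The resulting value could then be distributed to minions and (with this feature) later registered with Consul verbatim.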

@jdelic jdelic changed the title Allow setting of SecretID on ACL tokens on Consul 1.4+ Allow setting of SecretID and AccessorID on ACL tokens on Consul 1.4+ Nov 19, 2018
jdelic commented Nov 19, 2018

Thinking about policies... it might also be worthwhile to allow policy AccessorIDs to be deterministic, as some policy API endpoints can't resolve policy Names (unlike the Policies stanza on ACL tokens).

jdelic added a commit to jdelic/dynamicsecrets that referenced this issue Dec 1, 2018
…ng a way to detect when consul is not available so we can create a temporary ACL from Salt until the Salt master has a Consul server available to create actual ACL tokens
jdelic added a commit to jdelic/saltshaker that referenced this issue Dec 2, 2018
…ACL not registered with dynamicsecrets to solve the chicken/egg problem until the Salt master has a Consul server available to create actual ACL tokens. This is annoying, complex, and brittle, but necessary.
@pearkes pearkes added the needs-discussion Topic needs discussion with the larger Consul maintainers before committing to for a release label Dec 12, 2018
jdelic commented Dec 17, 2018

Since @pearkes added the needs-discussion tag, I thought I'd provide a more detailed write-up about what the problem is here.

Essentially, up to Consul 1.3 the following workflow was possible:

[diagram: consul-acl-13 — pre-1.4 token provisioning workflow]

In essence: a configuration management system that was able to create secure random identifiers was able to provision a consul cluster and agents in parallel. It didn't matter whether the consul cluster came up before or after the configuration management server.

With Consul 1.4 this changed. The current process we're using looks like this:

[diagram: consul-acl-14 — post-1.4 token provisioning workflow]

Here we need to solve the chicken/egg problem of having a consul server available to be able to create ACLs.

  • So on the first configuration management pass we create a temporary ACL that is not persisted.
  • We use this temp ACL to set up service discovery for all subsequent services and all minions that are coming up in parallel.
  • On a second run, the configuration management system detects that the temp ACLs have not been persisted and calls the Consul API to create new ACL tokens that are then actually persisted.
  • If the consul server cluster ever reboots or is recreated, we have to query the ACL tokens through the API and when we learn that they are invalid, we create new ones and propagate them as necessary through the configuration management system.

An implementation of this process is now in my dynamicsecrets SaltStack plugin and salt configuration. It seems pretty brittle to me. I had to add a lot of states that block the configuration management system and repeatedly poll Consul's HTTP API, waiting for ACLs to become available on the minions so that Consul can answer DNS queries for service discovery and subsequent states can find their dependencies.
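For illustration, that polling step can be sketched roughly like this (an assumption about the shape of those Salt states, not their actual code; it uses the standard `/v1/acl/token/self` endpoint, which resolves the token supplied in the request):

```shell
# Block until a pre-distributed token becomes valid on the local agent.
# $SECRET_ID is the token secret pushed out by configuration management.
until curl -sf -H "X-Consul-Token: $SECRET_ID" \
        http://127.0.0.1:8500/v1/acl/token/self >/dev/null; do
    sleep 5   # token not yet persisted (or negative result still cached)
done
```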

I don't think forcing the creation of ACL tokens through Consul leads to any security improvement here, but it makes automation a lot harder than it has to be.

@mkeeler mkeeler added the thinking More time is needed to research by the Consul Contributors label Jan 4, 2019
mkeeler commented Jan 4, 2019

@jdelic Originally it was a specific goal of mine not to have those fields be user-modifiable. However, I can see how that doesn't play so nicely with config management and orchestrators.

Is it safe to assume that the problem really only exists at the time of initial token creation and that once created you wouldn't need to modify the accessor and secret ids?

jdelic commented Jan 9, 2019

@mkeeler
sorry for the late answer, I was away on Christmas break trying to have little internet contact ;).

You are right, at least in my case, setting accessor ids and secret ids at creation time would be completely sufficient.

hatifnatt commented Jan 30, 2019

Facing exactly the same chicken/egg problem while creating my own Consul installation and configuration state for SaltStack. At first I thought the only way to get a "root / master / bootstrap" token was to create it with consul acl bootstrap, but then I found that I can simply declare the master token explicitly in the server config file. But then, once again, I found that I still need to create an agent token and write it to the config file, otherwise the server can't update information about itself, failing with the error agent: Coordinate update blocked by ACLs

I think the ability to provide a SecretID (and probably an AccessorID too) while creating a new token is a "must have" feature.
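For reference, the config-file route described above looks roughly like this in a server config (a sketch; the UUIDs are placeholders, and the agent token still has to exist in the ACL system with a suitable node policy before coordinate updates succeed):

```hcl
# Consul 1.4+ server configuration fragment (placeholder values).
acl {
  enabled        = true
  default_policy = "deny"
  tokens {
    master = "11111111-2222-3333-4444-555555555555"  # bootstrapped on leadership
    agent  = "66666666-7777-8888-9999-000000000000"  # must also be created as a real token
  }
}
```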

lmb commented Jan 31, 2019

We use a system similar to @jdelic's at the moment, except with the twist that we have a horrible bash script which creates secrets via the deprecated ACL API and then migrates them. We're in the unusual situation of deploying more than a hundred clusters, which we want to have uniform ACLs. Being able to specify secret IDs at creation time would make this a lot simpler.

@mkeeler mkeeler added type/enhancement Proposed improvement or new feature and removed needs-discussion Topic needs discussion with the larger Consul maintainers before committing to for a release labels Jan 31, 2019
@mkeeler mkeeler added this to the 1.4.3 milestone Feb 1, 2019
@mkeeler mkeeler added theme/acls ACL and token generation and removed thinking More time is needed to research by the Consul Contributors labels Feb 1, 2019
@mkeeler mkeeler modified the milestones: 1.4.3, 1.4.4 Feb 28, 2019
@hanshasselberg hanshasselberg modified the milestones: 1.4.4, 1.5 Mar 19, 2019
@mkeeler mkeeler modified the milestones: 1.5.0, Upcoming Apr 16, 2019
joewilliams commented
I am running into this same sort of issue just doing some basic automation with bash scripts. Decoupling the UUID generation from the configuration of the token would make this work a lot more straightforward.

mkeeler commented Apr 24, 2019

To give some updates here:

I have a PR open that would allow setting the various IDs of tokens and policies. However, there are some issues I ran into that I don't have a good solution for yet.

Setting token IDs should be safe enough, as no other parts of the ACL system reference those token IDs.

Setting policy IDs is a little more of a footgun. In the happy path, you only set IDs like that when they are truly unique and have never been used in the cluster before. However, it would be possible to create a policy, link some tokens to that policy, delete the policy, and recreate it with the same ID but different permissions. In that scenario all the existing tokens would inherit the new policy. This is mostly due to the original assumption that we could handle policy deletions lazily and not update all the tokens that reference a policy when it is deleted. In some ways this sounds like a cool feature for restoring accidentally deleted policies, but my fear is that instead it will result in accidentally granting privileges to tokens that you did not intend to.

I am thinking a decent middle ground to solve these problems and not introduce any new ones would be to:

  1. Allow setting the IDs of tokens.
  2. Provide some secondary endpoints to manipulate policies by name.

I think this would be sufficient for all the use-cases as detailed in this issue.

mkeeler commented Apr 24, 2019

Additionally, it might help some to know that the Consul Terraform provider recently gained the ability to manage ACL tokens and policies. For automation purposes that might be a route to consider.

joewilliams commented
Thanks @mkeeler! I think that sounds right. If I can supply the tokens via configuration files for agents, services, etc., and then afterward associate those tokens with policies, I think that would do the trick. I think the workflow would be something like:

  1. Generate a token using uuid or something
  2. Create service_abc.json that includes the token
  3. Create a policy via consul acl policy create -existing-token=123-abc -rules @service_abc_policy.hcl where -existing-token=123-abc is the switch to include the aforementioned token and associate it with the policy

This would eliminate the need to run consul acl token create and rewrite service_abc.json with the output.

Ideally I would be able to do the same with the agent tokens: setting master, default, etc. via configuration file and then associating them with the appropriate policy via consul acl policy create, rather than creating the policy, creating the token, feeding the output to set-agent-token, and then rewriting the config if token persistence isn't turned on.

Does this make sense?

mkeeler commented Apr 25, 2019

It mostly makes sense, except maybe the part about wanting to create a policy and link it to tokens in one pass.

You would have to create the token, then create the policy and do the linking. Why not just create the policy first, then create the token specifying that policy?

I am envisioning the CLI flow to be:

  1. Generate some token IDs using uuidgen or something similar and pre-populate configuration files with the secrets
  2. Spin up some infrastructure utilizing the consul tokens (that do not exist yet)
  3. consul acl bootstrap
  4. consul acl policy create -name service-abc -rules @service_abc_policy.hcl
  5. consul acl token create -accessor <accessor id> -secret <secret id> -policy-name service-abc - Specifying the accessor might not be necessary for you as only the secret is needed to put in config files for usage by the consul agents.

After step 5, those tokens which previously were not working should start working correctly. The caveat is that negative token resolution responses may be cached on the Consul clients (in all datacenters) and on the Consul servers (in non-primary datacenters, if token replication is disabled). So if you distributed a token secret to something like consul-template or another tool pointed at the HTTP API, it could take up to the token_ttl time (which is configurable) for the newly created token to become available.
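Put together, the flow above would look something like this in a provisioning script (a sketch against the 1.5+ CLI, using the flag names given in the steps above; the policy name and rules file are placeholders):

```shell
# Step 1: pre-generate IDs; SECRET_ID gets distributed in agent/service
# config files before any Consul server exists.
ACCESSOR_ID=$(uuidgen)
SECRET_ID=$(uuidgen)

# Steps 3-5: once a server is up and has elected a leader.
consul acl bootstrap
consul acl policy create -name service-abc -rules @service_abc_policy.hcl
consul acl token create \
    -accessor "$ACCESSOR_ID" \
    -secret "$SECRET_ID" \
    -policy-name service-abc
```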

joewilliams commented
Got it, I think this sounds good to me @mkeeler.

scalp42 commented May 4, 2019

Running into the same issue here with Chef. We'd like to be able to have the tokens created in advance, encrypted with KMS and stored in cookbook attributes, and finally decrypted and passed to the /acl/token endpoint to be created within Consul.
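For the API route, a 1.5+ token create with pre-chosen IDs is a single PUT to /v1/acl/token (a sketch; the IDs, description, and policy name are placeholders):

```shell
# Register a token whose AccessorID/SecretID were generated (and, in the
# Chef case, KMS-encrypted) ahead of time. MGMT_TOKEN holds a management
# token; all UUID values below are placeholders.
curl -sf -X PUT \
  -H "X-Consul-Token: $MGMT_TOKEN" \
  -d '{
        "AccessorID": "a7c93ae6-0000-0000-0000-000000000001",
        "SecretID":   "a7c93ae6-0000-0000-0000-000000000002",
        "Description": "pre-provisioned service token",
        "Policies": [{"Name": "service-abc"}]
      }' \
  http://127.0.0.1:8500/v1/acl/token
```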

@mkeeler mkeeler modified the milestones: Upcoming, 1.5.0 May 4, 2019
mkeeler commented May 4, 2019

This will be part of our next release (1.5.0)

matthoey-okta commented
@mkeeler qq, sort of on topic. You mention full automation, but the bootstrap is still a step, correct? Or is that just automated some other way in Salt?

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this issue May 11, 2019, carrying the Consul 1.5.0 changelog; the entry relevant to this issue:

IMPROVEMENTS:

* acl: Allow setting token accessor ids and secret ids during token
  creation. [GH-4977]
jdelic commented May 11, 2019

@matthoey-okta
I'm the guy with the salt config. :)

Consul 1.4.x still allowed the initial master token to be specified via the config files, and that's what I use to bootstrap the cluster in lieu of consul acl bootstrap.

mkeeler commented May 13, 2019

@matthoey-okta It was already mentioned, but the acl.tokens.master configuration allows you to set up a token with that secret to be automatically bootstrapped when a particular server gains leadership of raft.

@jdelic
There is no reason I know of why we shouldn't keep that functionality around and no one has ever mentioned wanting it deprecated/removed. The same thing still works in v1.5.0.

Now that I am thinking about it, I probably should have modified the /v1/acl/bootstrap endpoint to allow setting the IDs too. That way you could create the privileged token with known IDs without ever having to store it in the servers' config files. That is advantageous because if you put a master token in the config file and later want to delete it, you have to edit the server configurations and restart the servers to prevent them from reinserting it into raft.

matthoey-okta commented
@mkeeler and @jdelic Thank you for the explanations. I didn't understand how it was possible to use that option, as I figured you'd have to somehow store the master token in the datastore first before putting it in the file. It's interesting to note that the server inserts it when it becomes the raft leader.

We currently use all-random token secret IDs, so automation and orchestration have been a challenge. I will look into these new features, which would certainly make our lives easier. Thanks!
