Several items from Cloud Foundry usage of Consul #2343

Open
7 of 13 tasks
Amit-PivotalLabs opened this issue Sep 16, 2016 · 7 comments
Labels
theme/operator-usability: Replaces UX. Anything related to making things easier for the practitioner
type/enhancement: Proposed improvement or new feature

Comments

@Amit-PivotalLabs

Amit-PivotalLabs commented Sep 16, 2016

Hey @slackpad, thanks for meeting with some of us from the Pivotal Cloud Foundry team. Here are some of the points we talked about:

  • Migrate all gossip key management off the RPC client, so we don't need the RPC client at all to orchestrate how we start the consul binary.
  • Add an HTTP API endpoint for the leave command, instead of only the RPC interface.
  • "Stats for Raft -> consul operator healthy server replication OK CLI + API" -- so that we can simplify Consul orchestration, knowing that a server is synced rather than polling stats endpoint and comparing numbers.
  • "Bootstrap CLI + API -> are you in a cluster and are you synced" -- so that we can programmatically put a server into bootstrap mode instead of rewriting config and having to restart server.
  • Make a server leave as a first class thing.
  • Does leave give the right feedback? Seems like it returns before leave has happened.
  • Will stale Serf info add a failed server back in (do we check the health)?
  • Server name change issue: "Failed to perform consul server rolling upgrade" (#2172).
  • Client rapidly turning into a server gets ignored by other clients as a non-server.
  • Gossip keys install should say who failed and why (at least the IP).
  • If keys have been rotated new agents don't have the old key history - where are the keys stored? What files would they need to keep on a restart?
  • Do we need a special case operator command to help with leaves (like maybe take a list of server IPs to RPC to)?
  • Manual mode would need TLS configuration, i.e. being able to cURL a Consul server with keys and certs, or having the consul CLI support commands against remote servers with flags for TLS settings. It needs to work when given just an IP, and maybe allow ignoring domain name verification (chicken-and-egg problem).
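
As a rough illustration of the last item, here is a minimal Python sketch of a client-side TLS setup that still verifies the server certificate against a CA but skips hostname verification, so a server can be addressed by bare IP. The function name and its parameters are hypothetical and are not part of Consul or its CLI.

```python
import ssl


def make_ip_only_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS context that checks the CA chain but not the hostname.

    Hypothetical helper for illustration only. With ca_file=None the
    system trust store is used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Skip name matching so connecting by bare IP works; the certificate
    # must still chain to a trusted CA.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Present a client certificate only when both files are supplied.
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A context built this way can be passed to `urllib.request.urlopen(url, context=ctx)`. Dropping hostname checks weakens TLS, so this only makes sense when the CA is private to the deployment.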

Some of the items above are roughly worded; we can work on refining them.

Also, please let us know how we can help on these via PRs.

Thanks!

@Amit-PivotalLabs
Author

Hey @slackpad, any update on any of these issues, or ideas on how we can help?

@slackpad
Contributor

Keyring API will be done under #2502.
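
For orchestration code, a keyring call over HTTP might be sketched as below. This assumes the operator keyring endpoint at `/v1/operator/keyring` (GET to list, POST to install, PUT to change the primary key, DELETE to remove) with a JSON body of the form `{"Key": "..."}`; the helper name and the default address are made up for illustration, so check them against the API docs for your Consul version.

```python
import json
import urllib.request

# Assumed mapping of keyring actions to HTTP methods.
_KEYRING_METHODS = {"list": "GET", "install": "POST", "use": "PUT", "remove": "DELETE"}


def build_keyring_request(action, key=None, agent_addr="http://127.0.0.1:8500"):
    """Return an unsent urllib Request for a gossip keyring operation."""
    if action not in _KEYRING_METHODS:
        raise ValueError("unknown keyring action: %r" % (action,))
    if action != "list" and not key:
        raise ValueError("action %r needs a key" % (action,))
    # Listing carries no body; every other action names the key.
    body = None if action == "list" else json.dumps({"Key": key}).encode("utf-8")
    url = agent_addr.rstrip("/") + "/v1/operator/keyring"
    return urllib.request.Request(url, data=body, method=_KEYRING_METHODS[action])
```

The request is only built here; sending it with `urllib.request.urlopen` needs a reachable agent and, if ACLs are enabled, a token header.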

@slackpad
Contributor

Leave was done under #2516.
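
A minimal sketch of triggering a leave over HTTP from an orchestrator, assuming the agent exposes `PUT /v1/agent/leave` on its HTTP address. The helper name and the default address are illustrative assumptions.

```python
import urllib.request


def build_leave_request(agent_addr="http://127.0.0.1:8500"):
    """Return an unsent PUT request for the agent's graceful-leave endpoint."""
    url = agent_addr.rstrip("/") + "/v1/agent/leave"
    return urllib.request.Request(url, method="PUT")


# Sending it requires a running agent, for example:
#   with urllib.request.urlopen(build_leave_request(), timeout=10) as resp:
#       print(resp.status)
```

The request targets the agent that should leave, so the caller needs network access to that agent's HTTP port.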

@slackpad
Contributor

slackpad commented May 2, 2017

"Bootstrap CLI + API -> are you in a cluster and are you synced" -- so that we can programmatically put a server into bootstrap mode instead of rewriting config and having to restart server.

An automatic check for this was added in 0.7 if you use bootstrap-expect. The servers query each other and make sure the other servers aren't part of an existing cluster before bootstrapping, so it's always possible to leave bootstrap-expect set.

@Amit-PivotalLabs
Author

/cc @christianang @evanfarrar @zankich

slackpad added the type/enhancement and theme/operator-usability labels on May 18, 2017

@slackpad
Contributor

slackpad commented Jun 1, 2017

"Stats for Raft -> consul operator healthy server replication OK CLI + API" -- so that we can simplify Consul orchestration, knowing that a server is synced rather than polling stats endpoint and comparing numbers.

https://www.consul.io/docs/guides/autopilot.html#server-health-checking
https://www.consul.io/api/operator/autopilot.html#read-health
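
As a sketch of how an orchestrator could replace "poll stats and compare numbers" with a single check, the snippet below evaluates a decoded read-health response. The field names (`Healthy`, `Servers`, `Name`) are assumed from the read-health documentation and should be verified against your Consul version; the function names and the sample payload are invented for illustration.

```python
def unhealthy_servers(health):
    """Return the names of servers the payload marks as not healthy."""
    return [
        server.get("Name", "<unknown>")
        for server in health.get("Servers", [])
        if not server.get("Healthy", False)
    ]


def cluster_is_healthy(health):
    """True only if the cluster flag is set and no server is unhealthy."""
    return bool(health.get("Healthy")) and not unhealthy_servers(health)


# Invented sample shaped like the assumed response body.
sample = {
    "Healthy": True,
    "FailureTolerance": 1,
    "Servers": [
        {"Name": "consul-0", "Healthy": True, "Leader": True},
        {"Name": "consul-1", "Healthy": True, "Leader": False},
        {"Name": "consul-2", "Healthy": True, "Leader": False},
    ],
}
```

In practice the dictionary would come from decoding the JSON body of a GET to the read-health endpoint rather than from a literal.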

@slackpad
Contributor

slackpad commented Jun 1, 2017

Will stale Serf info add a failed server back in (do we check the health)?

https://www.consul.io/docs/guides/autopilot.html#stable-server-introduction
