Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement Consul Backend #767

Closed
wants to merge 13 commits into from
Closed

Conversation

ajvb
Copy link

@ajvb ajvb commented Feb 10, 2017

This is an initial attempt at a consul backend for Teleport.

It's heavily based off of the existing etcd backend, with the main difference of borrowing the TTL logic of the boltdb backend.

Initial Questions:

  1. In the etcd backend, there is stopC and cancelC but they don't seem to do anything. What is there purpose? Is this something I should add to the Consul backend?
  2. Can I get some guidance on how to test this? I have very little experience with Teleport, as this PR is really needed before my company can even consider Teleport.
  3. I haven't used godeps in a while, how do I properly add github.com/hashicorp/consul/api to it?

Fixes #423

@gravitational-jenkins
Copy link

Can one of the admins verify this patch?

@klizhentas
Copy link
Contributor

klizhentas commented Feb 10, 2017

thanks for giving it a try. One question I have - have you considered using sessions to achieve TTL semantics instead of custom implementation?

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/consul-tool/0Y034z4mo_4/eJ-1x97JNgAJ

I haven't used sessions for TTL myself, may be someone from consul team can give us advice.

Regarding your questions:

In the etcd backend, there is stopC and cancelC but they don't seem to do anything. What is there purpose? Is this something I should add to the Consul backend?

stopC and cancelC are used in tests, I will take a closer look, may be they are obsolete though

Can I get some guidance on how to test this? I have very little experience with Teleport, as this PR is really needed before my company can even consider Teleport.

We have an acceptance test suite used for all backends here:

https://github.com/gravitational/teleport/blob/master/lib/backend/test/suite.go

See for example how we use it for Etcd:

https://github.com/gravitational/teleport/blob/master/lib/backend/etcdbk/etcd_test.go

I haven't used godeps in a while, how do I properly add github.com/hashicorp/consul/api to it?

To properly add it, in your $GOPATH (caution this command updates your workspace!!!)

# this is to align your dependencies with teleport's ones
cd teleport
godep restore

Then

# this will add new consul values
godep save ./lib/... ./tool/... ./integration/...

@ekristen
Copy link

I would definitely suggest you take a look at the libkv implementation of consul and realistically it might be best to just switch to using libkv for all backends and that way the community can contribute back to libkv as well.

Personally I've had a love hate relationship with consul sessions when dealing with them directly in the past. It is an interesting spin on how to expire key/value pairs vs the more traditional TTL on a specific key/value pair.

Ultimately you have to first create a session and then any key/value pairs you create you have to reference that session. That session then should either be maintained for the duration or closed out.

Note: DO NOT create a session for every key/value pair, this will overwhelm any consul cluster and it is not how the sessions are designed to work.

@ajvb
Copy link
Author

ajvb commented Feb 11, 2017

@klizhentas

thanks for giving it a try. One question I have - have you considered using sessions to achieve TTL semantics instead of custom implementation?

I haven't used sessions for TTL myself, may be someone from consul team can give us advice.

I have not. I'm reaching out to my Hashicorp connection to see if I can get the relevant questions answered.

stopC and cancelC are used in tests, I will take a closer look, may be they are obsolete though

Ok awesome

We have an acceptance test suite used for all backends here:

Ok awesome, I may have further questions related to running it, but I'll give it a try.

To properly add it, in your $GOPATH (caution this command updates your workspace!!!)

Done, thanks.


@ekristen

I would definitely suggest you take a look at the libkv implementation of consul

Will do.

realistically it might be best to just switch to using libkv for all backends and that way the community can contribute back to libkv as well.

I'm not against this, but this is absolutely not my call and I cannot justify the needed time for it. I want to be able to try out Teleport at work, and simply having consul support is what is needed.

Personally I've had a love hate relationship with consul sessions when dealing with them directly in the past. It is an interesting spin on how to expire key/value pairs vs the more traditional TTL on a specific key/value pair.

Ultimately you have to first create a session and then any key/value pairs you create you have to reference that session. That session then should either be maintained for the duration or closed out.

Note: DO NOT create a session for every key/value pair, this will overwhelm any consul cluster and it is not how the sessions are designed to work.

Thank you so much for this information, I did not know all of this and Hashicorp's docs are lacking.

I will take a look at libkv's consul implementation, but I am uncertain at this time how I can use sessions for TTL in the way Teleport expects.

@ekristen
Copy link

FWIW @ajvb it might be easiest to just use libkv to do the consul backend in this PR, and once it is in, other backends can start to migrate to it. Just a thought.

@ajvb
Copy link
Author

ajvb commented Feb 11, 2017

@ekristen Looking at libkv, I am actually becoming fond of that idea. Would definitely make this implementation a lot simpler. Let give that a shot.

@klizhentas
Copy link
Contributor

@ekristen thanks for your concern about using sessions for TTL, looking at this:

https://github.com/docker/libkv/blob/master/store/consul/consul.go#L203
https://github.com/docker/libkv/blob/master/store/consul/consul.go#L133

So it seems that library does exactly what you are suggesting not to do.

I made several attempts at using libkv, but then backed off, to use it as a generic implementation backend mostly because I had different approaches to various backends.

@klizhentas
Copy link
Contributor

klizhentas commented Feb 11, 2017

@ajvb so here's the plan:

  • let's figure out whether the session per TTL approach is scalable first.
    • If it's not scalable we will have to stick to manual workaround that you proposed in your PR
    • If it is scalable and Consul team recommends using it, we will apply it to your PR.

@ekristen
Copy link

Caveat: it depends on how many keys you are doing and my experience was with Node.JS not golang, so there could be a difference in how things were implemented.

I will say that docker uses libkv with swarm and using consul as a backend and I have not seen problems with it.

When I suggest it not be done, it was definitely in reaction to doing thousands of writes/reads per minute which depending on session length can cause many many sessions to be created.

Here is some information on the session implementation hashicorp/consul#524

@ekristen
Copy link

Looking at the code you referenced, it looks like it will attempt to re-use an existing session should it be available before it tries to create a new one.

@ekristen
Copy link

One more thing to note, sorry for the multiple comments ...

I have this branch that I haven't opened a pull request on yet that enables using Consul ACL tokens
https://github.com/ekristen/libkv/tree/consul-acl-token

@ajvb
Copy link
Author

ajvb commented Feb 11, 2017

@klizhentas Sounds good. Shot off an email to my Hashicorp AE (we have an enterprise account with them), so hopefully I can get answers by Tuesday or so.

@klizhentas
Copy link
Contributor

klizhentas commented Feb 11, 2017

awesome @ajvb I will be glad to assist you with testing/updating/reviewing PR, looking forward to getting it merged!

@ajvb
Copy link
Author

ajvb commented Feb 13, 2017

@klizhentas Do we have numbers for "scope and scale of key/value or session usage"? What is an expected number of sessions to be created within an hour or a day for sessions-per-kvpair?

@kontsevoy
Copy link
Contributor

@ajvb sessions are used when people SSH into boxes. So... the answer is number of employees x number of SSH sessions an average employee does per second ;)

The highest pressure point for the key/value subsystem is actually server pings. the auth server will call "upsert()" on every node in a cluster every 30 seconds AFAIK. So for a 1,000 node clusters expect 33 upserts per second.

@ajvb
Copy link
Author

ajvb commented Feb 14, 2017

@kontsevoy Server pings and when a user ssh's into a server are the only times a new K/V pair is added to the backend?

@klizhentas
Copy link
Contributor

@ajvb There are other keys stored in the DB (like certificate authorities and users) but the amount of writes is negligible compared to the scenarios mentioned by @kontsevoy

@ajvb
Copy link
Author

ajvb commented Feb 14, 2017

@kontsevoy @klizhentas

What I told them:

As I understand it, the norm will be something like 10-40 KVPairs
created a second. The variation depends on the size of your Teleport deployment.

Response from Hashicorp Support:

Hi AJ,

For your use case in the Teleport backend, the answers to the two questions you
posed are as follows:

>>> When I attach a session with a TTL to a KVPair in Consul, when the TTL
>>> expires, is it deleted? 

If you choose the delete behavior when creating the session
(https://www.consul.io/docs/agent/http/session.html#session_create)
then the k/v is deleted when the TTL expires.

>>> Is it reasonable to create a session for each KVPair? Will this overwhelm consul
>>> or is this alright?

For this number of keys/sessions and even into the 100s, a 1:1 key/session mapping
will be alright and not overwhelm Consul.

👍 🎉 💃 I should have some time this week to swap my custom sessions implementation.

Copy link
Author

@ajvb ajvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klizhentas
Copy link
Contributor

@ajvb then we have it all cleared, let me know once you are ready and I will review and test

@ajvb
Copy link
Author

ajvb commented Feb 15, 2017

@klizhentas So I have a fun problem. If you go to https://www.consul.io/docs/agent/http/session.html#session_create and look at the TTL parameter, you will see this line:

If specified, it must be between 10s and 86400s currently.

The current test suite uses sessions with a TTL of 1 second and 2 second, which totally blows up.

If shorter than 10s TTL's is a requirement then we will need to go back to using the embedded TTL like BoltDB.

@klizhentas
Copy link
Contributor

klizhentas commented Feb 15, 2017

@ajvb 1 sec TTL is just for test, you can adjust for consul if that's a problem. We never really use anything less than 30 sec in prod anyways

@klizhentas
Copy link
Contributor

@ajvb I'm going to close this due to the lack of activity. Feel free to reopen if you get a chance to work on it.

@klizhentas klizhentas closed this Apr 25, 2017
@flyinprogrammer
Copy link

so i know this was closed because of inactivity, but any chance we want to revisit getting this working?

hatched pushed a commit to hatched/teleport-merge that referenced this pull request Nov 30, 2022
hatched pushed a commit that referenced this pull request Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants