snap high availability #773

andrzej-k · 2016-03-16T11:08:28Z

Let's assume I have a cluster of nodes that I'd like to monitor. I can (or must) do this out-of-band (proxy mode), meaning that snapd is not installed on the target nodes but on dedicated nodes that can communicate with the targets and retrieve metrics out-of-band (for example via REST API or IPMI). Since it's important to have all the metrics all the time, I'd like to be sure that a failure of the nodes hosting snapd is mitigated. As an example, there are 3 nodes hosting snapd and only one is retrieving the metrics, but when it fails, the other node(s) take over seamlessly. I also don't want duplicated metrics, so those 3 snapd instances cannot all run the same workflow, as this would result in the same metrics being retrieved and published 3 times. Can tribe somehow help achieve HA in such a scenario? Would we need new features to support it?
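
To make the desired behaviour concrete, here is a minimal sketch of the "one active collector, others on standby" idea using a time-limited lease. It is not Snap or tribe code: `Lease`, `try_acquire`, and `collect_round` are hypothetical names, and the lease is kept in memory here, whereas a real setup would need a consistent shared store.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    """A single shared lease (hypothetical; held in memory for illustration)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node: str, now: float, ttl: float) -> bool:
        """Acquire or renew the lease; True if `node` holds it afterwards."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == node:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


def collect_round(lease: Lease, alive: List[str], now: float,
                  ttl: float = 10.0) -> List[str]:
    """Return the nodes that would collect and publish in this round."""
    return [node for node in alive if lease.try_acquire(node, now, ttl)]


lease = Lease()
print(collect_round(lease, ["a", "b", "c"], now=0.0))   # only "a" publishes
print(collect_round(lease, ["b", "c"], now=12.0))       # "a" is down, lease not yet expired
print(collect_round(lease, ["b", "c"], now=16.0))       # "b" takes over
```

Note the trade-off in this sketch: between the failure of the holder and the expiry of its lease, nobody collects, so a shorter `ttl` narrows the gap at the cost of more frequent renewals.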

woodsaj · 2016-03-22T08:46:06Z

I would also be interested in this. When creating a task for a "tribe", it would be great if there were an option for the task to run on only 1 of the tribe members rather than on all of them. If that tribe member went offline, the task should then be picked up by another tribe member.
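
One way such an option could work, sketched under assumptions: every member computes the owner of a task from the list of members it currently believes are alive, using rendezvous (highest-random-weight) hashing. The function `owner` and the member names are hypothetical, this is not an existing tribe feature, and it assumes all members agree on who is alive.

```python
import hashlib
from typing import Sequence


def owner(task_id: str, members: Sequence[str]) -> str:
    """Pick the member that should run `task_id` (rendezvous hashing)."""
    if not members:
        raise ValueError("no live members to run the task")

    def score(member: str) -> int:
        # Hash the (member, task) pair; the highest score wins.
        digest = hashlib.sha256(f"{member}|{task_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(members, key=score)


members = ["snap-1", "snap-2", "snap-3"]  # hypothetical tribe members
first = owner("openstack-api-poll", members)

# If the owner goes offline, the survivors agree on a new owner.
survivors = [m for m in members if m != first]
print(first, "->", owner("openstack-api-poll", survivors))
```

Each member runs the task only when `owner(...)` returns its own name, so the task runs once. Losing a member that does not own the task leaves the assignment unchanged.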

simonpasquier · 2016-03-30T20:03:39Z

+1, this is something that would be useful for integrating Snap with LMA/StackLight (a monitoring solution for OpenStack clouds). Typically, StackLight gets part of its metrics by querying OpenStack API endpoints. We do it from a single client at fixed intervals, and if that client fails, we fail over to another client instance.

mbbroberg · 2016-11-30T17:50:17Z

I'd like to consider this for our next round of roadmap planning @bjray @jcooklin

mbbroberg · 2017-06-12T22:53:21Z

Given that this is a question and it's covered by the other two RFCs, closing this thread out as a successful discussion of the need 👍

andrzej-k added the type/question label Mar 17, 2016

andrzej-k added the tracked label Mar 22, 2016

jcooklin mentioned this issue Mar 24, 2016

RFC: Tribe - subtribes, policy, message encryption and calling remote plugins #640

Open

andrzej-k mentioned this issue Mar 23, 2017

No Duplicate Polling #1558

Open

dishmael mentioned this issue Apr 3, 2017

RFC: Tribe Clusters and Worker Pattern #1584

Open

mbbroberg closed this as completed Jun 12, 2017
