Minion id changes from 0.16.4 to 0.17.0, loses domain name part #7558

Closed
jkur opened this issue Oct 2, 2013 · 28 comments · Fixed by #7678
Labels: Bug (broken, incorrect, or confusing behavior) · Regression (breaks functionality known to work in previous releases) · severity-medium (incorrect or bad functionality, confusing and lacks a workaround)

Comments

jkur commented Oct 2, 2013

This patch (#6734) seems to break the id grain on Debian systems (and maybe many others)!

We just got an update from 0.16.4 to 0.17.0 through unattended upgrades (maybe a bad idea for Salt).
Now all our minions are gone, because the id changed and new minion keys were generated, which of course are unknown to the master.
These ids no longer include the domain part.
This is really bad. It's not the first time the id generation has changed and broken everything. With such a crucial part of the system, I think more care should be taken.

Sorry for the angry tone, but this is a bit frustrating.

terminalmage (Contributor) commented:

Do you have an /etc/hostname on these hosts?

jkur (Author) commented Oct 2, 2013

Yes, and this is the reason why the minion id changes. But on Debian, /etc/hostname does not include the domain name.
Before #6734, the next rule would match (fqdn = socket.getfqdn()) and the full name, including the domain part, was used.
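
For illustration, a minimal sketch of the difference on a typical Debian host (the hostnames shown are made up):

```python
import socket

# /etc/hostname on Debian usually holds only the short host name.
with open('/etc/hostname') as fh:
    print(fh.read().strip())   # e.g. "web01"

# socket.getfqdn() resolves the fully qualified name via /etc/hosts or DNS.
print(socket.getfqdn())        # e.g. "web01.example.com"
```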

basepi (Contributor) commented Oct 2, 2013

For the record, I totally agree with your frustration. I was under the impression that the minion cached its ID after its first run, and that is apparently incorrect.

I think the desired behavior will be to guess the ID on the first run of the minion, then store that ID in a file somewhere. On subsequent runs, unless the ID is specified in the minion config (which should always be the authoritative source of minion ID), it should use the minion ID that was cached.

@terminalmage WRT that pull req, on systems like Arch, was socket.getfqdn() not working? What was the rationale? Should we remove the /etc/hostname check, or make it lower priority, or what?

basepi (Contributor) commented Oct 2, 2013

I actually labeled this as high severity -- I think this ID caching is important, since the master does all its key management based on minion ID.

terminalmage (Contributor) commented:

We were falling back to localhost on Arch Linux machines. socket.getfqdn() was returning localhost, which Salt tries to avoid, but if all else fails it does settle on localhost rather than simply bailing.

basepi (Contributor) commented Oct 2, 2013

Ah, so what if we were to try socket.getfqdn() first, and then try /etc/hostname if it returns localhost?

basepi (Contributor) commented Oct 2, 2013

Just switch the priority, basically.
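
A minimal sketch of that ordering, assuming a helper along these lines (the function name and the exact fallbacks are illustrative, not Salt's actual code):

```python
import socket

def guess_minion_id():
    # Try the FQDN first; fall back to /etc/hostname only if the FQDN
    # lookup comes back as localhost, and settle on localhost last.
    fqdn = socket.getfqdn()
    if fqdn and fqdn not in ('localhost', 'localhost.localdomain'):
        return fqdn
    try:
        with open('/etc/hostname') as fh:
            name = fh.read().strip()
        if name and name != 'localhost':
            return name
    except (IOError, OSError):
        pass
    return 'localhost'
```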

basepi (Contributor) commented Oct 3, 2013

Alright, just got the go-ahead from @thatch45.

The fix will be two-fold. First, we'll make socket.getfqdn() the first source for guessing the minion ID, and if it returns localhost, we'll move on to /etc/hostname. This will solve the original regression.

Second, we'll add caching. This means that once the minion has guessed its ID once, it will just read that ID from the cache on future starts. This will prevent future regressions. Also, if you define a minion ID in the config, that will override all other guessing/caching. The minion config is the source of truth for minion IDs.
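
A rough sketch of that flow, assuming the cache lives in a minion_id file next to the minion config (the helper names are illustrative, not the final implementation):

```python
import os

def get_minion_id(opts):
    # An explicitly configured id always wins.
    if opts.get('id'):
        return opts['id']
    cache_file = os.path.join(opts.get('conf_dir', '/etc/salt'), 'minion_id')
    # Reuse the id cached on a previous start, if there is one.
    if os.path.isfile(cache_file):
        with open(cache_file) as fh:
            cached = fh.read().strip()
        if cached:
            return cached
    # First start: guess the id and cache it for next time.
    minion_id = guess_minion_id()   # e.g. the helper sketched above
    with open(cache_file, 'w') as fh:
        fh.write(minion_id)
    return minion_id
```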

If nobody has any objections, I'll probably start on this tomorrow or Friday.

ghost assigned basepi Oct 3, 2013
jkur (Author) commented Oct 3, 2013

Sounds great. The caching in particular is highly welcome. Many thanks for all your effort.

joehealy (Contributor) commented Oct 3, 2013

@basepi - if you finish the patch and a 0.17.1 is delayed, I'd be happy to do a 0.17.0-2 for Debian that contains the patch.

Let me know if this would be worthwhile.

joehealy (Contributor) commented Oct 3, 2013

@jkur It's probably also worth having a look at the "Release Streams" section on debian.saltstack.com.

It aims to give users more control over the timing of upgrades.

basepi (Contributor) commented Oct 3, 2013

@joehealy There are a few minor show-stopping bugs in salt-ssh in 0.17.0, so I expect we will be doing a 0.17.1 next week. I don't think a -2 release will be required.

malinoff commented Oct 7, 2013

@basepi what about my earlier note?

Right now a minion's id is the only way to target minions without worrying that some minion has changed its id yet remains authenticated, unlike grains.
If a minion's id is cached, this approach won't work.

basepi (Contributor) commented Oct 7, 2013

@malinoff Wait, I'm confused.

So you're saying targeting by ID is the only way to target without being afraid that the ID has changed? That doesn't make sense.

Can you clarify your use case?

Note that this ID "caching" is only on the minion, so that it keeps the same ID each time it starts up. Looks like you might be thinking it's on the master.

malinoff commented Oct 8, 2013

@basepi No, I wasn't talking about that.
Imagine that an attacker knows we're using the G@roles:project-node notation and has access to a minion that does not have that grain. When targeting with grains, the attacker can set a roles:project-node grain on that minion and get all the information meant for the project. When targeting by ID, that situation can never happen.

But yes, I see that caching will not break this. Thanks for the clarification.
I had mixed up caching the ID with caching the minion's key.

joehealy (Contributor) commented Oct 8, 2013

Andrii Senkovych (@jollyroger) has prepared a temporary packaging fix for this. It has been uploaded to debian.saltstack.com for squeeze, wheezy and unstable. The version is 0.17.0-2.

basepi (Contributor) commented Oct 8, 2013

Opened a pull req with the fix: #7678

Feedback welcome.

thatch45 added a commit that referenced this issue Oct 9, 2013
Cache minion ID after first time guessing it, Fixes #7558
LeTink commented Oct 10, 2013

I can confirm that the freshly applied update on Ubuntu 12.04 has the same result. I had to re-accept 2 Ubuntu minions, both now without the FQDN. 1 CentOS minion was unharmed.

drags commented Oct 10, 2013

Appreciate the quick response and re-prioritizing socket.getfqdn(). 👍

I would like to present a counter argument that minion IDs should not be cached.

Minion IDs are set to the system hostname. They are also used as the most basic (and common to new users) form of targeting jobs. Having the minion ID drift from the hostname is jarring and unexpected from a user perspective.

Two scenarios come to mind where the hostname is likely to change:

  1. Machines that do not have their final hostname during bootstrap, when the minion is most likely to be installed.
  2. Clusters that are renamed as they move from staging/beta to production.

basepi (Contributor) commented Oct 11, 2013

Right, but even if a minion ID changes, the difficulty comes because the master doesn't smoothly transition to the new minion ID. Rather, it sees the new ID and assumes it's a new minion. You have to re-accept the key and delete the old one to prevent delays because of the old key. In most cases, the headaches caused by this change happening without you triggering it are greater than the slight annoyance of having to either define the minion ID explicitly or delete the cache to get the minion to fetch the correct hostname.

In the end, we always recommend that for most deployments you should really just be defining your minion IDs explicitly in the minion config -- this will prevent any sort of problem along these lines.

That being said, it's really easy to get the minion to guess its new ID after the bootstrapping is complete -- just delete the minion_id file from /etc/salt (or wherever your config files are stored).

korylprince commented:

Just upgraded the Ubuntu packages today and ran into this issue. I was about to post about it to the mailing list when I saw this. Glad to see it's already fixed. Will this fix go into the package repos (0.17.0.2 or something) or will we have to wait until 0.18.0?

Also, I like the cached minion_id. It should help with deployment too.

basepi (Contributor) commented Oct 15, 2013

It will be in 0.17.1, which we will hopefully cut tonight or tomorrow morning, to be in the PPA by the end of the week.

DaveQB (Contributor) commented Jun 16, 2014

Interesting. I just got bitten by this. I set up a new SOE image for us. In doing so, I deleted the ssh and salt keys, but I was not aware of this /etc/salt/minion_id file. You can imagine the hair-pulling and frustration I have been facing wondering why my new hosts' keys were not appearing on the salt master.

A bunch of md5sum checking and debugging finally found they were all turning up with the identical hostname of the original machine the SOE was created from.

I don't think I am adding anything to this conversation, just getting it off my chest. :)

basepi (Contributor) commented Jun 16, 2014

Oh no! Sorry for the inconvenience, @DaveQB -- in the current version of salt, we report where the minion is getting its ID at the INFO log level, to try to make it blatantly obvious that we're using a cached minion ID. Guess we need to see if we can message it better.

DaveQB (Contributor) commented Jun 17, 2014

@basepi I didn't notice this in the log. I used the debug level too.

What has happened for me is that I have updated our SOE image. The previous one had salt 13.x, so the updated one has 17.6. Now salt is getting and setting this /etc/salt/minion_id file very early in the startup process, before the final hostname is set, which is causing me trouble.

I am thinking about setting an empty, immutable file at /etc/salt/minion_id in the image, so it can't be set at all.

Would an empty /etc/salt/minion_id file mean salt-minion skips this cache lookup and uses the system's FQDN?

basepi (Contributor) commented Jun 17, 2014

Sorry, I should have mentioned, you can turn off minion ID caching completely by setting minion_id_caching: False in the minion config.

DaveQB (Contributor) commented Jun 17, 2014

@basepi Oh great. I was meaning to put that question in my comment. I'll be adding that to my SOE.
Thanks!

Alivefish commented Jan 24, 2018

If you don't have a bootstrap setup, and remote Windows execution for salt-cloud via smbclient/WinRM (tcp/445) isn't allowed through any firewall besides the one in your basement (or past AD and security tools), is it possible to create a minion config, for the sole purpose of baking into a golden Windows image, that would produce a dynamic minion id based on hostname_ip rather than just the hostname at startup? Using the hostname/FQDN as standard (understandable by design and as intended for normal operations), you end up with tens or hundreds of VMs carrying the minion id of the lab host you built the image from, which doesn't exist in your production DNS, and it just goes downhill from there. The master gets flooded with minions sharing the same id, with no easy way to track down the source node and automate remediation without parsing a very high-volume event bus or using the bad remote execution tool we are trying to rid ourselves of; I'm sure that is up for debate, but bear with me.
Of course, this isn't your go-forward production minion config. We'd create some kind of beacon or watcher to scan the enterprise (or query Postgres) for ids containing an IP address in the name (e.g. labsvr_10.0.0.5), target the host by IP, push the standard config, restart the agent, delete the old newsvr_10.0.0.5, and voilà. Yes, this is dumb, dangerous, and different, but it eliminates a few dependencies as well as selling tcp/445 (which every Windows exploit on the planet targets, is locked down by group policy in most cases, and requires an act of congress to open up through a firewall). Also, this assumes tcp/445 is only required when using the winrepo method for Windows VMs with the vmware module, and not for the Linux methods of the same IaaS. Laughs, comments, bad jokes anyone?
