-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import maghemite mg-ddm service #1249
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
sled-agent/src/bootstrap/server.rs
Outdated
.get_link() | ||
.map_err(|err| format!("Failed to find physical link: {err}"))?; | ||
|
||
// TODO-correctness (a) Is "linklocal" always right? (b) Do we need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be correct with regards to RFD 63 section 6.3.1:
Peering and traffic forwarding between routers takes place over link-local addresses. When a router first starts, it sends out neighbor discovery protocol (NDP) router solicitations over link-local multicast to discover neighboring routers. Neighboring routers respond to multicast solicitations sourced from a link-local unicast address over unicast thereby establishing bi-directional unicast link-local communication. From there, the routers engage in a peering protocol and subsequently begin to exchange prefix advertisements.
I don't see why it would break single node behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should've been more specific. IIUC the name "linklocal"
is just convention and not special, but maybe that's good enough? E.g., when I create fresh VMs, I don't actually have a $PHYS/linklocal
address object without creating one manually.
The breakage is that starting the service fails if the address object we specify here doesn't exist. I think that failure is correct (since we won't be able to route off node without mg-ddm), but I don't want it to fail based on a bad assumption about this name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. I was assuming this created the addrconf object. In production we should definitely fail if the link-local address doesn't exist right? I'd prefer not to qualify this for test only in that case as it's mighty easy to create a link-local address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely have to have a link-local address, yes. I don't know who or when should create that - should bootstrap-agent do that here if doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure as to who creates the address, but I believe the intention is for that to be over the loopback interface lo0
. (See this for some notes.)
There are multiple Chelsio physical links on the sled, both of which provide a potential path to talk to the bootstrap server. Maghemite is responsible for advertising the prefix over both Chelio links, and updating the OS's routing tables accordingly. Although @rcgoodfellow can talk in much more depth about it, my understanding is that traffic will be dynamically routed through one or the other Chelsio depending on the expected delay sending a packet down each of them. But it expects to be able to reach the bootstrap address through either link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maghemite needs link-local/addrconf addresses to be set up for both sled physical interfaces it will route over. So if our interfaces on the sled are
- cxgbe0
- cxgbe1
Then the addrconf addresses
- cxgbe0/linklocal
- cxgbe1/linklocal
should be set up (not advocating specifically for "linklocal" there, but that seems to be the trend so rolling with it, I don't have a good name to suggest) early in the sled initialization process.
I don't have an opinion on who sets those up, only that they are set up before Maghemite starts so the address object names (i.e. "cxgbe0/linklocal") can be provided to Maghemite on startup. If the addrconf addresses are set up as temporary, this ordering of addresses before Maghemite will need to be ensured on each boot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although @rcgoodfellow can talk in much more depth about it, my understanding is that traffic will be dynamically routed through one or the other Chelsio depending on the expected delay sending a packet down each of them. But it expects to be able to reach the bootstrap address through either link.
That's correct, but in order to do that, we need a distinct link-local address on each physical interface which provides a path to the interface-independent underlay ULA /64 prefix of the host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rcgoodfellow!
I think the main followup question comes from
I don't have an opinion on who sets those up, only that they are set up before Maghemite starts
i.e., should sled-agent set up the addrconf addresses itself, or assume they already exist? Here in the code I'm assuming both that (a) it already exists and (b) it has the name $DATALINK/linklocal
, and perhaps both of those assumptions are wrong. Could we extend AddressRequest, adding AddrConf
, and then use ensure_address()
? At that point we could pick whatever name we want, since sled-agent would be responsible for setting it up (and therefore knowing it to pass to Maghemite).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poking around, we don't need to extend AddressRequest
at all; we already have a function that will ensure an AddrObject
exists with a link-local address. Added a call to it in 6309c7b. I believe this will fail if the interface already has an addrconf
address of a different name, but hopefully that's okay? If something else is creating addrconf addresses before sled-agent, we should sync up on the name.
// TODO-cleanup mg-ddm supports multiple interfaces, but `smf` currently | ||
// doesn't expose an equivalent of `svccfg addpropvalue`. If we need | ||
// multiple interfaces we'll need to extend smf. | ||
let interface = interface.to_string(); | ||
info!(log, "Setting mg-ddm interface"; "interface" => interface.as_str()); | ||
smf::Config::set_property(SERVICE_FMRI).run(smf::Property::new( | ||
smf::PropertyName::new("config", "interfaces").unwrap(), | ||
smf::PropertyValue::Astring(interface), | ||
))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this library is just a wrapper around the /usr/sbin/svccfg
command - if it's too inflexible, it's totally okay to call out to that command manually.
(extending the library is also a viable option, as you suggested).
Propolis changes: Update h2 dependency Add NPT ops API definitions from illumos#15639 server: return better HTTP errors when not ensured (#649) Crucible changes: Make Region test suite generic across backends (#1263) Remove async from now-synchronous functions (#1264) Agent update to support cloning. (#1262) Remove the Active → Faulted transition (#1260) Avoid race condition in crutest rand-read/write (#1261) Add Active -> Offline -> Faulted tests (#1257) Reorganize dummy downstairs tests (#1253) Switch to unbounded queues (#1256) Add Upstairs session ID to dtrace stat probe, cleanup closure (#1254) Panic instead of returning errors in unit tests (#1251) Add a clone option to downstairs create (#1249)
Propolis changes: Update h2 dependency Add NPT ops API definitions from illumos#15639 server: return better HTTP errors when not ensured (#649) Crucible changes: Make Region test suite generic across backends (#1263) Remove async from now-synchronous functions (#1264) Agent update to support cloning. (#1262) Remove the Active → Faulted transition (#1260) Avoid race condition in crutest rand-read/write (#1261) Add Active -> Offline -> Faulted tests (#1257) Reorganize dummy downstairs tests (#1253) Switch to unbounded queues (#1256) Add Upstairs session ID to dtrace stat probe, cleanup closure (#1254) Panic instead of returning errors in unit tests (#1251) Add a clone option to downstairs create (#1249) Co-authored-by: Alan Hanson <alan@oxide.computer>
This is PR 1 of 3 to get to the point of advertising bootstrap addresses via maghemite instead of UDP multicast. It adds the
mg-ddm
global zone service as an external package (from the private maghemite repo). I tweakedomicron-package
to untar all global zone packages (previouslysled-agent
was the only global zone service), and have bootstrap-agent starting the service early in its life.There are a few TODOs in this; I'm happy to address them on the PR or open issues to follow up later. The most critical is the one about using
"linklocal"
when constructing anAddrObject
; if that assumption is wrong it might break even single-sled deployments.