Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import maghemite mg-ddm service #1249

Merged
merged 5 commits into from
Jun 23, 2022
Merged

Import maghemite mg-ddm service #1249

merged 5 commits into from
Jun 23, 2022

Conversation

jgallagher
Copy link
Contributor

This is PR 1 of 3 to get to the point of advertising bootstrap addresses via maghemite instead of UDP multicast. It adds the mg-ddm global zone service as an external package (from the private maghemite repo). I tweaked omicron-package to untar all global zone packages (previously sled-agent was the only global zone service), and have bootstrap-agent starting the service early in its life.

There are a few TODOs in this; I'm happy to address them on the PR or open issues to follow up later. The most critical is the one about using "linklocal" when constructing an AddrObject; if that assumption is wrong it might break even single-sled deployments.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

.get_link()
.map_err(|err| format!("Failed to find physical link: {err}"))?;

// TODO-correctness (a) Is "linklocal" always right? (b) Do we need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be correct with regards to RFD 63 section 6.3.1:

Peering and traffic forwarding between routers takes place over link-local addresses. When a router first starts, it sends out neighbor discovery protocol (NDP) router solicitations over link-local multicast to discover neighboring routers. Neighboring routers respond to multicast solicitations sourced from a link-local unicast address over unicast thereby establishing bi-directional unicast link-local communication. From there, the routers engage in a peering protocol and subsequently begin to exchange prefix advertisements.

I don't see why it would break single node behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should've been more specific. IIUC the name "linklocal" is just convention and not special, but maybe that's good enough? E.g., when I create fresh VMs, I don't actually have a $PHYS/linklocal address object without creating one manually.

The breakage is that starting the service fails if the address object we specify here doesn't exist. I think that failure is correct (since we won't be able to route off node without mg-ddm), but I don't want it to fail based on a bad assumption about this name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. I was assuming this created the addrconf object. In production we should definitely fail if the link-local address doesn't exist right? I'd prefer not to qualify this for test only in that case as it's mighty easy to create a link-local address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely have to have a link-local address, yes. I don't know who or when should create that - should bootstrap-agent do that here if doesn't exist?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure as to who creates the address, but I believe the intention is for that to be over the loopback interface lo0. (See this for some notes.)

There are multiple Chelsio physical links on the sled, both of which provide a potential path to talk to the bootstrap server. Maghemite is responsible for advertising the prefix over both Chelio links, and updating the OS's routing tables accordingly. Although @rcgoodfellow can talk in much more depth about it, my understanding is that traffic will be dynamically routed through one or the other Chelsio depending on the expected delay sending a packet down each of them. But it expects to be able to reach the bootstrap address through either link.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maghemite needs link-local/addrconf addresses to be set up for both sled physical interfaces it will route over. So if our interfaces on the sled are

  • cxgbe0
  • cxgbe1

Then the addrconf addresses

  • cxgbe0/linklocal
  • cxgbe1/linklocal

should be set up (not advocating specifically for "linklocal" there, but that seems to be the trend so rolling with it, I don't have a good name to suggest) early in the sled initialization process.

I don't have an opinion on who sets those up, only that they are set up before Maghemite starts so the address object names (i.e. "cxgbe0/linklocal") can be provided to Maghemite on startup. If the addrconf addresses are set up as temporary, this ordering of addresses before Maghemite will need to be ensured on each boot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although @rcgoodfellow can talk in much more depth about it, my understanding is that traffic will be dynamically routed through one or the other Chelsio depending on the expected delay sending a packet down each of them. But it expects to be able to reach the bootstrap address through either link.

That's correct, but in order to do that, we need a distinct link-local address on each physical interface which provides a path to the interface-independent underlay ULA /64 prefix of the host.

Copy link
Contributor Author

@jgallagher jgallagher Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rcgoodfellow!

I think the main followup question comes from

I don't have an opinion on who sets those up, only that they are set up before Maghemite starts

i.e., should sled-agent set up the addrconf addresses itself, or assume they already exist? Here in the code I'm assuming both that (a) it already exists and (b) it has the name $DATALINK/linklocal, and perhaps both of those assumptions are wrong. Could we extend AddressRequest, adding AddrConf, and then use ensure_address()? At that point we could pick whatever name we want, since sled-agent would be responsible for setting it up (and therefore knowing it to pass to Maghemite).

Copy link
Contributor Author

@jgallagher jgallagher Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poking around, we don't need to extend AddressRequest at all; we already have a function that will ensure an AddrObject exists with a link-local address. Added a call to it in 6309c7b. I believe this will fail if the interface already has an addrconf address of a different name, but hopefully that's okay? If something else is creating addrconf addresses before sled-agent, we should sync up on the name.

sled-agent/src/bootstrap/server.rs Outdated Show resolved Hide resolved
Comment on lines +43 to +51
// TODO-cleanup mg-ddm supports multiple interfaces, but `smf` currently
// doesn't expose an equivalent of `svccfg addpropvalue`. If we need
// multiple interfaces we'll need to extend smf.
let interface = interface.to_string();
info!(log, "Setting mg-ddm interface"; "interface" => interface.as_str());
smf::Config::set_property(SERVICE_FMRI).run(smf::Property::new(
smf::PropertyName::new("config", "interfaces").unwrap(),
smf::PropertyValue::Astring(interface),
))?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this library is just a wrapper around the /usr/sbin/svccfg command - if it's too inflexible, it's totally okay to call out to that command manually.

(extending the library is also a viable option, as you suggested).

@jgallagher jgallagher merged commit 3a09961 into main Jun 23, 2022
@jgallagher jgallagher deleted the maghemite-service branch June 23, 2022 20:09
leftwo pushed a commit that referenced this pull request Apr 19, 2024
Propolis changes:
Update h2 dependency
Add NPT ops API definitions from illumos#15639
server: return better HTTP errors when not ensured (#649)

Crucible changes:
Make Region test suite generic across backends (#1263)
Remove async from now-synchronous functions (#1264)
Agent update to support cloning. (#1262)
Remove the Active → Faulted transition (#1260)
Avoid race condition in crutest rand-read/write (#1261)
Add Active -> Offline -> Faulted tests (#1257)
Reorganize dummy downstairs tests (#1253)
Switch to unbounded queues (#1256)
Add Upstairs session ID to dtrace stat probe, cleanup closure (#1254)
Panic instead of returning errors in unit tests (#1251)
Add a clone option to downstairs create (#1249)
leftwo added a commit that referenced this pull request Apr 19, 2024
Propolis changes:
Update h2 dependency
Add NPT ops API definitions from illumos#15639
server: return better HTTP errors when not ensured (#649)

Crucible changes:
Make Region test suite generic across backends (#1263) Remove async from
now-synchronous functions (#1264) Agent update to support cloning.
(#1262)
Remove the Active → Faulted transition (#1260)
Avoid race condition in crutest rand-read/write (#1261) Add Active ->
Offline -> Faulted tests (#1257)
Reorganize dummy downstairs tests (#1253)
Switch to unbounded queues (#1256)
Add Upstairs session ID to dtrace stat probe, cleanup closure (#1254)
Panic instead of returning errors in unit tests (#1251) Add a clone
option to downstairs create (#1249)

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants