Advertise bootstrap addresses via maghemite #1251

Merged: jgallagher merged 12 commits into main from maghemite-bootstrap-address on Jun 24, 2022

Conversation

@jgallagher (Contributor) commented on Jun 22, 2022

This is PR 3 of 3 to get to the point of advertising bootstrap addresses via maghemite instead of UDP multicast. It removes the PeerDiscovery type and all its supporting multicast machinery, and replaces it with ddm-api-client calls (a rough sketch follows the list):

  1. bootstrap-agent tells ddmd to advertise its own bootstrap prefix
  2. RSS queries ddmd while waiting for a sufficient number of peers to arrive
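
Roughly, the shape of those two calls looks like the following sketch. This is purely illustrative: the real ddm-admin-client method names and types differ, and `advertise_bootstrap_prefix`, `wait_for_peers`, and the one-second polling cadence are stand-ins; `peer_addrs` is a simplified version of the helper discussed later in this thread.

```rust
use std::collections::HashSet;
use std::net::Ipv6Addr;
use std::time::Duration;

// Item 1 (bootstrap-agent side): ask the local ddmd to advertise this sled's
// bootstrap prefix. Hypothetical signature; the real ddm-admin-client call
// differs.
async fn advertise_bootstrap_prefix(_prefix: Ipv6Addr) {
    // ... send the prefix to ddmd's admin API ...
}

// Item 2 (RSS side): simplified stand-in for the `peer_addrs()` helper
// discussed later in this thread.
async fn peer_addrs() -> Vec<Ipv6Addr> {
    // ... query ddmd for bootstrap prefixes advertised by peers ...
    Vec::new()
}

// Poll ddmd until at least `expected` distinct peers have shown up.
async fn wait_for_peers(expected: usize) -> HashSet<Ipv6Addr> {
    loop {
        let peers: HashSet<Ipv6Addr> =
            peer_addrs().await.into_iter().collect();
        if peers.len() >= expected {
            return peers;
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```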

This PR does not advertise underlay prefixes, so multi-sled deployments don't fully start up; they should make it past the bootstrapping process and then fail when trying to init services, because those services won't be able to talk to off-sled nexus instances. E.g., in my VM logs, I see:

[2022-06-22T18:59:36.291101087Z]  INFO: SledAgent/RSS/356 on helios1: initializing sled services: [ServiceRequest { id: 678c0120-0de0-4246-b709-37bc3125a5dc, name: "nexus", addresses: [fd00:1122:3344:102::3], gz_addresses: [], service_type: Nexus { internal_address: [fd00:1122:3344:102::3]:12221, external_address: [fd00:1122:3344:102::3]:12220 } }]
[2022-06-22T18:59:36.291905552Z]  WARN: SledAgent/RSS/356 on helios1: failed to initialize services
    error: Communication Error: error sending request for url (http://[fd00:1122:3344:102::1]:12345/services): error trying to connect: tcp connect error: Network is unreachable (os error 128)

which is entirely expected and will be fixed in a future PR. Single-sled deployments should be unaffected; I tested that with a standalone VM, but please feel free to test and confirm on your setup(s).

@andrewjstone (Contributor) left a comment
Amazing work @jgallagher. This small PR doesn't show all the effort it took to get here. It's really excellent to see this!

@@ -29,6 +30,12 @@ use std::sync::Arc;
use thiserror::Error;
use tokio::sync::Mutex;

/// Initial octet of IPv6 for bootstrap addresses.
Contributor:
I can't explain why these constants make me happy, but they do. Maybe it's seeing so much discussion and writing codified.

.await
.map_err(BootstrapError::DdmError)
.map_err(|err| BackoffError::transient(err))?
.collect::<Vec<_>>();
Contributor:
Should this be a Set instead? Is it guaranteed that peer_addrs() won't return duplicates? I'd say it's better to be on the safe side, as the extra check doesn't hurt. Note that if there are duplicates, the shared secret creation will fail anyway, but that happens later, after extra network traffic.

@jgallagher (author):
> Should this be a Set instead? Is it guaranteed that peer_addrs() won't return duplicates?

It's not guaranteed by peer_addrs() itself, which is a thin wrapper around querying maghemite for advertised prefixes from peers. But if we get duplicates from maghemite, something has gone horribly wrong and it's likely routing between sleds is fubar'd; that would mean maghemite thinks two different sleds are advertising the same bootstrap address.

If that's the only case where we could get duplicates here, I'm not sure whether Vec or HashSet would be nicer from a debuggability point of view. If HashSet, we lose dups and logs look like we're continuing to wait for all the sleds to show up; if we keep Vec, we'll see more evidence of weird things happening (i.e., it looks like all sleds have shown up but then combining secrets fails). I'm slightly inclined to leave it as Vec based on this, but could be missing something or mis-analyzing this. What do you think?

Contributor:
Hmm. You are right about debugging! How about keeping it a Vec, but returning an error if there are duplicates? Is that too much extra work? And how would we discover if this ever happens?

@jgallagher (author):
What if I replaced .collect() with a manual for loop that builds up a HashSet but fails loudly on duplicates? That might be the best of both worlds.

Contributor:
That sounds good to me. Please just add a comment explaining it, so nobody is confused about why it isn't written as a plain .collect() (as it is now).
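
A minimal sketch of the agreed-upon pattern; the `DuplicatePeer` error type is hypothetical and stands in for whatever error the real code actually returns:

```rust
use std::collections::HashSet;
use std::net::Ipv6Addr;

// Hypothetical error type for illustration; the real code would presumably
// fold this into BootstrapError or similar.
#[derive(Debug)]
struct DuplicatePeer(Ipv6Addr);

// Collect peer addresses into a set, but fail loudly on duplicates instead of
// silently dropping them, so a confused maghemite shows up in the logs rather
// than as a stalled "waiting for peers" loop.
fn collect_unique_peers(
    addrs: impl IntoIterator<Item = Ipv6Addr>,
) -> Result<HashSet<Ipv6Addr>, DuplicatePeer> {
    let mut peers = HashSet::new();
    for addr in addrs {
        if !peers.insert(addr) {
            return Err(DuplicatePeer(addr));
        }
    }
    Ok(peers)
}
```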

let prefixes = self.client.get_prefixes().await?.into_inner();
info!(self.log, "Received prefixes from ddmd"; "prefixes" => ?prefixes);
Ok(prefixes.into_iter().filter_map(|(_, prefixes)| {
// TODO-correctness What if a single peer is advertising multiple
Contributor:
I think this should work fine, since that same peer will have the same keys no matter what the address is. The trust quorum and sprockets were specifically designed not to care about addresses :)

@jgallagher (author):
Ah good point. I'll remove TODO-correctness and update the comment with this note - thanks!
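
For illustration, a simplified sketch of the per-peer filtering the snippet above performs. The `(String, Vec<Ipv6Addr>)` shape and the `is_bootstrap` predicate are stand-ins for the generated ddm-admin-client types, not the real API:

```rust
use std::net::Ipv6Addr;

// Keep at most one bootstrap address per peer. If a peer advertises several,
// any of them will do: the trust quorum keys off sled identity, not address.
fn bootstrap_addrs(
    prefixes_by_peer: Vec<(String, Vec<Ipv6Addr>)>,
    is_bootstrap: impl Fn(&Ipv6Addr) -> bool,
) -> Vec<Ipv6Addr> {
    prefixes_by_peer
        .into_iter()
        .filter_map(|(_peer, prefixes)| {
            prefixes.into_iter().find(|addr| is_bootstrap(addr))
        })
        .collect()
}
```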

) -> Result<Vec<Ipv6Addr>, DdmError> {
let ddm_admin_client = DdmAdminClient::new(self.log.clone())?;
let addrs = retry_notify(
// TODO-correctness `internal_service_policy()` has potentially-long
Contributor:
I'm not sure I like exponential backoff for this either. The call is cheap enough, and local, so I'd just make it use a fixed timeout. But opinions vary :)
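
Assuming a fixed cadence were preferred, a sketch of that alternative follows. The `query_ddmd` closure and the 250 ms interval are hypothetical; the actual code uses omicron's retry_notify with internal_service_policy(), as shown in the snippet above.

```rust
use std::future::Future;
use std::time::Duration;

// Retry a cheap, local call on a constant cadence instead of with
// exponential backoff; reasonable when the callee lives on the same host.
async fn retry_fixed<T, E, F, Fut>(mut query_ddmd: F) -> T
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    loop {
        match query_ddmd().await {
            Ok(value) => return value,
            Err(_) => tokio::time::sleep(Duration::from_millis(250)).await,
        }
    }
}
```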

@jgallagher (author):
Added a note about this as #1270; I'll update the comment to point to it.

Collaborator:
LMK if you want me to backport the internal_service_policy_with_max function; I could try to make it land faster if we need

Base automatically changed from maghemite-client to main June 23, 2022 21:20
@jgallagher force-pushed the maghemite-bootstrap-address branch from b346df2 to 6740bff on June 24, 2022 14:55
@@ -1,169 +0,0 @@
// This Source Code Form is subject to the terms of the Mozilla Public
Collaborator:
Thank you - Love to see this gone

@jgallagher (author):
> LMK if you want me to backport the internal_service_policy_with_max function; I could try to make it land faster if we need

(github won't let me reply directly to this, I think because of rebasing? 😞)

I don't think that's necessary. I haven't noticed this causing any actual problems as I've been testing; it was more of a note that the policy didn't seem right. I'm happy to update this after your work lands, as part of addressing #1270 more generally.

@jgallagher merged commit deab7db into main on Jun 24, 2022
@jgallagher deleted the maghemite-bootstrap-address branch on June 24, 2022 16:39
leftwo pushed a commit that referenced this pull request Apr 19, 2024
Propolis changes:
Update h2 dependency
Add NPT ops API definitions from illumos#15639
server: return better HTTP errors when not ensured (#649)

Crucible changes:
Make Region test suite generic across backends (#1263)
Remove async from now-synchronous functions (#1264)
Agent update to support cloning. (#1262)
Remove the Active → Faulted transition (#1260)
Avoid race condition in crutest rand-read/write (#1261)
Add Active -> Offline -> Faulted tests (#1257)
Reorganize dummy downstairs tests (#1253)
Switch to unbounded queues (#1256)
Add Upstairs session ID to dtrace stat probe, cleanup closure (#1254)
Panic instead of returning errors in unit tests (#1251)
Add a clone option to downstairs create (#1249)
leftwo added a commit that referenced this pull request Apr 19, 2024
Propolis changes:
Update h2 dependency
Add NPT ops API definitions from illumos#15639
server: return better HTTP errors when not ensured (#649)

Crucible changes:
Make Region test suite generic across backends (#1263)
Remove async from now-synchronous functions (#1264)
Agent update to support cloning. (#1262)
Remove the Active → Faulted transition (#1260)
Avoid race condition in crutest rand-read/write (#1261)
Add Active -> Offline -> Faulted tests (#1257)
Reorganize dummy downstairs tests (#1253)
Switch to unbounded queues (#1256)
Add Upstairs session ID to dtrace stat probe, cleanup closure (#1254)
Panic instead of returning errors in unit tests (#1251)
Add a clone option to downstairs create (#1249)

Co-authored-by: Alan Hanson <alan@oxide.computer>