supply real volume data to propolis #771
This is in no way ready to go in, but opening this for early reviews.
migrate,
cloud_init_bytes: None,
FYI @iliana
Note you'll need oxidecomputer/typify#41 for this to compile.
Note I have to do a rebase on top of the "Remove 'sled-agent' API parameters from 'common'" PR, so expect a force push.
.windows(2)
.all(|w| w[0].1.block_size == w[1].1.block_size);

if !all_region_have_same_block_size {
Just FYI, I had some bad experiences panicking during a saga. The recovery mechanisms will basically just retry this node indefinitely rather than unwind the whole thing. I'd consider returning an error instead.
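For illustration - this is not the PR's code; Region is simplified and the error payload is invented - the block-size check above could fail the saga node via steno's ActionError instead of panicking, which lets the framework unwind earlier nodes:

use steno::ActionError;

// Simplified stand-in for the real region type.
struct Region {
    block_size: u64,
}

// Returning Err makes steno unwind the saga by running earlier nodes'
// undo actions; a panic! here would just be retried indefinitely.
fn check_block_sizes(regions: &[Region]) -> Result<(), ActionError> {
    let all_same = regions
        .windows(2)
        .all(|w| w[0].block_size == w[1].block_size);
    if !all_same {
        return Err(ActionError::action_failed(
            "regions have mismatched block sizes".to_string(),
        ));
    }
    Ok(())
}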
Rebasing again, will comment when it's done.
Force-pushed from c833176 to 35b5a38.
Instance creation now can specify that existing disks be attached to the instance (disk creation during instance creation is not yet implemented).

There are a few important changes here.

Nexus will instruct the sled agent to send volume construction requests when it ensures the instance, and therefore has to build these requests. This is done in sdc_regions_ensure as regions are allocated.

sic_create_instance_record no longer returns the runtime of an instance, but the instance's name.

Importantly, the instance creation saga now calls Nexus' instance_set_runtime instead of duplicating the logic to call instance_put. Nexus' instance_set_runtime will populate the InstanceHardware with NICs and disks.

Nexus' disk_set_runtime now *optionally* calls the sled agent's disk_put if the instance is running already, otherwise the disk will be attached as part of the instance ensure by specifying volume requests.

request_body_max_bytes had to be increased to 1M because the volume requests could be arbitrarily large (and eventually, the cloud init bytes will add to the body size too).

Note: spawning an instance with this *will not work* unless the propolis zone has a valid static IPv6 address. This PR has grown enough that I think that work should be separate.
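For reference, the body-size bump in dropshot terms - a minimal sketch assuming dropshot's ConfigDropshot of this era (the bind address is made up):

// Hedged sketch: raise dropshot's per-request body cap to 1 MiB so the
// instance-ensure bodies carrying volume construction requests fit.
let config = dropshot::ConfigDropshot {
    bind_address: "127.0.0.1:12220".parse().unwrap(),
    request_body_max_bytes: 1024 * 1024, // previously much smaller
    ..Default::default()
};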
Rebase #N completed, but oxidecomputer/crucible#243 is required to test.
Note I've updated this PR's first comment with the text of the commit.
This is ready for review now.
First pass of comments, this looks awesome. So sick to see this coming together.
nexus/src/sagas.rs (outdated)

if attached {
    osagactx
        .nexus()
Each step in the saga must be idempotent - namely, repeatedly calling ensure_instance_disk_attach_state, either as part of the action or undo-action, must:
- Result in the same effective state
- Return the same result
Is this currently the case?
(I think it is - especially since we're operating at exclusively a DB layer here - but wanted to check that assumption)
I'm pretty sure this is true for the same reason, but I'm not sure. At the very least, repeated calls to instance_attach_disk and instance_detach_disk are tested by some of the disk-related integration tests.
What isn't tested is doing that through the sled agent for a running instance but that code path isn't hit here (?).
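To make the property concrete, here is an illustrative, self-contained toy (all names invented; a HashMap stands in for the DB) of an attach call that can be replayed safely:

use std::collections::HashMap;

// Toy model: disk id -> instance the disk is attached to (None = detached).
type DiskTable = HashMap<u32, Option<u32>>;

// Replaying this with the same arguments leaves the table in the same
// state and returns the same result - the property each saga action
// and undo-action needs.
fn ensure_disk_attached(
    table: &mut DiskTable,
    disk_id: u32,
    instance_id: u32,
) -> Result<(), String> {
    match table.get(&disk_id).copied() {
        // Detached: attach it.
        Some(None) => {
            table.insert(disk_id, Some(instance_id));
            Ok(())
        }
        // Already attached to this instance: nothing to do on replay.
        Some(Some(id)) if id == instance_id => Ok(()),
        Some(Some(_)) => Err("disk attached to another instance".to_string()),
        None => Err("no such disk".to_string()),
    }
}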
nexus/src/sagas.rs (outdated)

saga_params.organization_name.clone().into();
let project_name: db::model::Name = saga_params.project_name.clone().into();

for disk in saga_disks {
(Not explicitly a request, more just out-loud thinking)
To clarify the path, from a high level:
- sic_attach_disks_to_instance happens right before actually sending the "instance ensure" request
- It calls Nexus::instance_attach_disk, which calls Nexus::disk_set_runtime
- If the instance state is InstanceState::Creating and the requested disk state is DiskStateRequested::Attached, we write the requested disk state ("attached") to the database, but don't send any requests to the sled agent (yet).
I wonder if we want to bifurcate disk_set_runtime into two different functions within Nexus. The fact that it "may or may not make a request to sled agent" feels a little unintuitive to me - I 100% understand why it's acting that way, and I think deferring the request until actually calling "instance ensure" makes sense, but understanding when it would / would not make a call to sled agent feels a little non-obvious.
WDYT? I think this is subjective, but just wanna make sure the flows are as straightforward as they can be.
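A rough sketch of the bifurcation being suggested, with invented names and stubbed-out internals:

// Invented stand-ins for Nexus internals.
struct DiskRuntime;
struct SledAgentClient;

struct Nexus {
    sled_agent: SledAgentClient,
}

impl Nexus {
    // DB-only path: during instance creation, just record the requested
    // state ("attached"); the disk reaches propolis via instance ensure.
    fn disk_set_runtime_db_only(&self, _disk: &mut DiskRuntime) {
        // write the requested disk state to the database
    }

    // Hot-plug path: record the state and also notify the sled agent
    // for an already-running instance.
    fn disk_set_runtime_and_notify(&self, disk: &mut DiskRuntime) {
        self.disk_set_runtime_db_only(disk);
        // self.sled_agent.disk_put(...) would go here
    }
}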
This is true (though the actual path traversed through the code has changed as part of 581bdee), with the idea that attaching disks to the instance only in the DB is preparing for the logic in instance_set_runtime that queries the disks attached to the instance.
We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.
// TODO this will probably involve volume construction requests as
// well!
Is this remarking that the API to disk_set_runtime is insufficient? I know most of this hot-plug path isn't implemented (which is 100% fine) but I just wanna make sure we're very explicit about what we expect to work / not work.
propolis needs to be handed the volume construction request, so sled agent's disk_put is currently insufficient, yeah, but what do you think about

"We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to."

?
That change sounds good to me!
I had an incorrect mental model of how distributed sagas work, and this caused problems - disks would not be detached when instance creation failed. I tried to correct the code in a slightly different way than was done for NICs: instead of pre-allocating IDs and having a previous saga node delete based on these IDs, I chose to split the attach-disks-to-instance node into 8 nodes, each with its own undo.

This is now tested by trying to attach 8 disks, one of which is marked as faulted. The whole saga should unwind, and the test checks for semantic equivalence.

Added some tests for the new arbitrary 8-disk limit. This required bumping the request max body size because of the larger instance ensure bodies.
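A hedged sketch of that per-slot shape - not self-contained (template_builder and the sic_* function names are assumed for illustration), but using the ActionFunc::new_action(action, undo) pairing steno provides:

// Eight fixed nodes, one per possible disk slot, each paired with its
// own undo action so a failed attach unwinds only what was done.
for i in 0..8 {
    template_builder.append(
        &format!("attach_disk_{}", i),
        "AttachDisksToInstance",
        ActionFunc::new_action(
            sic_attach_disk_to_instance,      // idempotent action
            sic_attach_disk_to_instance_undo, // idempotent undo
        ),
    );
}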
Looks good! Only remaining comments are basically nits
// conditional logic depending on if that disk index is going to be used.
// Steno does not currently support the saga node graph changing shape.
for i in 0..8 {
    template_builder.append(
WDYT about calling append_parallel here, and running these concurrently? Same for attach?
I can't, unfortunately, unless I store the strings somewhere persistent. append_parallel wants tuples that look like (&str, &str, Arc<dyn steno::Action<SagaInstanceCreate>>).
Would that not be possible through:

template_builder.append_parallel(
    (0..MAX_DISKS).map(|i| {
        (
            &format!("create_disks{}", i),
            "CreateDisksForInstance",
            AsyncFunc::new_action(....),
        )
    }).collect::<Vec<_>>()
);
(same for "AttachDisks" below)
If Rust gives you issues with lifetimes, then maybe:
let names = (0..MAX_DISKS).map(|i| format!("create_disks{}", i)).collect::<Vec<String>>();
template_builder.append_parallel(
    (0..MAX_DISKS).map(|i| {
        (
            names[i],
            "CreateDisksForInstance",
            AsyncFunc::new_action(....),
        )
    }).collect::<Vec<_>>()
);
Rust does give you issues with lifetimes for the first one, and for the second I'm pretty sure names has to live longer than the template builder function.
I'm pretty sure names doesn't need to last longer than the function - the signature doesn't imply that, and the implementation actually copies whatever string is provided by calling .to_string(): https://github.com/oxidecomputer/steno/blob/578498bbb43f9061c636b938dff98e7ece0b0efc/src/saga_template.rs#L315-L359
:( Looks like this PR is hitting an issue due to zeroize being pinned by aes-gcm-siv:
The linked issue mentions that four days ago there was a prerelease with unpinned zeroize; hopefully that gets pushed soon.
All TODOs are now tracked with items.
Rolling Sled Agent's dependency on
This didn't work for me?