Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

supply real volume data to propolis #771

Merged
merged 18 commits into from
Mar 25, 2022
Merged

Conversation

jmpesp
Copy link
Contributor

@jmpesp jmpesp commented Mar 15, 2022

Instance creation now can specify that existing disks be attached to
the instance (disk creation during instance creation is not yet
implemented).

There's a few important changes here.

Nexus will instruct the sled agent to send volume construction requests
when it ensures the instance, and therefore has to build these requests.
This is done in sdc_regions_ensure as regions are allocated.

sic_create_instance_record no longer returns the runtime of an instance,
but the instance's name.

Importantly, the instance creation saga now calls Nexus'
instance_set_runtime instead of duplicating the logic to call
instance_put. Nexus' instance_set_runtime will populate the
InstanceHardware with NICs and disks.

Nexus' disk_set_runtime now optionally calls the sled agent's
disk_put if the instance is running already, otherwise the disk will be
attached as part of the instance ensure by specifying volume requests.

request_body_max_bytes had to be increased to 1M because the volume
requests could be arbitrarily large (and eventually, the cloud init
bytes will add to the body size too).

Note: spawning an instance with this will not work unless the propolis
zone has a valid static IPv6 address. This PR has grown enough that I
think that work should be separate.

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 15, 2022

This is in no way ready to go in, but opening this for early reviews

migrate,
cloud_init_bytes: None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @iliana

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 15, 2022

Note you'll need oxidecomputer/typify#41 for this to compile

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 16, 2022

Note I have to do a rebase on top of the "Remove 'sled-agent' API parameters from 'common'" PR, so expect a force push.

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 16, 2022

@smklein fyi rebased

@bnaecker this commit impacts how nics are attached a bit, please have a look - I tested by booting the regular alpine ISO via the instance_create_demo and saw eth0, so I don't think I screwed up the refactor :)

.windows(2)
.all(|w| w[0].1.block_size == w[1].1.block_size);

if !all_region_have_same_block_size {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, I had some bad experience panicking during a saga. The recovery mechanisms will basically just retry this node indefinitely, not unwind the whole thing. I'd consider returning an error instead.

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 18, 2022

Rebasing again, will comment when it's done.

@jmpesp jmpesp force-pushed the volumes branch 2 times, most recently from c833176 to 35b5a38 Compare March 18, 2022 19:56
Instance creation now can specify that existing disks be attached to
the instance (disk creation during instance creation is not yet
implemented).

There's a few important changes here.

Nexus will instruct the sled agent to send volume construction requests
when it ensures the instance, and therefore has to build these requests.
This is done in sdc_regions_ensure as regions are allocated.

sic_create_instance_record no longer returns the runtime of an instance,
but the instance's name.

Importantly, the instance creation saga now calls Nexus'
instance_set_runtime instead of duplicating the logic to call
instance_put. Nexus' instance_set_runtime will populate the
InstanceHardware with NICs and disks.

Nexus' disk_set_runtime now *optionally* calls the sled agent's
disk_put if the instance is running already, otherwise the disk will be
attached as part of the instance ensure by specifying volume requests.

request_body_max_bytes had to be increased to 1M because the volume
requests could be arbitrarily large (and eventually, the cloud init
bytes will add to the body size too).

Note: spawning an instance with this *will not work* unless the propolis
zone has a valid static IPv6 address. This PR has grown enough that I
think that work should be separate.
@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 18, 2022

Rebase #N completed, but oxidecomputer/crucible#243 is required to test.

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 18, 2022

Note I've updated this PR's first comment with the text of the commit.

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 18, 2022

This is ready for review now

@jmpesp jmpesp marked this pull request as ready for review March 18, 2022 20:29
@jmpesp jmpesp changed the title [DRAFT] supply real volume data to propolis supply real volume data to propolis Mar 18, 2022
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of comments, this looks awesome. So sick to see this coming together.

common/src/api/external/mod.rs Outdated Show resolved Hide resolved
nexus/src/db/model.rs Outdated Show resolved Hide resolved
nexus/src/nexus.rs Outdated Show resolved Hide resolved
nexus/tests/integration_tests/instances.rs Show resolved Hide resolved
nexus/src/nexus.rs Outdated Show resolved Hide resolved
nexus/src/nexus.rs Outdated Show resolved Hide resolved
nexus/src/nexus.rs Outdated Show resolved Hide resolved
nexus/src/sagas.rs Outdated Show resolved Hide resolved

if attached {
osagactx
.nexus()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each step in the saga must be idempotent - namely, repeatedly calling ensure_instance_disk_attach_state, either as part of the action or undo-action must:

  • Result in the same effective state
  • Return the same result

Is this currently the case?

(I think it is - especially since we're operating at exclusively a DB layer here - but wanted to check that assumption)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is true for the same reason but I'm not sure. At the very least, repeated calls to instance_attach_disk and instance_detach_disk is tested by some of the disk related integration tests.

What isn't tested is doing that through the sled agent for a running instance but that code path isn't hit here (?).

saga_params.organization_name.clone().into();
let project_name: db::model::Name = saga_params.project_name.clone().into();

for disk in saga_disks {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not explicitly a request, more just out-loud thinking)

To clarify the path, from a high-level:

  • sic_attach_disks_to_instance happens right before actually sending the "instance ensure" request
  • It calls Nexus::instance_attach_disk, which calls Nexus::disk_set_runtime
  • If the instance state is InstanceState::Creating and the requested disk state is DiskStateRequested::Attached, we write the requested disk state ("attached") to the database, but don't send any requests to the sled agent (yet).

I wonder if we want to bifurcate disk_set_runtime into two different functions, within Nexus. The fact that it "may or may not make a request to sled agent" feels a little unintuitive to me - I 100% understand why it's acting that way, and I think deferring the request until actually calling "instance ensure" makes sense, but understanding when it would / would not make a call to sled agent feels a little non-obvious.

WDYT? I think this is subjective, but just wanna make sure the flows are as straightforward as they can be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true (though the actual path traversed through the code has changed as part of 581bdee), with the idea that attaching disks to the instance only in the DB is preparing for the logic in instance_set_runtime that queries the disks attached to the instance.

We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.

nexus/src/nexus.rs Outdated Show resolved Hide resolved
Comment on lines +1435 to +1436
// TODO this will probably involve volume construction requests as
// well!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this remarking that the API to disk_set_runtime is insufficient? I know most of this hot-plug path isn't implemented (which is 100% fine) but I just wanna make sure we're very explicit about what we expect to work / not work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

propolis needs to be handed the volume construction request, so sled agent's disk_put is currently insufficient, yeah, but what do you think about

We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change sounds good to me!

nexus/src/sagas.rs Outdated Show resolved Hide resolved
nexus/src/sagas.rs Outdated Show resolved Hide resolved
nexus/src/sagas.rs Outdated Show resolved Hide resolved
jmpesp added 2 commits March 24, 2022 13:30
I had an incorrect mental model of how distributed sagas work and this
caused problems - disks would not be detached when instance creation
failed.

I tried to correct the code in a slightly different way than was done
for NICs. Instead of pre-allocating IDs and having a previous saga node
delete based on these IDs, I chose to split the attach disks to instance
node into 8, each with its own undo.

This is now tested by trying to attach 8 disks, one of which is marked
as faulted. The whole saga should unwind and the test checks for
semantic equivalence.

Added some tests for the new arbitrary 8 disk limit. This required
bumping the request max body size because of the larger instance ensure
bodies.
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only remaining comments are basically nits

nexus/src/sagas.rs Outdated Show resolved Hide resolved
// conditional logic depending on if that disk index is going to be used.
// Steno does not currently support the saga node graph changing shape.
for i in 0..8 {
template_builder.append(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about calling append_parallel here, and running these concurrently? Same for attach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't, unfortunately, unless I store the strings somewhere persistent. append_parallel wants tuples that look like (&str, &str, Arc<dyn steno::Action<SagaInstanceCreate>>).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that not be possible through:

template_builder.append_parallel(
  (0..MAX_DISKS).map(|i| {
    (
      &format!("create_disks{}", i),
      "CreateDisksForInstance",
      AsyncFunc::new_action(....),
    )
  }).collect::<Vec<_>>()
);

(same for "AttachDisks" below)

If Rust gives you issues with lifetimes, then maybe:

let names = (0...MAX_DISKS).map(|i| format!("create_disks{}", i)).collect::<Vec<String>>();
template_builder.append_parallel(
  (0..MAX_DISKS).map(|i| {
    (
      names[i],
      "CreateDisksForInstance",
      AsyncFunc::new_action(....),
    )
  }).collect::<Vec<_>>()
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust does give you issues with lifetimes for the first one, and for the second I'm pretty sure names has to live longer than the template builder function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure names doesn't need to last longer than the function - the signature doesn't imply that, and the implementation actually copies whatever string is provided by calling .to_string() : https://github.com/oxidecomputer/steno/blob/578498bbb43f9061c636b938dff98e7ece0b0efc/src/saga_template.rs#L315-L359

nexus/src/sagas.rs Show resolved Hide resolved
@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 25, 2022

:( Looks like this PR is hitting an issue due to zeroize being pinned by aes-gcm-siv:

cargo build
    Updating git repository `https://github.com/oxidecomputer/crucible`
error: failed to select a version for `zeroize`.
    ... required by package `aes-gcm-siv v0.10.3`
    ... which satisfies dependency `aes-gcm-siv = "^0.10.3"` (locked to 0.10.3) of package `crucible v0.0.1 (https://github.com/oxidecomputer/crucible?rev=4090a023b77dcab7a5000057cf2c96cdbb0469b6#4090a023)`
    ... which satisfies git dependency `crucible` (locked to 0.0.1) of package `propolis-client v0.1.0 (https://github.com/oxidecomputer/propolis?rev=832a86afce308d3210685af987ace1ba74c2ecd6#832a86af)`
    ... which satisfies git dependency `propolis-client` (locked to 0.1.0) of package `omicron-sled-agent v0.1.0 (/home/jwm/src/oxidecomputer/omicron/sled-agent)`
    ... which satisfies path dependency `omicron-sled-agent` (locked to 0.1.0) of package `nexus-test-utils v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus/test-utils)`
    ... which satisfies path dependency `nexus-test-utils` (locked to 0.1.0) of package `omicron-nexus v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus)`
versions that meet the requirements `>=1, <1.4` are: 1.3.0, 1.2.0, 1.1.1, 1.1.0, 1.0.0

all possible versions conflict with previously selected packages.

  previously selected package `zeroize v1.4.3`
    ... which satisfies dependency `zeroize = ">=1, <1.5"` (locked to 1.4.3) of package `crypto-bigint v0.2.11`
    ... which satisfies dependency `crypto-bigint = "^0.2.4"` (locked to 0.2.11) of package `elliptic-curve v0.10.6`
    ... which satisfies dependency `elliptic-curve = "^0.10.4"` (locked to 0.10.6) of package `ecdsa v0.12.4`
    ... which satisfies dependency `ecdsa-core = "^0.12"` (locked to 0.12.4) of package `p256 v0.9.0`
    ... which satisfies dependency `p256 = "^0.9.0"` (locked to 0.9.0) of package `omicron-sled-agent v0.1.0 (/home/jwm/src/oxidecomputer/omicron/sled-agent)`
    ... which satisfies path dependency `omicron-sled-agent` (locked to 0.1.0) of package `nexus-test-utils v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus/test-utils)`
    ... which satisfies path dependency `nexus-test-utils` (locked to 0.1.0) of package `omicron-nexus v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus)`

failed to select a version for `zeroize` which could resolve this conflict

the linked issue mentions that four days ago there was a prereleases with unpinned zeroize, hopefully that gets pushed soon.

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 25, 2022

All TODOs are now tracked with items.

@smklein
Copy link
Collaborator

smklein commented Mar 25, 2022

Rolling Sled Agent's dependency on propolis-client to 3a4fd8fa54ce8e1117bfa259bea39bca87f8ea14 will also fix, with everyone using zeroize = 1.3.0

@jmpesp
Copy link
Contributor Author

jmpesp commented Mar 25, 2022

Rolling Sled Agent's dependency on propolis-client to 3a4fd8fa54ce8e1117bfa259bea39bca87f8ea14 will also fix, with everyone using zeroize = 1.3.0

This didn't work for me?

@jmpesp jmpesp merged commit 3424aeb into oxidecomputer:main Mar 25, 2022
@jmpesp jmpesp deleted the volumes branch March 25, 2022 20:57
@iliana iliana mentioned this pull request Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants