supply real volume data to propolis #771

jmpesp · 2022-03-15T21:38:52Z

Instance creation now can specify that existing disks be attached to
the instance (disk creation during instance creation is not yet
implemented).

There's a few important changes here.

Nexus will instruct the sled agent to send volume construction requests
when it ensures the instance, and therefore has to build these requests.
This is done in sdc_regions_ensure as regions are allocated.

sic_create_instance_record no longer returns the runtime of an instance,
but the instance's name.

Importantly, the instance creation saga now calls Nexus'
instance_set_runtime instead of duplicating the logic to call
instance_put. Nexus' instance_set_runtime will populate the
InstanceHardware with NICs and disks.

Nexus' disk_set_runtime now optionally calls the sled agent's
disk_put if the instance is running already, otherwise the disk will be
attached as part of the instance ensure by specifying volume requests.

request_body_max_bytes had to be increased to 1M because the volume
requests could be arbitrarily large (and eventually, the cloud init
bytes will add to the body size too).

Note: spawning an instance with this will not work unless the propolis
zone has a valid static IPv6 address. This PR has grown enough that I
think that work should be separate.

jmpesp · 2022-03-15T21:39:14Z

This is in no way ready to go in, but opening this for early reviews

smklein · 2022-03-15T21:39:49Z

sled-agent/src/instance.rs

            migrate,
+            cloud_init_bytes: None,


FYI @iliana

jmpesp · 2022-03-15T21:42:55Z

Note you'll need oxidecomputer/typify#41 for this to compile

jmpesp · 2022-03-16T15:25:52Z

Note I have to do a rebase on top of the "Remove 'sled-agent' API parameters from 'common'" PR, so expect a force push.

jmpesp · 2022-03-16T21:18:54Z

@smklein fyi rebased

@bnaecker this commit impacts how nics are attached a bit, please have a look - I tested by booting the regular alpine ISO via the instance_create_demo and saw eth0, so I don't think I screwed up the refactor :)

bnaecker · 2022-03-16T21:33:06Z

nexus/src/sagas.rs

+        .windows(2)
+        .all(|w| w[0].1.block_size == w[1].1.block_size);
+
+    if !all_region_have_same_block_size {


Just FYI, I had some bad experience panicking during a saga. The recovery mechanisms will basically just retry this node indefinitely, not unwind the whole thing. I'd consider returning an error instead.

jmpesp · 2022-03-18T19:03:37Z

Rebasing again, will comment when it's done.

Instance creation now can specify that existing disks be attached to the instance (disk creation during instance creation is not yet implemented). There's a few important changes here. Nexus will instruct the sled agent to send volume construction requests when it ensures the instance, and therefore has to build these requests. This is done in sdc_regions_ensure as regions are allocated. sic_create_instance_record no longer returns the runtime of an instance, but the instance's name. Importantly, the instance creation saga now calls Nexus' instance_set_runtime instead of duplicating the logic to call instance_put. Nexus' instance_set_runtime will populate the InstanceHardware with NICs and disks. Nexus' disk_set_runtime now *optionally* calls the sled agent's disk_put if the instance is running already, otherwise the disk will be attached as part of the instance ensure by specifying volume requests. request_body_max_bytes had to be increased to 1M because the volume requests could be arbitrarily large (and eventually, the cloud init bytes will add to the body size too). Note: spawning an instance with this *will not work* unless the propolis zone has a valid static IPv6 address. This PR has grown enough that I think that work should be separate.

jmpesp · 2022-03-18T19:58:51Z

Rebase #N completed, but oxidecomputer/crucible#243 is required to test.

jmpesp · 2022-03-18T20:25:59Z

Note I've updated this PR's first comment with the text of the commit.

jmpesp · 2022-03-18T20:29:45Z

This is ready for review now

smklein

First pass of comments, this looks awesome. So sick to see this coming together.

common/src/api/external/mod.rs

nexus/src/db/model.rs

nexus/src/nexus.rs

nexus/tests/integration_tests/instances.rs

nexus/src/nexus.rs

nexus/src/sagas.rs

smklein · 2022-03-18T21:08:45Z

nexus/src/sagas.rs

+
+                if attached {
+                    osagactx
+                        .nexus()


Each step in the saga must be idempotent - namely, repeatedly calling ensure_instance_disk_attach_state, either as part of the action or undo-action must:

Result in the same effective state

Return the same result

Is this currently the case?

(I think it is - especially since we're operating at exclusively a DB layer here - but wanted to check that assumption)

I'm pretty sure this is true for the same reason but I'm not sure. At the very least, repeated calls to instance_attach_disk and instance_detach_disk is tested by some of the disk related integration tests.

What isn't tested is doing that through the sled agent for a running instance but that code path isn't hit here (?).

smklein · 2022-03-18T21:20:19Z

nexus/src/sagas.rs

+        saga_params.organization_name.clone().into();
+    let project_name: db::model::Name = saga_params.project_name.clone().into();
+
+    for disk in saga_disks {


(Not explicitly a request, more just out-loud thinking)

To clarify the path, from a high-level:

sic_attach_disks_to_instance happens right before actually sending the "instance ensure" request

It calls Nexus::instance_attach_disk, which calls Nexus::disk_set_runtime

If the instance state is InstanceState::Creating and the requested disk state is DiskStateRequested::Attached, we write the requested disk state ("attached") to the database, but don't send any requests to the sled agent (yet).

I wonder if we want to bifurcate disk_set_runtime into two different functions, within Nexus. The fact that it "may or may not make a request to sled agent" feels a little unintuitive to me - I 100% understand why it's acting that way, and I think deferring the request until actually calling "instance ensure" makes sense, but understanding when it would / would not make a call to sled agent feels a little non-obvious.

WDYT? I think this is subjective, but just wanna make sure the flows are as straightforward as they can be.

This is true (though the actual path traversed through the code has changed as part of 581bdee), with the idea that attaching disks to the instance only in the DB is preparing for the logic in instance_set_runtime that queries the disks attached to the instance.

We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.

nexus/src/nexus.rs

smklein · 2022-03-22T20:08:00Z

nexus/src/nexus.rs

+            // TODO this will probably involve volume construction requests as
+            // well!


Is this remarking that the API to disk_set_runtime is insufficient? I know most of this hot-plug path isn't implemented (which is 100% fine) but I just wanna make sure we're very explicit about what we expect to work / not work.

propolis needs to be handed the volume construction request, so sled agent's disk_put is currently insufficient, yeah, but what do you think about

We may want to remove the disk_put path and do everything through instance_put. That way we can funnel changes through instance_set_runtime and let propolis handle the disk attach or remove as it needs to.

?

That change sounds good to me!

nexus/src/sagas.rs

I had an incorrect mental model of how distributed sagas work and this caused problems - disks would not be detached when instance creation failed. I tried to correct the code in a slightly different way than was done for NICs. Instead of pre-allocating IDs and having a previous saga node delete based on these IDs, I chose to split the attach disks to instance node into 8, each with its own undo. This is now tested by trying to attach 8 disks, one of which is marked as faulted. The whole saga should unwind and the test checks for semantic equivalence. Added some tests for the new arbitrary 8 disk limit. This required bumping the request max body size because of the larger instance ensure bodies.

smklein

Looks good! Only remaining comments are basically nits

nexus/src/sagas.rs

smklein · 2022-03-24T20:44:06Z

nexus/src/sagas.rs

+    // conditional logic depending on if that disk index is going to be used.
+    // Steno does not currently support the saga node graph changing shape.
+    for i in 0..8 {
+        template_builder.append(


WDYT about calling append_parallel here, and running these concurrently? Same for attach?

I can't, unfortunately, unless I store the strings somewhere persistent. append_parallel wants tuples that look like (&str, &str, Arc<dyn steno::Action<SagaInstanceCreate>>).

Would that not be possible through:

template_builder.append_parallel( (0..MAX_DISKS).map(|i| { ( &format!("create_disks{}", i), "CreateDisksForInstance", AsyncFunc::new_action(....), ) }).collect::<Vec<_>>() );

(same for "AttachDisks" below)

If Rust gives you issues with lifetimes, then maybe:

let names = (0...MAX_DISKS).map(|i| format!("create_disks{}", i)).collect::<Vec<String>>(); template_builder.append_parallel( (0..MAX_DISKS).map(|i| { ( names[i], "CreateDisksForInstance", AsyncFunc::new_action(....), ) }).collect::<Vec<_>>() );

Rust does give you issues with lifetimes for the first one, and for the second I'm pretty sure names has to live longer than the template builder function.

I'm pretty sure names doesn't need to last longer than the function - the signature doesn't imply that, and the implementation actually copies whatever string is provided by calling .to_string() : https://github.com/oxidecomputer/steno/blob/578498bbb43f9061c636b938dff98e7ece0b0efc/src/saga_template.rs#L315-L359

nexus/src/sagas.rs

jmpesp · 2022-03-25T14:17:51Z

:( Looks like this PR is hitting an issue due to zeroize being pinned by aes-gcm-siv:

cargo build
    Updating git repository `https://github.com/oxidecomputer/crucible`
error: failed to select a version for `zeroize`.
    ... required by package `aes-gcm-siv v0.10.3`
    ... which satisfies dependency `aes-gcm-siv = "^0.10.3"` (locked to 0.10.3) of package `crucible v0.0.1 (https://github.com/oxidecomputer/crucible?rev=4090a023b77dcab7a5000057cf2c96cdbb0469b6#4090a023)`
    ... which satisfies git dependency `crucible` (locked to 0.0.1) of package `propolis-client v0.1.0 (https://github.com/oxidecomputer/propolis?rev=832a86afce308d3210685af987ace1ba74c2ecd6#832a86af)`
    ... which satisfies git dependency `propolis-client` (locked to 0.1.0) of package `omicron-sled-agent v0.1.0 (/home/jwm/src/oxidecomputer/omicron/sled-agent)`
    ... which satisfies path dependency `omicron-sled-agent` (locked to 0.1.0) of package `nexus-test-utils v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus/test-utils)`
    ... which satisfies path dependency `nexus-test-utils` (locked to 0.1.0) of package `omicron-nexus v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus)`
versions that meet the requirements `>=1, <1.4` are: 1.3.0, 1.2.0, 1.1.1, 1.1.0, 1.0.0

all possible versions conflict with previously selected packages.

  previously selected package `zeroize v1.4.3`
    ... which satisfies dependency `zeroize = ">=1, <1.5"` (locked to 1.4.3) of package `crypto-bigint v0.2.11`
    ... which satisfies dependency `crypto-bigint = "^0.2.4"` (locked to 0.2.11) of package `elliptic-curve v0.10.6`
    ... which satisfies dependency `elliptic-curve = "^0.10.4"` (locked to 0.10.6) of package `ecdsa v0.12.4`
    ... which satisfies dependency `ecdsa-core = "^0.12"` (locked to 0.12.4) of package `p256 v0.9.0`
    ... which satisfies dependency `p256 = "^0.9.0"` (locked to 0.9.0) of package `omicron-sled-agent v0.1.0 (/home/jwm/src/oxidecomputer/omicron/sled-agent)`
    ... which satisfies path dependency `omicron-sled-agent` (locked to 0.1.0) of package `nexus-test-utils v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus/test-utils)`
    ... which satisfies path dependency `nexus-test-utils` (locked to 0.1.0) of package `omicron-nexus v0.1.0 (/home/jwm/src/oxidecomputer/omicron/nexus)`

failed to select a version for `zeroize` which could resolve this conflict

the linked issue mentions that four days ago there was a prereleases with unpinned zeroize, hopefully that gets pushed soon.

jmpesp · 2022-03-25T14:25:09Z

All TODOs are now tracked with items.

smklein · 2022-03-25T14:56:26Z

Rolling Sled Agent's dependency on propolis-client to 3a4fd8fa54ce8e1117bfa259bea39bca87f8ea14 will also fix, with everyone using zeroize = 1.3.0

jmpesp · 2022-03-25T18:45:02Z

Rolling Sled Agent's dependency on propolis-client to 3a4fd8fa54ce8e1117bfa259bea39bca87f8ea14 will also fix, with everyone using zeroize = 1.3.0

This didn't work for me?

smklein reviewed Mar 15, 2022

View reviewed changes

sled-agent/src/instance.rs

migrate,

cloud_init_bytes: None,

Copy link

Collaborator

smklein Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @iliana

iliana reacted with eyes emoji

jmpesp force-pushed the volumes branch from 777ee64 to e766404 Compare March 16, 2022 21:16

bnaecker reviewed Mar 16, 2022

View reviewed changes

jmpesp force-pushed the volumes branch 2 times, most recently from c833176 to 35b5a38 Compare March 18, 2022 19:56

jmpesp force-pushed the volumes branch from 35b5a38 to 1727f5f Compare March 18, 2022 19:58

jmpesp marked this pull request as ready for review March 18, 2022 20:29

jmpesp changed the title ~~[DRAFT] supply real volume data to propolis~~ supply real volume data to propolis Mar 18, 2022

smklein reviewed Mar 18, 2022

View reviewed changes

jmpesp added 6 commits March 21, 2022 10:31

don't expose Generation as u64

8d38855

&str

04b12c8

comments

e6a7c8c

instance typo

a2ab203

hoist "If propolis zone" logic out of disk_set_runtime, eliminate todos

581bdee

mock up saga step for creating disks during instance create

e8981d5

jmpesp force-pushed the volumes branch from f5c0195 to e8981d5 Compare March 21, 2022 15:37

update crucible rev

ea23469

smklein approved these changes Mar 22, 2022

View reviewed changes

jmpesp added 3 commits March 23, 2022 10:13

use internal_error for block size mismatch

44a3f3f

remove some XXX

db55979

internal_error for serialization error

8c8324d

jmpesp added 2 commits March 24, 2022 13:30

no more magic constant, use MAX_DISKS_PER_INSTANCE

98cf0e6

smklein approved these changes Mar 24, 2022

View reviewed changes

jmpesp added 3 commits March 24, 2022 22:09

missed magic constant

859d4a3

return unsupported error

959d49d

Merge remote-tracking branch 'upstream/main' into volumes

94d3ca3

smklein mentioned this pull request Mar 25, 2022

Bump p256 from 0.9.0 to 0.10.1 #607

Closed

jmpesp added 2 commits March 25, 2022 16:05

update propolis-client so that builds work, also fix code after merge

003d9f0

Merge remote-tracking branch 'upstream/main' into volumes

4a415f1

jmpesp merged commit 3424aeb into oxidecomputer:main Mar 25, 2022

jmpesp deleted the volumes branch March 25, 2022 20:57

david-crespo mentioned this pull request Mar 25, 2022

[Instance create] Include attached disks in POST body oxidecomputer/console#732

Closed

iliana mentioned this pull request Apr 13, 2022

User data support #911

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

supply real volume data to propolis #771

supply real volume data to propolis #771

jmpesp commented Mar 15, 2022 •

edited

Loading

jmpesp commented Mar 15, 2022

smklein Mar 15, 2022

jmpesp commented Mar 15, 2022

jmpesp commented Mar 16, 2022

jmpesp commented Mar 16, 2022

bnaecker Mar 16, 2022

jmpesp commented Mar 18, 2022

jmpesp commented Mar 18, 2022

jmpesp commented Mar 18, 2022

jmpesp commented Mar 18, 2022

smklein left a comment

smklein Mar 18, 2022

jmpesp Mar 21, 2022

smklein Mar 18, 2022

jmpesp Mar 21, 2022

smklein Mar 22, 2022

jmpesp Mar 23, 2022

smklein Mar 24, 2022

smklein left a comment

smklein Mar 24, 2022

jmpesp Mar 25, 2022

smklein Mar 25, 2022

jmpesp Mar 25, 2022

smklein Mar 25, 2022

jmpesp commented Mar 25, 2022

jmpesp commented Mar 25, 2022

smklein commented Mar 25, 2022

jmpesp commented Mar 25, 2022

		// TODO this will probably involve volume construction requests as
		// well!

supply real volume data to propolis #771

supply real volume data to propolis #771

Conversation

jmpesp commented Mar 15, 2022 • edited Loading

jmpesp commented Mar 15, 2022

Choose a reason for hiding this comment

jmpesp commented Mar 15, 2022

jmpesp commented Mar 16, 2022

jmpesp commented Mar 16, 2022

Choose a reason for hiding this comment

jmpesp commented Mar 18, 2022

jmpesp commented Mar 18, 2022

jmpesp commented Mar 18, 2022

jmpesp commented Mar 18, 2022

smklein left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smklein left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jmpesp commented Mar 25, 2022

jmpesp commented Mar 25, 2022

smklein commented Mar 25, 2022

jmpesp commented Mar 25, 2022

jmpesp commented Mar 15, 2022 •

edited

Loading