Don't create container scratch per base layer #2002

ambarve · 2024-01-18T17:48:53Z

For WCIFS based layers, a container scratch base VHD (and a differencing VHD) both are created per unique base layer. This is because we add reparse points for the base layer files inside the VHD. However, with UnionFS we don't add any reparse points and the VHD is empty, so we don't need to create a VHD per unique base layer. Now the CimFS snapshotter will handle container scratch VHD creation and the LayerWriter will only created a VHD for the UtilityVM. (The UtilityVM VHD still needs to be created per unique base layer since the BCD of that layer is configured to boot from the UtilityVM VHD and the BCD is unique per image)

ambarve · 2024-01-23T20:31:38Z

Corresponding containerd PR for this, which updates the CimFS snapshotter to create the scratch VHD is here.

kevpar

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

ambarve · 2024-01-25T00:09:22Z

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

Even in case of CimFS layers scratch VHDs are used by both UVM & the container just like legacy layers. In the case of legacy layers we create template scratch VHD for every base layer and populate these scratch VHDs (both container & UVM scratch) with WCIFS reparse points for the files within that layer. Then we make differencing VHDs from those template VHDs and copy that differencing VHD for every scratch snapshot. The point of using a differencing VHD is that it is smaller, so copying takes less time but we still get to keep all the reparse points that we added.

With CimFS, we don't use reparse points at all. So the scratch VHDs (both container & UVM) can be completely empty. This also means we don't need any differencing VHDs either.
So now for containers we can just have one fully empty scratch VHD and copy that for every scratch snapshot. Since this scratch VHD creation needs to be done only once (instead of doing for every base layer), I moved this code out of the layer writer and added it in the CimFS snapshotter.

We could do that same for UVM scratch too, but, the BCD file within the base layer needs to be configured with the partition GUIDs of the partitions in the UVM scratch. I didn't want to have one common blank VHD for the UVM and then use the partition GUIDs of that same VHD in the base layer of every base layer's BCD. That's why we still create a blank VHD for the UVM during layer import, format it and then configure the layer's BCD with that VHDs GUIDs.

Let me know if you have further questions.

kevpar · 2024-01-25T01:09:46Z

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

Even in case of CimFS layers scratch VHDs are used by both UVM & the container just like legacy layers. In the case of legacy layers we create template scratch VHD for every base layer and populate these scratch VHDs (both container & UVM scratch) with WCIFS reparse points for the files within that layer. Then we make differencing VHDs from those template VHDs and copy that differencing VHD for every scratch snapshot. The point of using a differencing VHD is that it is smaller, so copying takes less time but we still get to keep all the reparse points that we added.

With CimFS, we don't use reparse points at all. So the scratch VHDs (both container & UVM) can be completely empty. This also means we don't need any differencing VHDs either. So now for containers we can just have one fully empty scratch VHD and copy that for every scratch snapshot. Since this scratch VHD creation needs to be done only once (instead of doing for every base layer), I moved this code out of the layer writer and added it in the CimFS snapshotter.

We could do that same for UVM scratch too, but, the BCD file within the base layer needs to be configured with the partition GUIDs of the partitions in the UVM scratch. I didn't want to have one common blank VHD for the UVM and then use the partition GUIDs of that same VHD in the base layer of every base layer's BCD. That's why we still create a blank VHD for the UVM during layer import, format it and then configure the layer's BCD with that VHDs GUIDs.

Let me know if you have further questions.

I poked a bit and think I understand where most of my confusion was coming from. Your information is slightly wrong (and my memory here was very hazy :)), the blank-base.vhdx does not contain reparse points.

My confusion was this: WCIFS requires reparse points in the scratch pointing at the various files and directories in the read-only layers, but if these reparse points are only created for the base layers, then delta layers won't work properly since e.g. if C:\foo.txt is added in a layer, the base layer won't have a reparse point for it, so it won't show up in the container. So I tried mounting a blank-base.vhdx from my machine, and I see only these files on it:

    Directory: K:\

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d--hs          12/20/2023  9:54 PM                System Volume Information
d--hs          12/20/2023  9:54 PM                WcSandboxState

tl;dr there are no reparse points in the blank-base.vhdx. If memory serves, the reparse points are actually created as part of the PrepareLayer call, right before the WCIFS filter is attached.

Now that I understand this better, it makes sense to me why we would remove blank-base.vhdx and blank.vhdx from the base layers. We don't need these files at all for CimFS layers. (aside: It's possible we don't need these to be per-base-layer for WCIFS either, not sure if what's in WcSandboxState may vary?).

That said, I'm still not clear why we're still creating SystemTemplateBase.vhdx, but not SystemTemplate.vhdx. Are we copying the full SystemTemplateBase.vhdx for each CimFS UVM that boots, or else what serves as the UVM's scratch disk?

ambarve · 2024-01-25T17:58:08Z

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

Even in case of CimFS layers scratch VHDs are used by both UVM & the container just like legacy layers. In the case of legacy layers we create template scratch VHD for every base layer and populate these scratch VHDs (both container & UVM scratch) with WCIFS reparse points for the files within that layer. Then we make differencing VHDs from those template VHDs and copy that differencing VHD for every scratch snapshot. The point of using a differencing VHD is that it is smaller, so copying takes less time but we still get to keep all the reparse points that we added.
With CimFS, we don't use reparse points at all. So the scratch VHDs (both container & UVM) can be completely empty. This also means we don't need any differencing VHDs either. So now for containers we can just have one fully empty scratch VHD and copy that for every scratch snapshot. Since this scratch VHD creation needs to be done only once (instead of doing for every base layer), I moved this code out of the layer writer and added it in the CimFS snapshotter.
We could do that same for UVM scratch too, but, the BCD file within the base layer needs to be configured with the partition GUIDs of the partitions in the UVM scratch. I didn't want to have one common blank VHD for the UVM and then use the partition GUIDs of that same VHD in the base layer of every base layer's BCD. That's why we still create a blank VHD for the UVM during layer import, format it and then configure the layer's BCD with that VHDs GUIDs.
Let me know if you have further questions.

I poked a bit and think I understand where most of my confusion was coming from. Your information is slightly wrong (and my memory here was very hazy :)), the blank-base.vhdx does not contain reparse points.

My confusion was this: WCIFS requires reparse points in the scratch pointing at the various files and directories in the read-only layers, but if these reparse points are only created for the base layers, then delta layers won't work properly since e.g. if C:\foo.txt is added in a layer, the base layer won't have a reparse point for it, so it won't show up in the container. So I tried mounting a blank-base.vhdx from my machine, and I see only these files on it:
    Directory: K:\

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d--hs          12/20/2023  9:54 PM                System Volume Information
d--hs          12/20/2023  9:54 PM                WcSandboxState
tl;dr there are no reparse points in the blank-base.vhdx. If memory serves, the reparse points are actually created as part of the PrepareLayer call, right before the WCIFS filter is attached.

Now that I understand this better, it makes sense to me why we would remove blank-base.vhdx and blank.vhdx from the base layers. We don't need these files at all for CimFS layers. (aside: It's possible we don't need these to be per-base-layer for WCIFS either, not sure if what's in WcSandboxState may vary?).

That said, I'm still not clear why we're still creating SystemTemplateBase.vhdx, but not SystemTemplate.vhdx. Are we copying the full SystemTemplateBase.vhdx for each CimFS UVM that boots, or else what serves as the UVM's scratch disk?

Yeah, I too was misremembering a bit. We create reparse points only for the UVM scratch and not the container scratch. Now that we don't need these reparse points for UVM scratch we can only have the base scratch VHD for the UVM and copy that for every UVM. The CreateWCOW method copies the SystemTemplate.vhdx as the UVM scratch so we should name the UVM scratch as SystemTemplate.vhdx instead of SystemTemplateBase.vhdx (fixed now). None of the UVM related changes could be tested because we don't yet have support for running hyper-v isoalted containers with CimFS on containerd, that's why I didn't catch the naming issue earlier. Thanks for catching that.

kevpar · 2024-01-26T18:49:05Z

So, is the implicit decision here that the UVM base VHD for CimFS is small enough compared to the diff VHD, that it's not worth using a diff VHD anymore? Do you have numbers on those sizes?

ambarve · 2024-02-01T17:17:49Z

So, is the implicit decision here that the UVM base VHD for CimFS is small enough compared to the diff VHD, that it's not worth using a diff VHD anymore? Do you have numbers on those sizes?

I looked into this and the base VHD size is around 27MB while diff VHD size is around 4MB. Copying the diff VHD takes roughly 3 ms whereas copying the base VHD takes 25ms. So, we do get a small perf improvement by having the differencing disks. Not having a differencing VHD makes cimfs code a little less complicated, but I guess it's okay if we are getting some perf benefits. I will update the PR to keep the differencing VHD.

kevpar · 2024-02-08T19:05:58Z

internal/wclayer/cim/process.go

-	// func createUtilityVMLayerVHDs(ctx context.Context, layerPath string) error {
-	baseVhdPath := filepath.Join(layerPath, wclayer.UtilityVMPath, wclayer.UtilityVMBaseVhd)
-	diffVhdPath := filepath.Join(layerPath, wclayer.UtilityVMPath, wclayer.UtilityVMScratchVhd)
+	baseVhdPath := filepath.Join(layerPath, wclayer.UtilityVMPath, wclayer.UtilityVMScratchVhd)


Based on previous discussion I thought you were going to change this back to creating a diff VHD?

Yes, should be fixed now.
Diff VHD creation for container is done in the snapshotter here: https://github.com/containerd/containerd/pull/9659/files#diff-082dce0a0ee622ef2cc238f2e296ddc3a7a9c6c4af0be76295cc571352cd27a6

For WCIFS based layers, a container scratch base VHD (and a differencing VHD) both are created per unique base layer. However, with UnionFS we don't add any reparse points and the VHD is empty, so we don't need to create a VHD per unique base layer. Now the CimFS snapshotter will handle container scratch VHD creation and the LayerWriter will only create the VHD for the UtilityVM. (The UtilityVM VHD still needs to be created per unique base layer since the BCD of that layer is configured to boot from the UtilityVM VHD and the BCD is unique per image) Signed-off-by: Amit Barve <ambarve@microsoft.com>

kevpar

LGTM

For WCIFS based layers, a container scratch base VHD (and a differencing VHD) both are created per unique base layer. However, with UnionFS we don't add any reparse points and the VHD is empty, so we don't need to create a VHD per unique base layer. Now the CimFS snapshotter will handle container scratch VHD creation and the LayerWriter will only create the VHD for the UtilityVM. (The UtilityVM VHD still needs to be created per unique base layer since the BCD of that layer is configured to boot from the UtilityVM VHD and the BCD is unique per image) Signed-off-by: Amit Barve <ambarve@microsoft.com>

ambarve requested a review from a team as a code owner January 18, 2024 17:48

ambarve assigned kevpar and katiewasnothere Jan 23, 2024

msscotb assigned apurv15 and unassigned katiewasnothere Jan 24, 2024

kevpar reviewed Jan 24, 2024

View reviewed changes

ambarve force-pushed the cimfs_single_vhd branch from ed3de0e to c088cf4 Compare January 25, 2024 17:56

ambarve mentioned this pull request Feb 7, 2024

Don't create new scratch VHD per image for CimFS containerd/containerd#9659

Merged

kevpar reviewed Feb 8, 2024

View reviewed changes

ambarve force-pushed the cimfs_single_vhd branch from c088cf4 to e936e98 Compare February 13, 2024 23:37

kevpar approved these changes Feb 14, 2024

View reviewed changes

apurv15 approved these changes Feb 15, 2024

View reviewed changes

ambarve merged commit c767380 into microsoft:main Feb 15, 2024
18 of 19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't create container scratch per base layer #2002

Don't create container scratch per base layer #2002

ambarve commented Jan 18, 2024

ambarve commented Jan 23, 2024

kevpar left a comment

ambarve commented Jan 25, 2024

kevpar commented Jan 25, 2024 •

edited

Loading

ambarve commented Jan 25, 2024

kevpar commented Jan 26, 2024

ambarve commented Feb 1, 2024

kevpar Feb 8, 2024

ambarve Feb 14, 2024

kevpar left a comment

Don't create container scratch per base layer #2002

Don't create container scratch per base layer #2002

Conversation

ambarve commented Jan 18, 2024

ambarve commented Jan 23, 2024

kevpar left a comment

Choose a reason for hiding this comment

ambarve commented Jan 25, 2024

kevpar commented Jan 25, 2024 • edited Loading

ambarve commented Jan 25, 2024

kevpar commented Jan 26, 2024

ambarve commented Feb 1, 2024

kevpar Feb 8, 2024

Choose a reason for hiding this comment

ambarve Feb 14, 2024

Choose a reason for hiding this comment

kevpar left a comment

Choose a reason for hiding this comment

kevpar commented Jan 25, 2024 •

edited

Loading