Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't create container scratch per base layer #2002

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

ambarve
Copy link
Contributor

@ambarve ambarve commented Jan 18, 2024

For WCIFS based layers, a container scratch base VHD (and a differencing VHD) both are created per unique base layer. This is because we add reparse points for the base layer files inside the VHD. However, with UnionFS we don't add any reparse points and the VHD is empty, so we don't need to create a VHD per unique base layer. Now the CimFS snapshotter will handle container scratch VHD creation and the LayerWriter will only created a VHD for the UtilityVM. (The UtilityVM VHD still needs to be created per unique base layer since the BCD of that layer is configured to boot from the UtilityVM VHD and the BCD is unique per image)

@ambarve ambarve requested a review from a team as a code owner January 18, 2024 17:48
@ambarve
Copy link
Contributor Author

ambarve commented Jan 23, 2024

Corresponding containerd PR for this, which updates the CimFS snapshotter to create the scratch VHD is here.

Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

@ambarve
Copy link
Contributor Author

ambarve commented Jan 25, 2024

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

Even in case of CimFS layers scratch VHDs are used by both UVM & the container just like legacy layers. In the case of legacy layers we create template scratch VHD for every base layer and populate these scratch VHDs (both container & UVM scratch) with WCIFS reparse points for the files within that layer. Then we make differencing VHDs from those template VHDs and copy that differencing VHD for every scratch snapshot. The point of using a differencing VHD is that it is smaller, so copying takes less time but we still get to keep all the reparse points that we added.

With CimFS, we don't use reparse points at all. So the scratch VHDs (both container & UVM) can be completely empty. This also means we don't need any differencing VHDs either.
So now for containers we can just have one fully empty scratch VHD and copy that for every scratch snapshot. Since this scratch VHD creation needs to be done only once (instead of doing for every base layer), I moved this code out of the layer writer and added it in the CimFS snapshotter.

We could do that same for UVM scratch too, but, the BCD file within the base layer needs to be configured with the partition GUIDs of the partitions in the UVM scratch. I didn't want to have one common blank VHD for the UVM and then use the partition GUIDs of that same VHD in the base layer of every base layer's BCD. That's why we still create a blank VHD for the UVM during layer import, format it and then configure the layer's BCD with that VHDs GUIDs.

Let me know if you have further questions.

@kevpar
Copy link
Member

kevpar commented Jan 25, 2024

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

Even in case of CimFS layers scratch VHDs are used by both UVM & the container just like legacy layers. In the case of legacy layers we create template scratch VHD for every base layer and populate these scratch VHDs (both container & UVM scratch) with WCIFS reparse points for the files within that layer. Then we make differencing VHDs from those template VHDs and copy that differencing VHD for every scratch snapshot. The point of using a differencing VHD is that it is smaller, so copying takes less time but we still get to keep all the reparse points that we added.

With CimFS, we don't use reparse points at all. So the scratch VHDs (both container & UVM) can be completely empty. This also means we don't need any differencing VHDs either. So now for containers we can just have one fully empty scratch VHD and copy that for every scratch snapshot. Since this scratch VHD creation needs to be done only once (instead of doing for every base layer), I moved this code out of the layer writer and added it in the CimFS snapshotter.

We could do that same for UVM scratch too, but, the BCD file within the base layer needs to be configured with the partition GUIDs of the partitions in the UVM scratch. I didn't want to have one common blank VHD for the UVM and then use the partition GUIDs of that same VHD in the base layer of every base layer's BCD. That's why we still create a blank VHD for the UVM during layer import, format it and then configure the layer's BCD with that VHDs GUIDs.

Let me know if you have further questions.

I poked a bit and think I understand where most of my confusion was coming from. Your information is slightly wrong (and my memory here was very hazy :)), the blank-base.vhdx does not contain reparse points.

My confusion was this: WCIFS requires reparse points in the scratch pointing at the various files and directories in the read-only layers, but if these reparse points are only created for the base layers, then delta layers won't work properly since e.g. if C:\foo.txt is added in a layer, the base layer won't have a reparse point for it, so it won't show up in the container. So I tried mounting a blank-base.vhdx from my machine, and I see only these files on it:

    Directory: K:\

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d--hs          12/20/2023  9:54 PM                System Volume Information
d--hs          12/20/2023  9:54 PM                WcSandboxState

tl;dr there are no reparse points in the blank-base.vhdx. If memory serves, the reparse points are actually created as part of the PrepareLayer call, right before the WCIFS filter is attached.

Now that I understand this better, it makes sense to me why we would remove blank-base.vhdx and blank.vhdx from the base layers. We don't need these files at all for CimFS layers. (aside: It's possible we don't need these to be per-base-layer for WCIFS either, not sure if what's in WcSandboxState may vary?).

That said, I'm still not clear why we're still creating SystemTemplateBase.vhdx, but not SystemTemplate.vhdx. Are we copying the full SystemTemplateBase.vhdx for each CimFS UVM that boots, or else what serves as the UVM's scratch disk?

@ambarve
Copy link
Contributor Author

ambarve commented Jan 25, 2024

Can you explain the cases where VHDs are used with CimFS? I see you've removed the UVM diff VHD creation as well, but confused on the reasoning for that.

Even in case of CimFS layers scratch VHDs are used by both UVM & the container just like legacy layers. In the case of legacy layers we create template scratch VHD for every base layer and populate these scratch VHDs (both container & UVM scratch) with WCIFS reparse points for the files within that layer. Then we make differencing VHDs from those template VHDs and copy that differencing VHD for every scratch snapshot. The point of using a differencing VHD is that it is smaller, so copying takes less time but we still get to keep all the reparse points that we added.
With CimFS, we don't use reparse points at all. So the scratch VHDs (both container & UVM) can be completely empty. This also means we don't need any differencing VHDs either. So now for containers we can just have one fully empty scratch VHD and copy that for every scratch snapshot. Since this scratch VHD creation needs to be done only once (instead of doing for every base layer), I moved this code out of the layer writer and added it in the CimFS snapshotter.
We could do that same for UVM scratch too, but, the BCD file within the base layer needs to be configured with the partition GUIDs of the partitions in the UVM scratch. I didn't want to have one common blank VHD for the UVM and then use the partition GUIDs of that same VHD in the base layer of every base layer's BCD. That's why we still create a blank VHD for the UVM during layer import, format it and then configure the layer's BCD with that VHDs GUIDs.
Let me know if you have further questions.

I poked a bit and think I understand where most of my confusion was coming from. Your information is slightly wrong (and my memory here was very hazy :)), the blank-base.vhdx does not contain reparse points.

My confusion was this: WCIFS requires reparse points in the scratch pointing at the various files and directories in the read-only layers, but if these reparse points are only created for the base layers, then delta layers won't work properly since e.g. if C:\foo.txt is added in a layer, the base layer won't have a reparse point for it, so it won't show up in the container. So I tried mounting a blank-base.vhdx from my machine, and I see only these files on it:

    Directory: K:\

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d--hs          12/20/2023  9:54 PM                System Volume Information
d--hs          12/20/2023  9:54 PM                WcSandboxState

tl;dr there are no reparse points in the blank-base.vhdx. If memory serves, the reparse points are actually created as part of the PrepareLayer call, right before the WCIFS filter is attached.

Now that I understand this better, it makes sense to me why we would remove blank-base.vhdx and blank.vhdx from the base layers. We don't need these files at all for CimFS layers. (aside: It's possible we don't need these to be per-base-layer for WCIFS either, not sure if what's in WcSandboxState may vary?).

That said, I'm still not clear why we're still creating SystemTemplateBase.vhdx, but not SystemTemplate.vhdx. Are we copying the full SystemTemplateBase.vhdx for each CimFS UVM that boots, or else what serves as the UVM's scratch disk?

Yeah, I too was misremembering a bit. We create reparse points only for the UVM scratch and not the container scratch. Now that we don't need these reparse points for UVM scratch we can only have the base scratch VHD for the UVM and copy that for every UVM. The CreateWCOW method copies the SystemTemplate.vhdx as the UVM scratch so we should name the UVM scratch as SystemTemplate.vhdx instead of SystemTemplateBase.vhdx (fixed now). None of the UVM related changes could be tested because we don't yet have support for running hyper-v isoalted containers with CimFS on containerd, that's why I didn't catch the naming issue earlier. Thanks for catching that.

@kevpar
Copy link
Member

kevpar commented Jan 26, 2024

So, is the implicit decision here that the UVM base VHD for CimFS is small enough compared to the diff VHD, that it's not worth using a diff VHD anymore? Do you have numbers on those sizes?

@ambarve
Copy link
Contributor Author

ambarve commented Feb 1, 2024

So, is the implicit decision here that the UVM base VHD for CimFS is small enough compared to the diff VHD, that it's not worth using a diff VHD anymore? Do you have numbers on those sizes?

I looked into this and the base VHD size is around 27MB while diff VHD size is around 4MB. Copying the diff VHD takes roughly 3 ms whereas copying the base VHD takes 25ms. So, we do get a small perf improvement by having the differencing disks. Not having a differencing VHD makes cimfs code a little less complicated, but I guess it's okay if we are getting some perf benefits. I will update the PR to keep the differencing VHD.

Comment on lines 90 to 30
// func createUtilityVMLayerVHDs(ctx context.Context, layerPath string) error {
baseVhdPath := filepath.Join(layerPath, wclayer.UtilityVMPath, wclayer.UtilityVMBaseVhd)
diffVhdPath := filepath.Join(layerPath, wclayer.UtilityVMPath, wclayer.UtilityVMScratchVhd)
baseVhdPath := filepath.Join(layerPath, wclayer.UtilityVMPath, wclayer.UtilityVMScratchVhd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on previous discussion I thought you were going to change this back to creating a diff VHD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be fixed now.
Diff VHD creation for container is done in the snapshotter here: https://github.com/containerd/containerd/pull/9659/files#diff-082dce0a0ee622ef2cc238f2e296ddc3a7a9c6c4af0be76295cc571352cd27a6

For WCIFS based layers, a container scratch base VHD (and a differencing VHD) both are created per unique base
layer. However, with UnionFS we
don't add any reparse points and the VHD is empty, so we don't need to create a VHD per unique base layer. Now
the CimFS snapshotter will handle container scratch VHD creation and the LayerWriter will only create the VHD
for the UtilityVM. (The UtilityVM VHD still needs to be created per unique base layer since the BCD of that
layer is configured to boot from the UtilityVM VHD and the BCD is unique per image)

Signed-off-by: Amit Barve <ambarve@microsoft.com>
Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ambarve ambarve merged commit c767380 into microsoft:main Feb 15, 2024
18 of 19 checks passed
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
For WCIFS based layers, a container scratch base VHD (and a differencing VHD) both are created per unique base
layer. However, with UnionFS we
don't add any reparse points and the VHD is empty, so we don't need to create a VHD per unique base layer. Now
the CimFS snapshotter will handle container scratch VHD creation and the LayerWriter will only create the VHD
for the UtilityVM. (The UtilityVM VHD still needs to be created per unique base layer since the BCD of that
layer is configured to boot from the UtilityVM VHD and the BCD is unique per image)

Signed-off-by: Amit Barve <ambarve@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants