
Improve dmverity-vhd -d performance #2086

Closed · wants to merge 7 commits
Conversation

@DomAyre (Contributor) commented Mar 27, 2024

When using the -d option for dmverity-vhd, HTTP requests are made for each layer of the image. On a 2 core codespace runner, this takes roughly 10-15 seconds per layer and ends up being the vast majority of the runtime, especially for images with many layers.

To fix this, this PR instead uses the docker package to save the image into a tar file, then follows the same code path as if the user had used the -t flag to specify a local tar file. When v1.Layer.Uncompressed() is called, it just opens the file instead of making a new HTTP request via daemon.(*imageOpener).saveImage(), saving the request time of ~10-15 seconds per layer.

On the previously mentioned 2 core runner, this takes execution time from ~150 seconds to ~20 seconds for a simple 9 layer image.

@DomAyre requested a review from a team as a code owner March 27, 2024 16:49
@DomAyre force-pushed the tarball branch 3 times, most recently from e6110e9 to 2d64f26 on March 27, 2024 17:00
Signed-off-by: Dominic Ayre <dominicayre@microsoft.com>
if err != nil {
return nil, fmt.Errorf("failed to create tar file: %w", err)
}
defer tarFile.Close()
A contributor commented on the diff above:
It looks like defer os.Remove(tarFile.Name()) is also needed to clean up

https://pkg.go.dev/os#CreateTemp

@DomAyre (author) replied:
I can't quite do that, otherwise the tar file is cleaned up before it's read for layer hashing. But after hashing we can clean up any temporary file matching the pattern we use to create the tar.

Done in 2352140 and d79678f

@SethHollandsworth (Contributor) left a comment


Looks good! I'm good as long as a fallback/option gets added for when there isn't enough disk space to save the tar file.

Update: also double-checking that building on Windows works, since that's what the release pipeline uses.

Review threads: cmd/dmverity-vhd/main.go (resolved), docker-ubuntu.dockerfile (outdated, resolved), cmd/dmverity-vhd/dmverity-vhd (outdated, resolved)
@DomAyre (author) commented Mar 28, 2024

This PR is superseded by #2089, which doesn't need to compromise on disk space; I suggest we close this.

@DomAyre closed this Apr 3, 2024
@DomAyre (author) commented Apr 3, 2024

Closed this as it's superseded by #2089

anmaxvl pushed a commit that referenced this pull request Apr 12, 2024
The current implementation of dmverity-vhd -d has to make one of the
following tradeoffs:

- Runtime: by default this option calls the docker daemon to fetch the
  entire image for each layer, as the daemon doesn't provide an endpoint
  to get a specific layer.
- Memory: the user can pass the -b option to make this call buffered,
  keeping the image in memory the whole time. This is much faster, but
  at the cost of holding the whole image in memory, which is a problem
  on runners with little memory.

#2086 proposed a new tradeoff of disk space, saving the image to disk
and accessing the layers locally; this is a problem for runners with
smaller disks, as the image is stored twice.

This solution makes a single request to the docker daemon and processes
both the layer hashes and the manifest (to assign layer numbers) in a
single pass, making it performant in all three aspects.

---------

Signed-off-by: Dominic Ayre <dominicayre@microsoft.com>
Signed-off-by: Dominic Ayre <domayre@outlook.com>