cdi-importer: import virtualbox ova or tar archive in oneshot #2748
Hi @lxs137. Thanks for your PR. I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test all |
Hi, thanks for the PR, looks like there are some unit tests failures. You can run the unit tests by calling |
/test pull-containerized-data-importer-e2e-hpp-latest |
@@ -46,6 +46,7 @@ nbdkit-basic-filters
nbdkit-curl-plugin
nbdkit-xz-filter
nbdkit-gzip-filter
nbdkit-tar-filter
We need to understand if/how nbdkit does caching with this filter. I bet some caching is going on as it would be pretty expensive to always open/seek to the appropriate offset in the archive. If there is any significant on disk/memory caching I'd prefer we use simply the golang tar reader to seek to the file and download it to scratch. Then convert from scratch to target. Thus not getting nbdkit involved at all.
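The alternative suggested here, using Go's tar reader to find the file and copy it to scratch, could look roughly like the sketch below. It is only an illustration of the idea: `extractEntry`, the sample file names, and the in-memory buffer standing in for scratch space are all assumptions, not code from this PR.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// extractEntry reads src as a tar stream and copies the first regular file
// named want into dst, returning the number of bytes copied.
func extractEntry(src io.Reader, want string, dst io.Writer) (int64, error) {
	tr := tar.NewReader(src)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return 0, fmt.Errorf("entry %q not found in archive", want)
		}
		if err != nil {
			return 0, err
		}
		if hdr.Typeflag == tar.TypeReg && hdr.Name == want {
			// tr now yields only this entry's data.
			return io.Copy(dst, tr)
		}
	}
}

// sampleArchive builds a tiny in-memory tar laid out like an OVA
// (hypothetical file names, used only for the demonstration).
func sampleArchive() []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, f := range []struct{ name, body string }{
		{"vm.ovf", "<Envelope/>"},
		{"disk.vmdk", "fake-disk-bytes"},
	} {
		hdr := &tar.Header{Name: f.name, Typeflag: tar.TypeReg, Mode: 0644, Size: int64(len(f.body))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// extractFromSample runs extractEntry against the sample archive, with a
// bytes.Buffer standing in for scratch space.
func extractFromSample(name string) (string, error) {
	var scratch bytes.Buffer
	_, err := extractEntry(bytes.NewReader(sampleArchive()), name, &scratch)
	return scratch.String(), err
}

func main() {
	data, err := extractFromSample("disk.vmdk")
	if err != nil {
		panic(err)
	}
	fmt.Printf("copied %d bytes to scratch\n", len(data))
}
```

Because the reader is a plain stream, earlier entries are read and discarded rather than skipped, so the cost grows with how far into the archive the wanted file sits.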
Sorry, I don't quite see why nbdkit caching would be a concern.
I believe nbdkit-tar-filter only reads the tar header blocks; once it finds a header whose name matches "tar-entry", it takes that file's data offset. (I think the Go tar reader does the same thing.)
Looks to me like the tar filter writes to tmp, which is something we have to avoid: https://github.com/libguestfs/nbdkit/blob/45b72f5bd8fc1b475fa130d06c86cd877bf595d5/filters/tar/tar.c#L146
But I'm not an expert on nbdkit.
@rwmjones care to comment on how curl + tar filter uses tmp space?
tar filter will not use tmp space (or more accurately, only a few bytes are used for a temporary file used to parse offsets printed by tar). Here's how it works:
I realise the code isn't very easy to follow. What it does is this:
(1) When we connect for the first time, we start the command tar --no-auto-compress -t --block-number -v -f - filename > /tmp/output
(2) We read over the first blocks of the tar file and feed them to stdin of tar. This does mean that if the file inside the tar is not early on inside the tar then we end up reading through the earlier files. But we do not write them to disk.
(3) As we go along we look at the output of tar which will eventually print the offset of the file we want in the tar to the randomly named /tmp/output file.
(4) We are then able to read at the right offset/size within the tar directly when the NBD client reads the disk image.
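To make the offset idea in steps (1) to (4) concrete, here is a rough Go sketch. It is not how nbdkit does it: the filter runs the external tar command, whereas this sketch walks the 512-byte headers itself. It assumes plain ustar entries with octal size fields (no PAX or GNU extensions), and `locateEntry`, `readEntry`, and the sample file names are made up for illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"strconv"
	"strings"
)

const blockSize = 512

// field returns a NUL-terminated, space-padded header field as a string.
func field(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		b = b[:i]
	}
	return strings.TrimSpace(string(b))
}

// locateEntry walks the 512-byte headers in r and returns the byte offset
// and size of the data belonging to the entry named want.
func locateEntry(r io.ReaderAt, want string) (int64, int64, error) {
	hdr := make([]byte, blockSize)
	zero := make([]byte, blockSize)
	var pos int64
	for {
		if _, err := r.ReadAt(hdr, pos); err != nil {
			return 0, 0, fmt.Errorf("entry %q not found: %w", want, err)
		}
		if bytes.Equal(hdr, zero) { // end-of-archive marker
			return 0, 0, fmt.Errorf("entry %q not found", want)
		}
		// Name is at bytes 0..99, size (octal) at bytes 124..135.
		size, err := strconv.ParseInt(field(hdr[124:136]), 8, 64)
		if err != nil {
			return 0, 0, fmt.Errorf("bad size field at offset %d: %w", pos, err)
		}
		if field(hdr[0:100]) == want {
			return pos + blockSize, size, nil
		}
		// Skip this header plus its data, rounded up to a whole block.
		pos += blockSize + (size+blockSize-1)/blockSize*blockSize
	}
}

// readEntry reads the entry's data directly at its offset, which is what
// step (4) does when the NBD client reads the disk image.
func readEntry(r io.ReaderAt, name string) (string, error) {
	off, size, err := locateEntry(r, name)
	if err != nil {
		return "", err
	}
	data, err := io.ReadAll(io.NewSectionReader(r, off, size))
	return string(data), err
}

// sampleReader builds a small in-memory tar (hypothetical file names).
func sampleReader() *bytes.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, f := range [][2]string{{"vm.ovf", "<Envelope/>"}, {"disk.vmdk", "fake-disk-bytes"}} {
		hdr := &tar.Header{Name: f[0], Typeflag: tar.TypeReg, Mode: 0644, Size: int64(len(f[1]))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write([]byte(f[1])); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return bytes.NewReader(buf.Bytes())
}

func main() {
	off, size, err := locateEntry(sampleReader(), "disk.vmdk")
	if err != nil {
		panic(err)
	}
	fmt.Printf("data at offset %d, %d bytes\n", off, size)
}
```

Once the offset and size are known, every later read is a direct ranged read into the archive, so nothing from the earlier entries has to be kept on disk.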
/ok-to-test |
I'd like to think about this PR more as "tar file support" than "ova support", so a couple of suggestions. If the tar archive only has one file, use that regardless of extension. Support other extensions like ".iso", ".raw", etc. What do you think @awels?
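A minimal sketch of that selection rule, assuming the entry names have already been listed: a single entry is used as-is, otherwise the first entry with a known disk extension wins. The function name and the extension list are illustrative assumptions, not what the PR implements.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// diskExtensions is an illustrative list, not taken from the PR.
var diskExtensions = map[string]bool{
	".vmdk":  true,
	".qcow2": true,
	".raw":   true,
	".img":   true,
	".iso":   true,
}

// pickDiskEntry chooses which archive entry to import.
func pickDiskEntry(names []string) (string, error) {
	// A single file is used regardless of its extension.
	if len(names) == 1 {
		return names[0], nil
	}
	// Otherwise take the first entry with a recognised disk extension.
	for _, n := range names {
		if diskExtensions[strings.ToLower(path.Ext(n))] {
			return n, nil
		}
	}
	return "", errors.New("no disk image entry found in archive")
}

func main() {
	name, err := pickDiskEntry([]string{"vm.ovf", "disk.vmdk"})
	if err != nil {
		panic(err)
	}
	fmt.Println("importing", name)
}
```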
It should be possible to tell cheaply if a tar file contains either just one file or (more realistic and still usable) the large file is contained early on. Tar files are basically:
(There is no central index unlike zip or any modern format.) You could read the first NN bytes (eg. 10M or whatever you consider small enough), feed that into In fact this would be a simple modification to nbdkit-tar-filter, adding some sort of "read limit". (@ebblake)
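The "read limit" idea can be sketched in Go as follows. This is a stand-in using Go's tar reader rather than the nbdkit change itself, and `entryWithinLimit`, the limit values, and the sample layout are assumptions for illustration: the archive is scanned through an `io.LimitReader`, and running out of input before the wanted header appears is reported as "not early enough".

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// entryWithinLimit reports whether the header for want appears within the
// first limit bytes of the tar stream src.
func entryWithinLimit(src io.Reader, want string, limit int64) (bool, error) {
	tr := tar.NewReader(io.LimitReader(src, limit))
	for {
		hdr, err := tr.Next()
		// Running out of input, cleanly or mid-entry, means the entry
		// was not found inside the allowed prefix.
		if err == io.EOF || errors.Is(err, io.ErrUnexpectedEOF) {
			return false, nil
		}
		if err != nil {
			return false, err
		}
		if hdr.Name == want {
			return true, nil
		}
	}
}

// sampleArchive puts a firstSize-byte file in front of the disk image
// (hypothetical file names).
func sampleArchive(firstSize int) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	files := []struct {
		name string
		body []byte
	}{
		{"big.bin", make([]byte, firstSize)},
		{"disk.vmdk", []byte("fake-disk-bytes")},
	}
	for _, f := range files {
		hdr := &tar.Header{Name: f.name, Typeflag: tar.TypeReg, Mode: 0644, Size: int64(len(f.body))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(f.body); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// probeSample checks a sample archive whose disk image follows a 4096-byte file.
func probeSample(limit int64) bool {
	ok, err := entryWithinLimit(bytes.NewReader(sampleArchive(4096)), "disk.vmdk", limit)
	if err != nil {
		panic(err)
	}
	return ok
}

func main() {
	fmt.Println("within 1 KiB:", probeSample(1024))
	fmt.Println("within 8 KiB:", probeSample(8192))
}
```

A caller could use the negative result to fall back to a full download to scratch instead of streaming through the filter.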
FYI: single file in a tar came from a discussion with @rmohr regarding how GCE imports disk images as described here: https://cloud.google.com/compute/docs/import/import-existing-image#requirements_for_the_image_file |
Patch for above posted: https://listman.redhat.com/archives/libguestfs/2023-June/031837.html |
Supporting "tar.gz" is probably beyond the scope of this PR though. Edit: for now (this pr), I think supporting the additional file extensions may be good enough |
Force-pushed from a907711 to e158570.
Update:
|
Force-pushed from e158570 to 546fa75.
/test pull-containerized-data-importer-e2e-ceph |
I am trying to use nbdkit serve a remote fast:
slow (60x):
I find this happens because, when the tar filter and the gzip filter are combined, nbdkit tries to download the whole remote file. @rwmjones could you take a look at it? Thank you :-)
Well, it's a complicated story. It may be possible to make a modifiable seekable gzip: https://rwmj.wordpress.com/2022/12/01/creating-a-modifiable-gzipped-disk-image/
OK, I test a If the user provides an unseekable compressed tar archive (gzip, or xz with a large block size), running the qemu-img info command over nbdkit will be super slow. @mhenriks What do you think?
This can be used to ensure that the tar filter does not read indefinite amounts of input when opening the tar file. See: kubevirt/containerized-data-importer#2748 (comment)
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Force-pushed from 7a86f10 to 293f7fe.
@mhenriks Sorry for my late reply, I have done all the work. :-)
|
Signed-off-by: menyakun <lxs137@hotmail.com>
* if find a compressed file, it will look inner file format
* for archive format, user can read archive inner file content directly
Signed-off-by: menyakun <lxs137@hotmail.com>
* use nbdkit-tar-filter to extract archive inner file content streamly
* for unseekable compressed archive (gzip, xz with large block size), qemu-img over nbdkit will be super slow, so use FormatReaders to download inner file to scratch space (fallback to previous method)
Signed-off-by: menyakun <lxs137@hotmail.com>
Signed-off-by: menyakun <lxs137@hotmail.com>
Force-pushed from f122355 to d8b9611.
/test pull-cdi-unit-test |
/test pull-containerized-data-importer-non-csi-hpp |
@@ -52,6 +52,8 @@ func main() {
[]string{".qcow2", ".gz"},
[]string{".qcow2", ".xz"},
[]string{".qcow2", ".zst"},
[]string{".tar", ".gz"},
I'm wondering whether it can deal with a file like xxx.raw.tar.gz?
@a180285 Sure, actually we match file by magic number instead of extension.
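For illustration, matching by magic number rather than by extension can be sketched as below. This is not the importer's actual code; `detectFormat` and the table are assumptions, though the byte signatures themselves are the well-known ones for gzip, xz, zstd, qcow2, and ustar tar headers.

```go
package main

import (
	"bytes"
	"fmt"
)

// magics lists well-known signatures and the offset where each one sits.
var magics = []struct {
	name   string
	offset int
	magic  []byte
}{
	{"gzip", 0, []byte{0x1f, 0x8b}},
	{"xz", 0, []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}},
	{"zstd", 0, []byte{0x28, 0xb5, 0x2f, 0xfd}},
	{"qcow2", 0, []byte{'Q', 'F', 'I', 0xfb}},
	{"tar", 257, []byte("ustar")}, // ustar marker inside the first header
}

// detectFormat inspects the leading bytes of a stream and names its format.
// The file name is never consulted.
func detectFormat(head []byte) string {
	for _, m := range magics {
		end := m.offset + len(m.magic)
		if len(head) >= end && bytes.Equal(head[m.offset:end], m.magic) {
			return m.name
		}
	}
	return "unknown"
}

// sampleTarHead fakes the first 512-byte block of a tar archive.
func sampleTarHead() []byte {
	head := make([]byte, 512)
	copy(head, "disk.raw")
	copy(head[257:], "ustar")
	return head
}

func main() {
	fmt.Println(detectFormat([]byte{0x1f, 0x8b, 0x08}))
	fmt.Println(detectFormat(sampleTarHead()))
}
```

For a file like xxx.raw.tar.gz, a check of this kind would first report gzip; probing the decompressed stream again would then report tar, whatever the file is called.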
@mhenriks Hi, do you have some time to review this PR?
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Sorry this needs a rebase before we can review it. |
Read some comments from #2809. Maybe this PR should be put on hold until nbdkit-1.35.8 is available in CentOS 9 Stream.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. /close |
@kubevirt-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does / why we need it:
This PR tries to make CDI support importing a VirtualBox OVA file (a plain tar archive is also supported) in one shot.
Without this PR, we need to download the OVA file, extract the vmdk file from it, convert it to a QCOW2 file, and then upload the QCOW2 file into the PVC.
Something I need help with:
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #
Special notes for your reviewer:
Release note: