Improve http import flow to decide whether to use scratch space or not #3219

alromeros · 2024-04-22T17:56:35Z

What this PR does / why we need it:

Follow up for #3212.

Due to some performance issues discussed in #2809, we stopped using nbdkit for most HTTP imports and switched to our custom scratch space method instead. This new behavior introduced minor differences in some specific flows, such as no longer converting uncompressed raw images.

The conversion step made the on-disk size of raw images significantly smaller, probably because of how qemu handles sparse images. Importing raw images without conversion caused failures in some tests, because the imported images were significantly larger.

This PR aims to fix this behavior by using scratch space in more import flows: all archived kubevirt images will be transferred directly to file, while all non-archived kubevirt images will be imported to scratch.
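The decision rule described above can be sketched as a small function. This is a simplified model only: `ProcessingPhase`, the three phase constants, `readerFlags` and `choosePhase` are hypothetical stand-ins, not the real CDI types or constants. The sketch assumes the archive content type is checked first (as in the `Info()` diff later in this thread) and ignores the registry node pull special case.

```go
package main

import "fmt"

// ProcessingPhase is a hypothetical stand-in for the importer's phase type.
type ProcessingPhase string

const (
	PhaseTransferDataDir  ProcessingPhase = "TransferDataDir"  // archive content: unpack into a directory
	PhaseTransferDataFile ProcessingPhase = "TransferDataFile" // stream straight to the target file
	PhaseTransferScratch  ProcessingPhase = "TransferScratch"  // download to scratch, then qemu-img convert
)

// readerFlags is a hypothetical summary of what the format readers detected.
type readerFlags struct {
	Archived bool // the stream was compressed (e.g. xz) and is decompressed in-line
}

// choosePhase models the rule from the PR description: archive content goes
// to a directory, archived images go directly to file, and every other
// kubevirt image goes through scratch space so it can be converted.
func choosePhase(archiveContent bool, r readerFlags) ProcessingPhase {
	if archiveContent {
		return PhaseTransferDataDir
	}
	if r.Archived {
		return PhaseTransferDataFile
	}
	return PhaseTransferScratch
}

func main() {
	fmt.Println(choosePhase(true, readerFlags{}))               // TransferDataDir
	fmt.Println(choosePhase(false, readerFlags{Archived: true})) // TransferDataFile
	fmt.Println(choosePhase(false, readerFlags{}))              // TransferScratch
}
```

Routing the non-archived raw case through scratch is what brings back the conversion step, and with it the smaller on-disk size shown in the example below.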

Example:

Fresh image import before this PR:

sh-5.1$ qemu-img info disk.img 
image: disk.img
file format: raw
virtual size: 70 GiB (75161927680 bytes)
disk size: 55 GiB

Fresh image import after this PR (thanks to the conversion):

$ qemu-img info disk.img 
image: disk.img
file format: raw
virtual size: 70 GiB (75161927680 bytes)
disk size: 9.95 GiB

Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNV-36026

Special notes for your reviewer:

Check #2809 and #2832 for more context about the original change.
Release note:

Bugfix: Use scratch space when importing non-archived images

alromeros · 2024-04-22T17:58:06Z

pkg/importer/format-readers.go

 	case "xz":
 		r, err = fr.xzReader()
 		if err == nil {
 			fr.Archived = true
 			fr.ArchiveXz = true
 		}
+	case "qcow2":


Note for reviewer: Just moving this to group archived/converted formats together.
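To illustrate the grouping the note refers to, here is a minimal sketch of header-based detection with the archived formats in one group and the convertible format in another. The magic numbers are the well-known file signatures (gzip `1f 8b`, xz `fd 37 7a 58 5a 00`, qcow2 `QFI\xfb`). `formatFlags` and `detectFormat` are hypothetical and do not reproduce the real `format-readers.go` logic, which is not shown here beyond the diff above.

```go
package main

import (
	"bytes"
	"fmt"
)

// formatFlags is a hypothetical stand-in for the reader flags.
type formatFlags struct {
	Format   string
	Archived bool // compressed stream, decompressed while streaming
	Convert  bool // image format that needs qemu-img convert
}

// Well-known magic numbers found at offset 0.
var (
	gzipMagic  = []byte{0x1f, 0x8b}
	xzMagic    = []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}
	qcow2Magic = []byte{'Q', 'F', 'I', 0xfb}
)

func detectFormat(header []byte) formatFlags {
	switch {
	// Archived (compressed) formats, grouped together.
	case bytes.HasPrefix(header, gzipMagic):
		return formatFlags{Format: "gz", Archived: true}
	case bytes.HasPrefix(header, xzMagic):
		return formatFlags{Format: "xz", Archived: true}
	// Image formats that require conversion.
	case bytes.HasPrefix(header, qcow2Magic):
		return formatFlags{Format: "qcow2", Convert: true}
	default:
		// Anything unrecognized is treated as raw in this sketch.
		return formatFlags{Format: "raw"}
	}
}

func main() {
	fmt.Printf("%+v\n", detectFormat([]byte{0x1f, 0x8b, 0x08}))
	fmt.Printf("%+v\n", detectFormat([]byte("QFI\xfb\x00\x00\x00\x03")))
	fmt.Printf("%+v\n", detectFormat(make([]byte, 8)))
}
```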

mhenriks · 2024-04-23T01:54:38Z

pkg/importer/http-datasource.go

@@ -133,7 +133,7 @@ func (hs *HTTPDataSource) Info() (ProcessingPhase, error) {
 	if hs.contentType == cdiv1.DataVolumeArchive {
 		return ProcessingPhaseTransferDataDir, nil
 	}
-	if !hs.readers.Convert {
+	if hs.readers.Archived {
 		return ProcessingPhaseTransferDataFile, nil


I say we transfer to scratch for archived images as well, so I think we can delete this if statement.

I fear we might uncover issues like the one this PR addressed (#2845), but I'm ok with the change. Let's see how tests behave.

mhenriks · 2024-04-23T01:58:07Z

I wonder whether the number of cases in which we don't use scratch for import (I think just registry node pull?) is so limited that we should get rid of the "recreate pod with scratch space" flow. The import controller can be in charge of deciding when scratch is NOT needed.

Since we dropped the use of nbdkit, we've started prioritizing the use of scratch space for most import flows. However, this new behavior introduced minor differences, such as no longer converting raw images, which caused inconsistencies in some tests. This commit improves the importer flow to better determine whether to use scratch space or not.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

alromeros · 2024-04-23T15:05:44Z

/retest-required

mhenriks · 2024-04-23T16:44:28Z

pkg/importer/http-datasource.go

@@ -133,19 +133,13 @@ func (hs *HTTPDataSource) Info() (ProcessingPhase, error) {
 	if hs.contentType == cdiv1.DataVolumeArchive {
 		return ProcessingPhaseTransferDataDir, nil
 	}
-	if !hs.readers.Convert {
-		return ProcessingPhaseTransferDataFile, nil
-	}
 	if pullMethod, _ := util.ParseEnvVar(common.ImporterPullMethod, false); pullMethod == string(cdiv1.RegistryPullNode) {


Planning to try getting rid of nbdkit here? I don't think it's necessary

Didn't this PR (#2845) address some bug for using scratch space with pull node imports? @akalenyu wdyt?

The major change in that PR is returning "convert" for registry node pull. Can still do that. I just don't think we need nbdkit

I didn't have much time to test this today but I added a commit to get rid of nbdkit for pull node. Let's see how tests behave and I'll do more testing tomorrow.

@mhenriks it seems the version of qemu-img we use doesn't support running convert/info on HTTP endpoints; that's why we need either scratch space or nbdkit. It works with my local version though (odd, since I'm using an older version), but since we expect to backport this, I think we should keep using nbdkit here.

That does not sound correct to me. What is the error you see? Can you check the logs to see what args we are passing to qemu-img?

Just a simple qemu-img info --output=json http://localhost:8100/tinyCore.iso returns "qemu-img: Could not open 'http://localhost:8100/tinyCore.iso': Unknown protocol 'http'". I've tried creating a debugging pod and manually running the command inside it, and the same thing happens. However, running it locally works.

Ah, the container must not have the qemu-block-curl package.

Since updating RPMs in old branches is annoying, we can hold off on getting rid of nbdkit in this PR, but I think we should get rid of it in a subsequent PR that is not backported.

alromeros · 2024-04-26T14:49:06Z

I wonder whether the number of cases in which we don't use scratch for import (I think just registry node pull?) is so limited that we should get rid of the "recreate pod with scratch space" flow. The import controller can be in charge of deciding when scratch is NOT needed.

I like that idea. Other than archive imports, VDDK, and registry node pull, I think we can use scratch for every import type. We can leave that for another PR, but it's a good thing to consider for sure.
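The idea of having the import controller decide up front when scratch is not needed can be sketched as a single predicate. Everything here is hypothetical (`importSource`, `importRequest`, `needsScratch` and the source names are not CDI APIs); the exceptions are exactly the three listed above: archive imports, VDDK, and registry node pull.

```go
package main

import "fmt"

// importSource is a hypothetical enumeration of import types.
type importSource string

const (
	sourceHTTP     importSource = "http"
	sourceS3       importSource = "s3"
	sourceRegistry importSource = "registry"
	sourceVDDK     importSource = "vddk"
)

// importRequest is a hypothetical view of what the controller knows
// before creating the importer pod.
type importRequest struct {
	Source         importSource
	ArchiveContent bool // content type is archive
	NodePull       bool // registry import using the node pull method
}

// needsScratch returns false only for the listed exceptions; every other
// import gets scratch space from the start, so no pod restart is needed.
func needsScratch(r importRequest) bool {
	switch {
	case r.ArchiveContent:
		return false
	case r.Source == sourceVDDK:
		return false
	case r.Source == sourceRegistry && r.NodePull:
		return false
	default:
		return true
	}
}

func main() {
	for _, r := range []importRequest{
		{Source: sourceHTTP},
		{Source: sourceS3},
		{Source: sourceHTTP, ArchiveContent: true},
		{Source: sourceRegistry, NodePull: true},
		{Source: sourceRegistry},
		{Source: sourceVDDK},
	} {
		fmt.Printf("%+v -> scratch=%v\n", r, needsScratch(r))
	}
}
```

Deciding this once in the controller would replace the "recreate pod with scratch space" round trip with a single pod creation that already has the right volumes.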

mhenriks · 2024-04-26T14:55:28Z

/lgtm
/approve

Thanks @alromeros, this PR is something that can be backported. For the future we should:

- Get rid of nbdkit and add qemu-img-block-curl
- Since it looks like import-controller can determine exactly when scratch space is required, get rid of the "pod restarts to get scratch space" machinery

kubevirt-bot · 2024-04-26T14:55:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mhenriks


Needs approval from an approver in each of these files:

~~OWNERS~~ [mhenriks]


alromeros · 2024-04-26T16:54:05Z

/cherrypick release-v1.57
/cherrypick release-v1.58
/cherrypick release-v1.59

kubevirt-bot · 2024-04-26T16:55:07Z

@alromeros: new pull request created: #3222


alromeros · 2024-04-26T16:59:17Z

/cherrypick release-v1.58

kubevirt-bot · 2024-04-26T17:00:07Z

@alromeros: new pull request created: #3223


alromeros · 2024-04-26T17:01:02Z

/cherrypick release-v1.59

kubevirt-bot · 2024-04-26T17:02:13Z

@alromeros: new pull request created: #3224


kubevirt-bot added the dco-signoff: yes and release-note labels Apr 22, 2024

kubevirt-bot requested review from aglitke and arnongilboa April 22, 2024 17:56

kubevirt-bot added the size/XS label Apr 22, 2024

alromeros commented Apr 22, 2024


mhenriks reviewed Apr 23, 2024


alromeros force-pushed the fix-raw-import-flow branch from 1e0ef53 to 684ba92 April 23, 2024 08:23

kubevirt-bot added size/S and removed size/XS labels Apr 23, 2024

alromeros force-pushed the fix-raw-import-flow branch from 684ba92 to db3c151 April 23, 2024 08:27

mhenriks reviewed Apr 23, 2024


kubevirt-bot added size/M and removed size/S labels Apr 25, 2024

alromeros force-pushed the fix-raw-import-flow branch from f332dfc to db3c151 April 26, 2024 11:30

kubevirt-bot added size/S and removed size/M labels Apr 26, 2024

kubevirt-bot assigned mhenriks Apr 26, 2024

kubevirt-bot added the lgtm label Apr 26, 2024

kubevirt-bot added the approved label Apr 26, 2024

kubevirt-bot merged commit cd56b6c into kubevirt:main Apr 26, 2024
18 checks passed

kubevirt-bot mentioned this pull request Apr 26, 2024

[release-v1.57] Improve http import flow to decide whether to use scratch space or not #3222

Closed

kubevirt-bot mentioned this pull request Apr 26, 2024

[release-v1.58] Improve http import flow to decide whether to use scratch space or not #3223

Closed

kubevirt-bot mentioned this pull request Apr 26, 2024

[release-v1.59] Improve http import flow to decide whether to use scratch space or not #3224

Merged

akalenyu mentioned this pull request May 1, 2024

Handle lost sparseness in non http data sources #3213

Open

This was referenced May 2, 2024

[release-v1.58] Manual backport of improve http import flow to decide whether to use scratch space or not #3230

Merged

[release-v1.57] Manual backport of improve http import flow to decide whether to use scratch space or not #3231

Merged
