`container-encapsulate` should be more intelligent #4012

cgwalters · 2022-09-14T18:20:01Z

Today, the algorithm backing `rpm-ostree compose container-encapsulate` (and `rpm-ostree compose image`) is pretty simplistic. It basically tries to put each of the largest RPMs in its own layer, and then "spills" the rest into a final layer.
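
As a rough illustration of the "largest first, spill the rest" behavior described above, here is a minimal sketch. It is not the actual rpm-ostree code: the `Package` type, the `naive_chunking` name, and the `max_layers` parameter are all assumptions made for this example.

```rust
/// A package and its size in bytes (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Package {
    pub name: String,
    pub size: u64,
}

/// Give each of the largest packages its own layer, then "spill" every
/// remaining package into one final layer. At most `max_layers` layers
/// are produced.
pub fn naive_chunking(mut packages: Vec<Package>, max_layers: usize) -> Vec<Vec<Package>> {
    if packages.is_empty() || max_layers == 0 {
        return Vec::new();
    }
    // Largest first; ties are broken by name so the output is deterministic.
    packages.sort_by(|a, b| b.size.cmp(&a.size).then_with(|| a.name.cmp(&b.name)));
    // Reserve one layer for the remainder.
    let dedicated = (max_layers - 1).min(packages.len());
    let remainder = packages.split_off(dedicated);
    let mut layers: Vec<Vec<Package>> = packages.into_iter().map(|p| vec![p]).collect();
    if !remainder.is_empty() {
        layers.push(remainder);
    }
    layers
}
```

With a small layer budget and many small packages, almost everything lands in the remainder layer, which is the problem discussed below.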

The status quo works okay-ish for Fedora CoreOS:

$ oc image info --filter-by-os=linux/amd64 quay.io/coreos-assembler/fcos:stable
Name:       quay.io/coreos-assembler/fcos:stable
Digest:     sha256:58e8a1a9363d9c2861029c8dfc0373a77da97a2c88fbc87cc5b9f2643b5f15ba
Media Type: application/vnd.oci.image.manifest.v1+json
Created:    <unknown>
Image Size: 695.5MB in 51 layers
Layers:     1.37MB  sha256:2803614192289ed0e6bf617de6a6f511b3c82a502fa04e91e3cef9305ec906fd
            141.3MB sha256:5e0b358421b94bef22890f9c67701314ff728ae26ffe50fc8f568413ea7a4153
            47.76MB sha256:685c4deef0f88eabe2a2d5a6355e7a59d8630e75278b4a79fa395f542f05dfd7
            41.02MB sha256:ebb42facc844d6087656db5261679136d27201703461e7a2f5e48a959e330c99
            86.06MB sha256:3cd7db3afe7c619bc73d15b97c0ce85775bd904ca12f15e4e581b8772ddf546a
            20.03MB sha256:43e9b1e61be550cf751d51f5298066b09208571c6c0dd10ffcb95cb9fac8290a
            59.68MB sha256:98aa1f3a32c335a4649964ce1135a59e53a1fc9e2cc929f0a40739fbd3201806
            51.32MB sha256:762b86a9b7cb94e54887252926111788ad44f0c61511989f5de11813c89be27e
            21.14MB sha256:04eba45cf01935b31fd5d4b3867f6b02edb1d4d00b9b991e0974ae13fda77c9b
            15.02MB sha256:648756ecc13f556778be371caff38b7c0fcd84a02c80807446c1ff75042d143f
            13.81MB sha256:919d0a5eeac9c782c4c500862f47ea05b26b304343cde69050774a07f0adc8d5
            8.417MB sha256:4f2b3e4f0d06265051c442b26925afce53a11e5763de600cc67fff74a1ad6ae2
            7.007MB sha256:819f2a83ce2bb2b75bb69623ad1d1f23873ccd70e6c98f76186f8595c1d833c9
            6.468MB sha256:db0335db35dbf0b08548876c47b0f819be757102e64999ae7ee3ea5f12278a7d
            7.756MB sha256:0fa87ffef469be4c66fcce340f0045c373ce702c40524f00689f182a5f1e7c06
            4.453MB sha256:262c16e2ca6ccb3c8b41f69baea62c3040dbfb08eca5648823554b5a8770416b
            5.577MB sha256:50e2c7801e3aada05c85c4685096ef3c3696be62fdae6f8737151237bdc183d4
            4.781MB sha256:6a78b4a8aea7a43020feefd37a79bdca8379ea58e9f54956f47affd39a736634
            5.179MB sha256:40b50588539e1c29f94e9c31d6e4ea7e42c63ecc99050e8a3adabad39eb4fd36
            3.91MB  sha256:016b965dd8a585c18e54288275575b5b5473f93ec131e2c95f5749a7996ac8b3
            2.378MB sha256:90be2a199fe59f2e9f48891a04d5c49134f74753a4ff942d7780e8cd34130470
            3.652MB sha256:9b492debab68c2b595eafae0d5679da0c5ff82f1017896f8f483929bce8c37b8
            3.363MB sha256:fc1dca39d09b3fbd4fb11efe14064d761b38543af2c734ce605b53bb324466c6
            3.073MB sha256:6e06589b5bc234fdefc6629843fc359dcc1be14e07cf4f0e5b6c7f40f039d571
            3.084MB sha256:3dbbaeae481fb76b4a92f5a354c11596c7702d3d03a4e7fca8a6f03cfbbc1ea5
            851.7kB sha256:46bc077f5512fdfb06e8cbb1976059d8a80b7a220a13a78af129ef1c28f8ea4a
            3.476MB sha256:cd32b4bbd519e0ad366d68fd079269856e150ae5dca504655ae928000851a256
            3.155MB sha256:7ab9d2c4200351c4d0a6981768a854aee4eaaeb7ec048e965d8f311e7f8cdb7c
            2.752MB sha256:665638f3344c7ab95517ccd3b0dbe1728fd787827205702c985db9284969386c
            3.155MB sha256:49af014e21761faa00d8185e230098c08333eb29f3fd0ccd755776c50a91d416
            2.848MB sha256:cef938bc1d5e5f58f8a6f34b2e52f9e6c2e0468aeffedfba4662f3b7b76bdfa5
            3.188MB sha256:826c13797a1cceb341b9909ce1334fdcea78a9adb230f30ba5b5a88f220eafda
            2.993MB sha256:fb7f46adbe0d899293e217aef642734d82cf745b514797e09a7ea0e76ae6acc2
            2.728MB sha256:7a036be11f55fb36c0df200cb835bcb723fb1e225de1c4b33b51b409e3aebfe9
            2.208MB sha256:ed7c5de19fe8efa265ca3a857ed6bbf694fc3fe61228a34241a90ad51a97a477
            2.505MB sha256:0604656e9ca68dc8ecfbf67493f4fdda364c40a32a848d9a56b2d2252f5baab3
            2.543MB sha256:76e5463dee3c810d160c1a701745d412f48c87eb711ef38162fbe91efb18a172
            2.437MB sha256:1b148eda522b8552b1fd69bdf013f0b3c8c7dba410d25cb577b73e9fa7842417
            1.978MB sha256:e07d6a9b7895c9a2705f41190b1712a4688528b277e880339e921d481b1133fb
            5.098MB sha256:55c2a5bc9dfae10952569b4e54f06643ade0e9df4db00c66abdee8a99d02ddc6
            1.026MB sha256:c86a6b4b38e15c8c0a9bd568176eaadc7dab22a6f87bc329b1c98fb6f4edd5d1
            2.742MB sha256:3b8c90c29ed1976a3686576285c0f5473c3eba427b946d0ce370c07b105524b7
            1.606MB sha256:723268fb62e5c7b65602f2ca9ec6aaa804ca0179220453b9dff548f7f54a3ebc
            2.075MB sha256:294bf0d7a637683e9cd5d3bcfe718ff5d1d94619595ae0ba819d58352cde0260
            1.685MB sha256:5f6f8702934c89bfd25f2130166c16b8393847b715de27f54a2cf4d901dea8f0
            2.047MB sha256:6860cf0478050c60a4df41b86d9484348ceb75169bc7c36ee60b550091a01a80
            1.599MB sha256:274425971f91d55e6668a2509109ddb9959745c456135e6cea4f915c5cc15c30
            1.536MB sha256:b9f4282add7dc805dcdb464043ab52bef821d60f0c8543ae954ebbc6d097132d
            2.518MB sha256:0f022eddd221bcdfecbb5cf6b9220e84e0b8cb80a1739faafb2915abf4a006ef
            1.788MB sha256:fcca3c1d763564b646e956f0d6ccf726a79df0db4e8e4ffab218d731f10a3de7
            67.39MB sha256:b153284b960176f42e5f7a3ba8e2129653c9249ba2965be5cd36c1b3d8458efb
$

There are 412 total packages in Fedora CoreOS. That second layer of 141.3MB is the "long tail" of 336 packages, meaning that any time any one of those packages changes, we will re-ship that whole layer.

But it gets notably worse for Fedora Silverblue today (demo over here), which has 1225 packages (about three times that of FCOS), occupying a total of 1.748GB (about 2.5x FCOS).

The "remainder" set is 1200 packages in 461.8MB. It seems likely that at least one of those 1200 packages changes relatively quickly - and if it does, there's a major "size amplification" here.
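
To make the "size amplification" concrete: a client must re-download every layer that contains at least one changed package, in full. The sketch below computes that cost for a given layering. The `Layer` type and `redownload_bytes` function are hypothetical names used only for this illustration.

```rust
use std::collections::HashSet;

/// One image layer: the packages it contains and its total size in bytes
/// (hypothetical type, for illustration only).
pub struct Layer {
    pub packages: Vec<String>,
    pub size: u64,
}

/// Bytes that must be re-shipped after an update: the full size of every
/// layer that contains at least one changed package.
pub fn redownload_bytes(layers: &[Layer], changed: &HashSet<String>) -> u64 {
    layers
        .iter()
        .filter(|layer| layer.packages.iter().any(|p| changed.contains(p)))
        .map(|layer| layer.size)
        .sum()
}
```

Under this model, a one-package update inside the 1200-package remainder layer costs the full 461.8MB, regardless of how small the changed package is.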

Solution 0: Accept externally generated data for chunking

Here we take external data using historical build information and the external process decides on the chunking.

Solution 1: Better heuristics and/or human-written suggestions

First we can just better assign packages to "layers" based on heuristics (change frequency) possibly augmented with human-written defaults. (e.g. "group ostree and rpm-ostree").
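
One possible shape for a change-frequency heuristic, sketched under assumptions: the per-package change counts are taken as given (where they come from is not specified here), and `bin_by_change_frequency` is a made-up name. Packages that change at similar rates end up in the same bin, so a frequently changing package does not drag a rarely changing one along with it.

```rust
use std::collections::BTreeMap;

/// Split packages into at most `bins` groups, ordered from most to least
/// frequently changed. `counts` maps package name to the number of observed
/// changes (hypothetical input, for illustration only).
pub fn bin_by_change_frequency(counts: &BTreeMap<String, u32>, bins: usize) -> Vec<Vec<String>> {
    if counts.is_empty() || bins == 0 {
        return Vec::new();
    }
    let mut ordered: Vec<(&String, &u32)> = counts.iter().collect();
    // Most frequently changed first. The sort is stable and BTreeMap iterates
    // in name order, so ties stay sorted by name.
    ordered.sort_by(|a, b| b.1.cmp(a.1));
    // Ceiling division, so every package fits within `bins` groups.
    let per_bin = (ordered.len() + bins - 1) / bins;
    ordered
        .chunks(per_bin)
        .map(|chunk| chunk.iter().map(|(name, _)| name.to_string()).collect())
        .collect()
}
```

Equal-count bins are the simplest choice; a real heuristic would likely also weigh package size and human-written grouping hints such as "group ostree and rpm-ostree".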

Solution 2: Create state based on prior builds

Teach the build process to accept a previous build as input (for compose image, we already do) and better compute which packages should go where based on the delta between them.
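
A minimal sketch of the "delta between builds" step, assuming each build can be summarized as a map from package name to version string. The `changed_packages` function and that representation are assumptions for this example, not an existing rpm-ostree interface.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Names of packages that were added, removed, or changed version between
/// two builds. Each build is a map of package name -> version string
/// (hypothetical representation, for illustration only).
pub fn changed_packages(
    prev: &BTreeMap<String, String>,
    cur: &BTreeMap<String, String>,
) -> BTreeSet<String> {
    let mut changed = BTreeSet::new();
    // Added packages and version changes.
    for (name, version) in cur {
        if prev.get(name) != Some(version) {
            changed.insert(name.clone());
        }
    }
    // Removed packages.
    for name in prev.keys() {
        if !cur.contains_key(name) {
            changed.insert(name.clone());
        }
    }
    changed
}
```

Accumulating these sets over many builds would give the per-package change history that a smarter chunking could use.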

Solution 3: Fix container runtimes to handle e.g. 500 layers

I don't think this would be really hard, but the problem is there's going to be a long tail of people running old container engines for a while and we want our images to be compatible.
(This is the same thing that will stall container zstd work)

Ultimate solution

What we really want is container image deltas of course.


jlebon · 2022-09-29T21:14:15Z

I think I mentioned this somewhere else before, but for solution 1, one possible heuristic might be to group packages by the SRPM field since binary packages from the same SRPM will definitely change together.

cgwalters · 2022-09-29T22:13:50Z

> I think I mentioned this somewhere else before, but for solution 1, one possible heuristic might be to group packages by the SRPM field since binary packages from the same SRPM will definitely change together.

That happens today; note the src_pkg() in

rpm-ostree/rust/src/container.rs

Line 290 in 72334b9

srcid: Rc::from(pkgmeta.src_pkg().to_str().unwrap()),
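
For readers unfamiliar with the idea, grouping by source package can be sketched as follows. This is a simplified illustration, not the code in `container.rs`; the `BinaryPackage` type and `group_by_source` function are hypothetical.

```rust
use std::collections::BTreeMap;

/// A binary RPM and the source package it was built from
/// (hypothetical type, for illustration only).
pub struct BinaryPackage {
    pub name: String,
    pub src_pkg: String,
}

/// Group binary package names by their source package, so that packages
/// built from the same source can be placed in the same layer.
pub fn group_by_source(pkgs: &[BinaryPackage]) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for pkg in pkgs {
        groups
            .entry(pkg.src_pkg.clone())
            .or_default()
            .push(pkg.name.clone());
    }
    groups
}
```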

cgwalters · 2022-12-12T15:42:38Z

I think it'd be useful to look at optimizing other containers too:

coreos-assembler (and check out https://github.com/cgwalters/coreos-assembler/tree/build-via-rpmostree)
https://github.com/coreos/fedora-coreos-config/tree/testing-devel/ci/buildroot

cgwalters · 2023-01-27T13:19:30Z

Some work on this in #4271

dustymabe · 2023-02-15T00:17:51Z

One other thing we could consider is grouping some noarch packages together in bins. Theoretically the noarch packages are the same across all architectures so this could possibly even represent some space savings in the container registry because a noarch RPM layer between x86_64/aarch64/ppc64le/s390x would (or could) all be the same. In reality the details of how the layers are created may prevent them from being bit for bit compatible so it might not work out. Just a suggestion.

Thanks for working on this.
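
The noarch suggestion above amounts to partitioning the package set by architecture before chunking. A minimal sketch, with a hypothetical `Rpm` type and `split_noarch` function; whether the resulting layers would actually be bit-for-bit identical across architectures is a separate question that this does not address.

```rust
/// A package and its RPM architecture string, e.g. "noarch" or "x86_64"
/// (hypothetical type, for illustration only).
pub struct Rpm {
    pub name: String,
    pub arch: String,
}

/// Split package names into (noarch, architecture-specific), preserving
/// input order within each group.
pub fn split_noarch(pkgs: &[Rpm]) -> (Vec<String>, Vec<String>) {
    let (noarch, native): (Vec<&Rpm>, Vec<&Rpm>) =
        pkgs.iter().partition(|p| p.arch == "noarch");
    (
        noarch.iter().map(|p| p.name.clone()).collect(),
        native.iter().map(|p| p.name.clone()).collect(),
    )
}
```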

cgwalters · 2023-02-15T12:49:23Z

Yeah, one can think of this like a multi-dimensional space across which we're trying to find a local maximum. The dimensions are:

time (e.g. across different builds of FCOS)
architectures (sharing storage on the registry for a single build with noarch content)
variants (sharing storage across e.g. FCOS/Silverblue)

> In reality the details of how the layers are created may prevent them from being bit for bit compatible so it might not work out.

There definitely will be some sharing in basically all cases because of linux-firmware, which is so large that it always ends up in its own layer. And due to the way the rpm-ostree container builds work, that will always be bit-for-bit identical and hence shared.

cgwalters added the container-native label Sep 14, 2022

cgwalters mentioned this issue Nov 28, 2022

qemu: Add support for full emulation coreos/coreos-assembler#3201

Merged

cgwalters assigned RishabhSaini Dec 5, 2022

RishabhSaini mentioned this issue Jan 4, 2023

Using historical build information for encapsulation #4247

Closed

3 tasks

cgwalters mentioned this issue Jan 27, 2023

OCI Image Based Updates download much more data than delta packages require - quay.io Kinoite Beta #4279

Closed

cgwalters added the triaged label May 31, 2023
