RFE: container-native rpmdb format #2005

Open
cgwalters opened this issue Apr 11, 2022 · 4 comments
Labels
containers Containers and related technologies RFE

@cgwalters
Contributor

OCI/Docker container images are built as a series of layers. With overlayfs, a file modified in an upper layer is "copied up" in full, even if only a small part of it changed.

There are two issues:

  • rpms carry a lot of metadata, so the rpmdb can be of nontrivial size
  • The current default rpmdb format (sqlite) is a single big file, so it gets duplicated in full in each derived image that touches it

Take a common case such as:

FROM registry.fedoraproject.org/fedora:35
RUN dnf -y install cowsay && dnf clean all

The resulting tar layer from the RUN command has an entirely duplicated copy of the rpm database, just with cowsay and its dependencies added.

It's quite common for builds like this to serve as the base image for further images, so the duplication compounds.

A strawman proposal: a directory such as /usr/lib/sysimage/rpmdb.d holding something simple like zstd-compressed JSON files instead of one database. The files could be named 0000.json.zstd and so on, and a later change that adds packages would add just a new 0001.json.zstd file. Alternatively, we could keep sqlite and union those; I don't have a strong opinion. (Not literally "union" but more "merge", since we should support removing or upgrading packages from prior layers.)
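As a rough illustration of the "merge" semantics described above, here is a minimal Python sketch. It is not a format anyone in this thread has specified: the `add` and `remove` keys, the `load_merged_rpmdb` name, and the use of uncompressed `.json` files (zstd is left out to keep the sketch stdlib-only) are all assumptions made for the example.

```python
import json
from pathlib import Path


def load_merged_rpmdb(db_dir):
    """Merge numbered layer files in filename order; later layers win.

    Hypothetical layer schema (an assumption for this sketch):
      {"add": {name: metadata, ...}, "remove": [name, ...]}
    """
    packages = {}
    for layer_file in sorted(Path(db_dir).glob("*.json")):
        layer = json.loads(layer_file.read_text())
        # Apply erasures first, so a layer can replace a package by
        # listing it under "add" again with new metadata.
        for name in layer.get("remove", []):
            packages.pop(name, None)
        # Installs and upgrades both overwrite the previous entry.
        packages.update(layer.get("add", {}))
    return packages
```

With a `0000.json` that adds `bash` and `vim`, and a `0001.json` that adds `cowsay`, upgrades `bash`, and removes `vim`, the merged view contains only the upgraded `bash` and `cowsay`, while the derived layer ships just the small `0001.json` file.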

@voxik
Contributor

voxik commented Apr 12, 2022

This is either related to or a duplicate of #1885.

@cgwalters
Contributor Author

You're right, this overlaps a lot with previous threads. However, this one is much more about OCI/Docker containers than the previous ones.

@DemiMarie
Contributor

There are two approaches to fixing this:

  1. Use a storage engine that provides block-level copy-on-write, rather than file-level copy-on-write. BTRFS, ZFS, and device-mapper satisfy this requirement, as does overlay2 on a filesystem supporting reflinks. This does not help image layer sizes, however.
  2. Use a one-file-per-entry approach.
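To make the difference between the two layouts concrete, the following Python sketch models a file-level copy-on-write layer as "every new or modified file, stored in full". The file names, the 1000-byte header size, and the 200-package base image are made-up numbers for illustration only, and deletions (whiteouts) are ignored.

```python
def layer_diff_bytes(lower, upper):
    """Bytes a file-level copy-on-write layer stores: each new or
    modified file is stored in full. Deletions are ignored here."""
    return sum(len(data) for path, data in upper.items()
               if lower.get(path) != data)


def single_file_db(headers):
    # Stand-in for a single-file database: all headers in one file.
    return {"rpmdb.sqlite": b"".join(headers[n] for n in sorted(headers))}


def per_entry_db(headers):
    # One file per package entry (hypothetical directory name).
    return {f"rpmdb.d/{name}": blob for name, blob in headers.items()}


if __name__ == "__main__":
    base = {f"pkg{i}": bytes(1000) for i in range(200)}
    derived = dict(base, cowsay=bytes(1000))
    print(layer_diff_bytes(single_file_db(base), single_file_db(derived)))
    print(layer_diff_bytes(per_entry_db(base), per_entry_db(derived)))
```

Under these assumptions the single-file layout re-stores the whole database in the derived layer, while the per-entry layout stores only the one new entry.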

@lnussel
Contributor

lnussel commented Apr 13, 2022

The ability to overlay a tree with extra packages is also part of the motivation for #1959.
Meanwhile, a quick-and-dirty solution for containers would be to add, e.g., a plugin to dnf that dumps the database after the transaction and then deletes the database. On startup it could import those headers again if no db exists yet (cat *|rpmdb --importdb).

https://github.com/lnussel/toy/blob/master/dumpheaders.c stores rpm headers in separate files, in contrast to rpmdb --exportdb.
#1959 also contains commits that allow use of packages (with or without payload) for that purpose.
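The import-on-startup half of that idea could look roughly like the Python sketch below. It is only an illustration of the `cat *|rpmdb --importdb` step quoted above: the function names, the `dump_dir` and `db_path` parameters, and the swappable `importer` argument are hypothetical, and no such dnf plugin is described in this thread.

```python
import subprocess
from pathlib import Path


def concat_header_files(dump_dir):
    """Rough equivalent of `cat *`: join the dumped header files
    in sorted filename order."""
    files = sorted(p for p in Path(dump_dir).iterdir() if p.is_file())
    return b"".join(p.read_bytes() for p in files)


def import_if_missing(dump_dir, db_path, importer=("rpmdb", "--importdb")):
    """Re-import dumped headers only when no database exists yet.

    db_path is wherever the database file would live (an assumption
    supplied by the caller). Returns True if an import was run.
    """
    if Path(db_path).exists():
        return False
    stream = concat_header_files(dump_dir)
    # Feed the concatenated headers to the importer on stdin.
    subprocess.run(list(importer), input=stream, check=True)
    return True
```

The `importer` parameter exists only so the sketch can be exercised without rpm installed; in a container it would stay at its default.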
