rpm-ostree upgrade fails with ostree native containers #4107

Closed
jdoss opened this issue Oct 26, 2022 · 15 comments · Fixed by #4108

@jdoss

jdoss commented Oct 26, 2022

Describe the bug

I followed @miabbott's example for setting up an FCOS server with an ostree native container layered on top of FCOS 36.20221001.3.0.

Everything looks great except when I try to stage the automatic update:

# rpm-ostree upgrade --trigger-automatic-update-policy
Pulling manifest: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:latest
error: remote error: getting username and password: 1 error occurred:
	* reading JSON file "/run/containers/62011/auth.json": open /run/containers/62011/auth.json: permission denied

I am not sure how to debug from here.

Reproduction steps

  1. Boot my Ignition config below with Bupy: bupy vm layered-fcos-demo.bu --port 2022 --port 8000 --port 8022
  2. Let it fully boot and reboot with the container layer. SSH in and run rpm-ostree upgrade --trigger-automatic-update-policy
  3. See the error.

Expected behavior

Pull down the updated container layer and stage the update.

Actual behavior

error: remote error: getting username and password: 1 error occurred:
	* reading JSON file "/run/containers/62011/auth.json": open /run/containers/62011/auth.json: permission denied

System details

  • FCOS 36.20221001.3.0
  • QEMU

Ignition config

variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 mycoolkey jdoss@solidadmin.com
storage:
  directories:
  - path: /opt/pngx
    mode: 0755
  - path: /opt/pngx/export
    mode: 0755
systemd:
  units:
    - name: zincati.service
      enabled: false
    - name: fcos-rebase.service
      enabled: true
      contents: |
        [Unit]
        Description=Rebase FCOS to Container Image
        ConditionPathExists=!/var/lib/fcos-rebase.stamp
        ConditionFirstBoot=true
        Before=first-boot-complete.target
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=rpm-ostree rebase --bypass-driver --experimental ostree-unverified-registry:quay.io/quickvm/paperless-ngx:latest
        ExecStart=/bin/touch /var/lib/fcos-rebase.stamp
        ExecStartPost=systemctl reboot
        [Install]
        WantedBy=multi-user.target

Additional information

If we can blame @davdunc in any way for this issue, that would make my week.

@dustymabe
Member

Without --trigger-automatic-update-policy it works fine?

@jdoss
Author

jdoss commented Oct 26, 2022

It does not work :(

[root@localhost ~]# rpm-ostree upgrade
note: automatic updates (stage) are enabled
Pulling manifest: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:latest
error: remote error: getting username and password: 1 error occurred:
	* reading JSON file "/run/containers/62011/auth.json": open /run/containers/62011/auth.json: permission denied

@dustymabe
Member

Is your registry private? I've tried this with a public quay registry, but not with a private one.

cc @jmarrero @cgwalters

@jdoss
Author

jdoss commented Oct 26, 2022

Public repo. https://quay.io/repository/quickvm/paperless-ngx

I would like to use a private one for other use cases in the future tho.

@miabbott
Member

I can reproduce this with the quay.io/repository/quickvm/paperless-ngx image, but I don't hit the error with my example image.

Can you share the Containerfile or how you are building the container image? It seems like there might be something happening in there that is causing this error.

@cgwalters
Member

We support private repos; create /etc/ostree/auth.json which is a standard container pull secret.

That said, hmm I think I have seen that error when authentication fails instead of a saner "unauthorized" error. Looking quickly, I think we need to explicitly tell fetches to be anonymous if we don't find an auth file.

But at the moment, I'm not reproducing this failure.

I can reproduce this with the quay.io/repository/quickvm/paperless-ngx image,

Ah wait, but that's not the image. quay.io inserts a "/repository" in the web URL, but you can't use that as part of the container pull spec.
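
To illustrate the point about the "/repository" path segment, here is a minimal sketch of a shell helper that turns a quay.io web URL into a string usable in a pull spec. The function name is hypothetical and not part of any tool mentioned here. It assumes the only differences are the URL scheme and the "/repository" segment that the web UI inserts.

```shell
# Hypothetical helper: convert a quay.io web URL into a pull-spec style name.
# Assumption: the web UI only adds a scheme and a "/repository" path segment.
quay_web_url_to_pull_spec() {
    printf '%s\n' "$1" | sed \
        -e 's#^https\{0,1\}://##' \
        -e 's#^quay\.io/repository/#quay.io/#'
}
```

For example, `quay_web_url_to_pull_spec https://quay.io/repository/quickvm/paperless-ngx` prints `quay.io/quickvm/paperless-ngx`, which is the form used with `ostree-unverified-registry:` elsewhere in this thread. A name that is already a pull spec passes through unchanged.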

cgwalters referenced this issue in cgwalters/ostree-rs-ext Oct 26, 2022
We've seen a weird error out of the container stack when
we're not authorized to fetch an image, *and* no pull secret
is set up.

e.g.  https://github.com/coreos/fedora-coreos-tracker/issues/1328#issuecomment-1292067775

```
error: remote error: getting username and password: 1 error occurred:
	* reading JSON file "/run/containers/62011/auth.json": open /run/containers/62011/auth.json: permission denied
```

We don't want the containers/image stack trying to read the "standard"
config paths at the moment for a few reasons; one is that the standard
paths conflate "root" and "the system".  We want to support
separate pull secrets.  But, it should also work to symlink
the authfile.
@cgwalters
Member

I did ostreedev/ostree-rs-ext#389 related to this. I have seen that error myself in the past, but now I'm a bit confused as to which scenarios reproduce it.

@jdoss
Author

jdoss commented Oct 26, 2022

@miabbott here ya go:

FROM quay.io/fedora/fedora-coreos:stable
COPY etc /etc
COPY usr /usr
RUN rpm-ostree install systemd-oomd-defaults && \
    rpm-ostree cleanup -m && \
    sed -i 's/#AutomaticUpdatePolicy.*/AutomaticUpdatePolicy=stage/' /etc/rpm-ostreed.conf && \
    systemctl enable pngx-pod.service && \
    systemctl enable pngx-postgres.service && \
    systemctl enable pngx-redis.service && \
    systemctl enable pngx-tika.service && \
    systemctl enable pngx-gotenberg.service && \
    systemctl enable pngx-webserver.service && \
    systemctl enable pngx-sftpgo.service && \
    systemctl enable rpm-ostreed-automatic.timer && \
    ostree container commit

@cgwalters I copied my .docker/config.json and removed the auth token and things started working.

[root@localhost ~]# cat /etc/ostree/auth.json
{
  "auths": {
    "quay.io": {
      "auth": "",
      "email": ""
    }
  }
}
[root@localhost ~]# rpm-ostree upgrade --trigger-automatic-update-policy
Pulling manifest: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:testing
No upgrade available.
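
For anyone who wants to script this workaround, a minimal sketch follows. The function name and its arguments are hypothetical. The JSON it writes mirrors the file shown above. That empty credentials lead to an anonymous pull is only what was observed in this report, so treat it as an assumption rather than documented behavior.

```shell
# Hypothetical helper: write a placeholder pull secret with empty credentials.
# $1: target path (assumed default: /etc/ostree/auth.json, as used above)
# $2: registry host (assumed default: quay.io)
write_placeholder_authfile() {
    target=${1:-/etc/ostree/auth.json}
    registry=${2:-quay.io}
    # Create the parent directory in case it does not exist yet.
    mkdir -p "$(dirname "$target")"
    # Same structure as the auth.json shown in this comment.
    printf '{\n  "auths": {\n    "%s": {\n      "auth": "",\n      "email": ""\n    }\n  }\n}\n' \
        "$registry" > "$target"
}
```

Run as root, `write_placeholder_authfile` with no arguments would write to /etc/ostree/auth.json. Passing a different path as the first argument lets you inspect the output first.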

@miabbott
Member

I can reproduce this with the quay.io/repository/quickvm/paperless-ngx image,

Ah wait, but that's not the image. quay.io inserts a "/repository" in the web URL, but you can't use that as part of the container pull spec.

I think that is just a copy/paste error on my part.

$ coreos-assembler run

Fedora CoreOS 36.20221010.dev.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

Last login: Wed Oct 26 17:10:32 2022
[core@cosa-devsh ~]$ sudo systemctl disable zincati
Removed /etc/systemd/system/multi-user.target.wants/zincati.service.
[core@cosa-devsh ~]$ sudo rpm-ostree rebase --bypass-driver --experimental ostree-unverified-registry:quay.io/quickvm/paperless-ngx:latest
Pulling manifest: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:latest
Importing: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:latest (digest: sha256:68eba55805c4a43f05ec84d6ec9f7d6324a341ee045485830c52d4a8297fc76c)
ostree chunk layers stored: 0 needed: 51 (755.2 MB)
custom layers stored: 0 needed: 3 (5.5 MB)
Fetching ostree chunk sha256:39c48618fe92 (185.8 MB)
Fetched ostree chunk sha256:39c48618fe92
Fetching ostree chunk sha256:771577faa266 (48.0 MB)
Fetched ostree chunk sha256:771577faa266
Fetching ostree chunk sha256:60e7069a5951 (38.3 MB)
...
Staging deployment... done
Downgraded:
  alternatives 1.21-1.fc36 -> 1.19-2.fc36
  amd-gpu-firmware 20220913-140.fc36 -> 20220815-139.fc36
  dracut 057-3.fc36 -> 056-1.fc36
  dracut-network 057-3.fc36 -> 056-1.fc36
  dracut-squash 057-3.fc36 -> 056-1.fc36
  expat 2.4.9-1.fc36 -> 2.4.7-1.fc36
  intel-gpu-firmware 20220913-140.fc36 -> 20220815-139.fc36
  kernel 5.19.14-200.fc36 -> 5.19.12-200.fc36
  kernel-core 5.19.14-200.fc36 -> 5.19.12-200.fc36
  kernel-modules 5.19.14-200.fc36 -> 5.19.12-200.fc36
  linux-firmware 20220913-140.fc36 -> 20220815-139.fc36
  linux-firmware-whence 20220913-140.fc36 -> 20220815-139.fc36
  nvidia-gpu-firmware 20220913-140.fc36 -> 20220815-139.fc36
  openldap 2.6.3-1.fc36 -> 2.6.2-3.fc36
  openldap-compat 2.6.3-1.fc36 -> 2.6.2-3.fc36
  tzdata 2022d-1.fc36 -> 2022c-1.fc36
Added:
  systemd-oomd-defaults-250.8-1.fc36.noarch
Changes queued for next boot. Run "systemctl reboot" to start a reboot
[core@cosa-devsh ~]$ sudo systemctl reboot
[EVENT | QEMU guest is ready for SSH] [  OK  ] Started NetworkManager-dis…Manager Script Dispatcher Service.
Fedora CoreOS 36.20221001.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

Last login: Wed Oct 26 17:12:28 2022
[core@cosa-devsh ~]$ sudo rpm-ostree upgrade
note: automatic updates (stage) are enabled
Pulling manifest: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:latest
error: remote error: getting username and password: 1 error occurred:
        * reading JSON file "/run/containers/62011/auth.json": open /run/containers/62011/auth.json: permission denied

@cgwalters
Member

OK wow, yeah this reproduces after a reboot - but not after restarting rpm-ostreed? 😕

Digging

@cgwalters cgwalters transferred this issue from coreos/fedora-coreos-tracker Oct 26, 2022
@cgwalters
Member

Verified that ostreedev/ostree-rs-ext#389 fixes this. That said, I'm not yet 100% sure why we don't see this on the initial rebase.

@dustymabe
Member

So this is fixed upstream by ostreedev/ostree-rs-ext@64af26c ?

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Oct 27, 2022
This was somehow failing in
coreos#4107

I want to see if we can reproduce it in CI.
@cgwalters
Member

So this is fixed upstream by ostreedev/ostree-rs-ext@64af26c ?

Yes, I did get as far as verifying that I got the failing symptom, but then deploying the patched rpm-ostree (with the new vendored ostree-ext code) fixed it.

What still isn't clear to me is why we only somehow hit this after a reboot. The problem clearly has something to do with our use of DynamicUser=yes User=rpm-ostree when forking off skopeo.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Oct 28, 2022
We want to ensure that we can both `podman run` and
pull containers.

xref coreos#4107
@cgwalters
Member

cgwalters commented Oct 28, 2022

Ohhhh man, this bug is awesome. Such an absolutely perfect example of a bug that'd be caught by "real" systems testing and not our synthetic integration tests.

The problem here is:

  • By default /run/containers does not exist on boot
  • We run skopeo as the rpm-ostree user, but without HOME and things like XDG_RUNTIME_DIR set
  • The containers/image stack, when run as a user but without XDG_RUNTIME_DIR (I think that's the cause) looks for /run/containers/$uid/auth.json
  • This fails with ENOENT - but that failure is just ignored (as expected)

This is why this all passes our integration tests.

But - podman will create the /run/containers directory with mode drwx------ when you run a container.

And then, all attempts by the unprivileged user to open anything under /run/containers will fail with EACCES ("permission denied") instead of ENOENT.

Anyways so yes, the right fix here is definitely to tell the container stack not to look for an authfile. But, I may look at patching the containers/ stack to be more robust to this type of privilege-dropping scenario.

I updated #4108 - let's see if that test fails.

Then, it should pass when we bump ostree-rs-ext.
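
The difference between the two failure modes described above can be sketched with a small shell function. This is an illustration only: the function name is hypothetical, and it approximates the kernel's ENOENT versus EACCES distinction using `test` checks rather than reproducing what the containers/image code does.

```shell
# Hypothetical helper: report how a lookup of an optional auth file would end
# for the current user. Prints one of: readable, missing, denied.
classify_authfile() {
    path=$1
    if [ -r "$path" ]; then
        echo readable
        return 0
    fi
    if [ -e "$path" ]; then
        # The file exists but we may not read it.
        echo denied
        return 0
    fi
    # Walk up to the nearest ancestor we can actually see.
    dir=$(dirname "$path")
    while [ ! -e "$dir" ]; do
        dir=$(dirname "$dir")
    done
    if [ -d "$dir" ] && [ ! -x "$dir" ]; then
        # An unsearchable directory (e.g. mode 0700 owned by someone else)
        # hides everything below it: lookups fail with "permission denied".
        echo denied
    else
        # Nothing is blocking the lookup; the file is simply absent.
        echo missing
    fi
}
```

Calling `classify_authfile "/run/containers/$(id -u)/auth.json"` as an unprivileged user would be expected to report `missing` on a fresh boot and `denied` once a root-owned /run/containers with mode 0700 exists. Run as root, it never reports `denied` for a directory, because root can search any directory.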

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Oct 28, 2022
In particular this should fix us trying to load the authfile
Closes: coreos#4107
@claptrap666

@miabbott here ya go:

FROM quay.io/fedora/fedora-coreos:stable
COPY etc /etc
COPY usr /usr
RUN rpm-ostree install systemd-oomd-defaults && \
    rpm-ostree cleanup -m && \
    sed -i 's/#AutomaticUpdatePolicy.*/AutomaticUpdatePolicy=stage/' /etc/rpm-ostreed.conf && \
    systemctl enable pngx-pod.service && \
    systemctl enable pngx-postgres.service && \
    systemctl enable pngx-redis.service && \
    systemctl enable pngx-tika.service && \
    systemctl enable pngx-gotenberg.service && \
    systemctl enable pngx-webserver.service && \
    systemctl enable pngx-sftpgo.service && \
    systemctl enable rpm-ostreed-automatic.timer && \
    ostree container commit

@cgwalters I copied my .docker/config.json and removed the auth token and things started working.

[root@localhost ~]# cat /etc/ostree/auth.json
{
  "auths": {
    "quay.io": {
      "auth": "",
      "email": ""
    }
  }
}
[root@localhost ~]# rpm-ostree upgrade --trigger-automatic-update-policy
Pulling manifest: ostree-unverified-image:docker://quay.io/quickvm/paperless-ngx:testing
No upgrade available.

thx for the workaround!
