Skip to content
This repository has been archived by the owner on Oct 10, 2020. It is now read-only.

'atomic images list' can break if system container is missing files/partially installed #1191

Open
miabbott opened this issue Feb 20, 2018 · 8 comments

Comments

@miabbott
Copy link
Contributor

In the error situation covered by RHBZ#1542144, it is possible to end up with a system container that is only partially installed.

In this partial-install state, the atomic images list functionality breaks and is unable to show any of the images on the host.

# rpm -q atomic
atomic-1.21.1-1.git1170769.el7.x86_64

# atomic images list
[Errno 2] No such file or directory: '/var/lib/containers/atomic/etcd.0/info'

# atomic --debug images list
Namespace(_class=<class 'Atomic.images.Images'>, all=False, assumeyes=False, debug=True, filter=None, func='display_all_image_info', heading=True, ignore=False, json=False, profile=False, quiet=False, truncate=True)
Backends(docker: Active, ostree: Active, )

[Errno 2] No such file or directory: '/var/lib/containers/atomic/etcd.0/info'
Traceback (most recent call last):
  File "/bin/atomic", line 185, in <module>
    sys.exit(_func())
  File "/usr/lib/python2.7/site-packages/Atomic/images.py", line 159, in display_all_image_info
    _images = self._get_images()
  File "/usr/lib/python2.7/site-packages/Atomic/images.py", line 219, in _get_images
    self._mark_used(_images)
  File "/usr/lib/python2.7/site-packages/Atomic/images.py", line 308, in _mark_used
    all_containers = [x.image for x in self.be_utils.get_containers()]
  File "/usr/lib/python2.7/site-packages/Atomic/backendutils.py", line 157, in get_containers
    con_objs += be.get_containers()
  File "/usr/lib/python2.7/site-packages/Atomic/backends/_ostree.py", line 110, in get_containers
    return [self._make_container(x) for x in self.syscontainers.get_containers()]
  File "/usr/lib/python2.7/site-packages/Atomic/syscontainers.py", line 1508, in get_containers
    return self._get_containers_at(checkouts, False, containers) + self._get_containers_at(preinstalled, True, containers)
  File "/usr/lib/python2.7/site-packages/Atomic/syscontainers.py", line 1472, in _get_containers_at
    with open(os.path.join(fullpath, "info"), "r") as info_file:
IOError: [Errno 2] No such file or directory: '/var/lib/containers/atomic/etcd.0/info'

This can be reproduced more simply by just installing a system container and then removing the info file from the directory:

# rpm -q atomic
atomic-1.21.1-1.git1170769.el7.x86_64

# atomic pull --storage ostree registry.fedoraproject.org/f27/kubernetes-apiserver
Getting image source signatures
Copying blob sha256:04331e646521ddb577d113f3c103aef620cc4451641452c347864298669f8572
 86.07 MB / 86.07 MB [======================================================] 1s
Copying blob sha256:1eb9b88722dda2113c3e55a6436d733e4feea21e772b854dbe7f92010b46eafb
 134.34 MB / 134.34 MB [====================================================] 1s
Copying blob sha256:a0d3b3cff98825af4debc7883d4c96d2deb4400592d3b6d1a60058b5e2221ae9
 52.08 MB / 52.08 MB [======================================================] 1s
Copying config sha256:8c91fa19a01abc779994110abd8146c8f38fe26eb879c83f2aed5f4699ffa2a7
 2.98 KB / 2.98 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures

# atomic install --system --system-package=no --name kube-apiserver registry.fedoraproject.org/f27/kubernetes-apiserver
Extracting to /var/lib/containers/atomic/kube-apiserver.0
Created file /usr/local/bin/kubectl
Created file /etc/kubernetes/apiserver
Created file /etc/kubernetes/config
systemctl daemon-reload
systemd-tmpfiles --create /etc/tmpfiles.d/kube-apiserver.conf
systemctl enable kube-apiserver

# rm /var/lib/containers/atomic/kube-apiserver.0/info 

# atomic images list
[Errno 2] No such file or directory: '/var/lib/containers/atomic/kube-apiserver/info'
@peterbaouoft
Copy link
Contributor

Thanks for the report, /me currently investigating it :-).

@peterbaouoft
Copy link
Contributor

peterbaouoft commented Feb 21, 2018

File "/usr/lib/python2.7/site-packages/Atomic/images.py", line 308, in _mark_used
all_containers = [x.image for x in self.be_utils.get_containers()]

Update: seems like we trigger the failure when we mark the images to be used. (code below)

    def _mark_used(self, images):
        assert isinstance(images, list)
        all_containers = [x.image for x in self.be_utils.get_containers()]
        for image in images:
            if image.id in all_containers:
                image.used = True

The way how we check image to be used is to check if the image name shows up in containers' info file.(e.g: /var/lib/containers/atomic/kube-apiserver/info for above example). If it is, then we mark the image to be used. Therefore when the info file is not there, a failure would happen. So yea, I believe it is an error on our end. The expected behavior should be the following( I think):

1: When images is pulled to the host, we should always be able to list them no matter whether its containers are valid or not. However, we could output a warning about missing info file for containers.

2: When marking images to be used, we can skip "marking used" step if one container is found non-valid?(But that would mean we need to verify containers first, which would lead to performance issues...)

So, WDYT? @giuseppe

P.S: I don't mind working on the fix, just wanted to ask for opinions before doing so :-P.

@giuseppe
Copy link
Collaborator

@peterbaouoft if the info file cannot be opened then we need to skip the container, i.e. we need to catch the exception from open(os.path.join(fullpath, "info") and not include the container in the list.

If the image cannot be marked as used, then it is not a big problem as the container is not correctly installed.

peterbaouoft added a commit to peterbaouoft/atomic that referenced this issue Feb 21, 2018
As reported by projectatomic#1191,
when info file is missing, atomic images list will unexpectedly fail.

This patch is therefore to catch the error and instead output a
warning to notify user.
@peterbaouoft
Copy link
Contributor

Hmm, thinking more about it tho, this removal of info file error needs to be carefully handled.It will impact all other container related actions as well. I found 3 atm, and I think at least atomic containers delete and atomic uninstall $container should behave properly. I think we should allow user to uninstall even broken container right? WDYT? I can rework on the patch a bit based on your opinion @giuseppe. Thanks in advance :-)!

Below are the examples of possible failures introduced with the above error:

[root@localhost atomic]# ./atomic containers delete hello-world
[Errno 2] No such file or directory: '/var/lib/containers/atomic/hello-world/info'
[root@localhost atomic]# ./atomic uninstall hello-world
[Errno 2] No such file or directory: '/var/lib/containers/atomic/hello-world/info'
[root@localhost atomic]# ./atomic containers update hello-world
[Errno 2] No such file or directory: '/var/lib/containers/atomic/hello-world/info'

@giuseppe
Copy link
Collaborator

yes, I agree with you that atomic uninstall and atomic containers delete should not care about the info file being present (just ignore it in case it is not there) as the users will probably attempt these two actions to clean up an incorrectly installed container.

peterbaouoft added a commit to peterbaouoft/atomic that referenced this issue Feb 27, 2018
As reported by projectatomic#1191,
when info file is missing, atomic images list will unexpectedly fail.

This patch is therefore to catch the error and instead output a
warning to notify user.

This patch also adds an error handling case for following commands:
1: atomic uninstall $container
2: atomic containers update $container
3: atomic containers rollback $container

Now instead of erroring out, a warning is added to notify the user
about the failure
@peterbaouoft
Copy link
Contributor

So yea... the issue can't seem to be easily solved without changing architecture of the code(not sure if it is the right way to say it). I mentioned briefly about the ways I came up with in the PR.

TL;DR, to make users to be able to perform atomic uninstall $container or atomic containers delete $container, we may need to change multiple parts of the code base.

The following is the content:

There seemed to be an easy method IMO(haven't tried it yet). In has_container for _ostree backend, we can, when get checkout is true for the container, we can return a "NULL" based container if containers' info is missing, so that we can delete the checked out container folder.

However, as mentioned in the PR, other installed files locations are tracked inside the info file as well, so we are unable to remove those. To acknowledge that, we might want a persistent storage to track those file location? (not sure if it is worth it to track the corner case tho). Willing to hear your thoughts and maybe suggestions on this, and thanks in advance! :)

@giuseppe
Copy link
Collaborator

@peterbaouoft can you just ignore the issue when info is not found during "uninstall" and let the uninstall continue (i.e. disable/remove the systemd files + drop the checkouts)?

@peterbaouoft
Copy link
Contributor

Hi thanks for the reply, IIUC, both delete and uninstall for ostree backend will call inspect_container first. Before, to solve the problem micah posted, we would need to ignore the containers when they do not have info file. However, that implementation makes the container that have missing info files "undiscoverable".

As a result, the container will not be included and has_container method will return None. That's why I kinda suggested the implementation above. Though, I think we can do an ugly hack for simplicity reasons. We can directly change the behaviour of uninstall.py to handle the missing info at the highest layer, but that may add more "code debt" to the codebase :(.

I can first start the latter implementation, it seems not going to take a long time to do.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants