Reenable remote system tests #7111

edsantiago · 2020-07-28T14:39:08Z

podman-remote is in better shape now. Let's see what needs
to be done to reenable remote system tests.

Signed-off-by: Ed Santiago santiago@redhat.com

edsantiago · 2020-07-29T18:40:42Z

This is uncomfortable: the remote test timed out but with no actual log of how far it got:

+ make remotesystem
# Wipe existing config, database, and cache: start with clean slate.
rm -f -rf /home/some13225dude/.local/share/containers /home/some13225dude/.config/containers
# Start podman server using tmp socket; loop-wait for it;
# test podman-remote; kill server, clean up tmp socket file.
# podman server spews copious unhelp
Timed out!

Restarted in hopes that it's a purely-coincidental infrastructure flake.

edsantiago · 2020-07-29T19:48:12Z

@cevich I need help here please. The failures look like real timeouts, but the logs are useless (incomplete). I have to assume there's a buffering problem, as if stdout/stderr aren't getting flushed properly. Does that sound familiar? Can you think of any way we can get full logs?

cevich · 2020-07-30T16:13:59Z

@edsantiago taking a peek...

cevich · 2020-07-30T18:47:18Z

@edsantiago this is a tricky one! The main thing that stands out to me here is the size of the onion. Maybe try peeling back a layers? For example, the wait/retry loop in a makefile is two complexities wrapped in an enigma. I know make has it's fingers into the stdio and exit-code pie, what happens w/o make being involved?

edsantiago · 2020-07-31T01:23:32Z

I think the Ubuntu issue is solved: Ubuntu /bin/sh is not bash, so [[ and &> weren't working. Those are resolved, and Ubuntu 19 & 20 tests now passed.

I see two alarming failures in special_testing_rootless:

[+0257s] not ok 74 podman volume with --userns=keep-id
[+0257s] # $ /var/tmp/go/src/github.com/containers/podman/bin/podman-remote --url unix:/tmp/podman.fpYZ0P run --rm -v /tmp/podman_bats.LsZK3v/volume_O8zRoMsGmt:/vol:z quay.io/libpod/alpine_labels:latest stat -c %u:%s /vol/myfile
[+0257s] # read unixpacket @->/run/user/23298/libpod/tmp/socket/1bee53f8e19e2b03ff773e504fead7f5994b8be5545b315a73a3d2cef290f567/attach: read: connection reset by peer
[+0257s] # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
[+0257s] # #|     FAIL: w/o keep-id: stat(file in container) == root
[+0257s] # #| expected: '0:0'
[+0257s] # #|   actual: 'read unixpacket @->/run/user/23298/libpod/tmp/socket/1bee53f8e19e2b03ff773e504fead7f5994b8be5545b315a73a3d2cef290f567/attach: read: connection reset by peer'
[+0257s] # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-------------------------------------

[+0297s] not ok 88 sensitive mount points are masked without --privileged
[+0297s] # $ /var/tmp/go/src/github.com/containers/podman/bin/podman-remote --url unix:/tmp/podman.fpYZ0P run --rm quay.io/libpod/alpine_labels:latest stat -c%n:%F:%h:%T:%t /dev/null /proc/acpi /proc/kcore /proc/keys /proc/latency_stats /proc/timer_list /proc/sched_debug /proc/scsi /sys/firmware /sys/fs/selinux /sys/dev
[+0297s] # can only attach to created or running containers: container state improper
[+0297s] # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
[+0297s] # #| FAIL: can only attach to created or running containers: Unknown file type ' container state improper'
[+0297s] # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first: where is that /run/user/.../socket coming from??? The podman command is very clearly invoked with --url. What is going on here?

The second, well, I'm just going to disable that test for podman-remote - but what is happening here too? It's just a podman run, no attach anywhere. I think this one is a race condition, but I'm not going to dive into it now. I want to document these now and hope we don't ignore them.

edsantiago · 2020-07-31T02:57:31Z

Okay, I don't think podman-remote is ready for prime time. These failures are weird. Giving up for the week.

cevich · 2020-07-31T15:12:36Z

Ubuntu /bin/sh is not bash

I can't tell you the number of times this catches me...a genuine PITA for no good reason IMHO...or at least no good reason I'm aware of. I'm glad to read that you got past the odd hanging/garbage problem.

The podman command is very clearly invoked with --url. What is going on here?

Also odd, I wonder if we're running into our old IFS bats bug again? IIRC, that was causing command-line args to not get passed through properly. In any case, it should be possible reasonably easy to run the command manually and see if it reproduces, no?

I think this one is a race condition

That occurred to me as well, like perhaps it's exiting just at the wrong time. Could it be due to a hiccup talking to quay.io maybe? Otherwise, given it's proximity to the first failure, I wonder if the two are somehow related. Is it possible the remote-end got wedged or crashed in some way causing both failures?

edsantiago · 2020-08-03T14:01:34Z

Just a marker to list the issues found and filed (so far) by this PR: #7116 #7117 #7119 #7122 #7123 #7128 #7135 #7136 #7137 #7139 #7195

podman-remote is in better shape now. Let's see what needs to be done to reenable remote system tests. - logs test: skip multilog, it doesn't work remote - diff test: use -l only when local, not with remote - many other tests: skip_if_remote, with 'FIXME: pending #xxxx' where xxxx is a filed issue. Unrelated: added new helper to skip_if_remote and _if_rootless, where we check if the source message includes "remote"/"rootless" and insert it if missing. This is a minor usability enhancement to make it easier to understand at-a-glance why a skip triggers. Signed-off-by: Ed Santiago <santiago@redhat.com>

edsantiago · 2020-08-03T20:00:05Z

This is ready for review - or at least as close as I can get it. Unfortunately, I just don't think podman-remote is ready for the world: it's not reliable. This PR has seen a history of flakes, most particularly what I believe is a race condition in which the client never sees output (#7195). Until that is resolved, I think it's a bad idea to enable podman-remote tests in CI: we will get a ton of flakes. But I also think it's a bad idea to leave podman-remote untested. Comments welcome.

vrothberg

LGTM
@baude @jwhonce @mheon PTAL

vrothberg · 2020-08-04T08:20:22Z

Thanks, @edsantiago! Your findings are concerning and suggest to dedicate some extra attention to podman-remote soon.

rhatdan · 2020-08-04T10:20:39Z

/approve
@edsantiago I don't really understand the comments, are you saying you don't want this PR merged?

openshift-ci-robot · 2020-08-04T10:20:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [rhatdan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

edsantiago · 2020-08-04T11:21:05Z

@rhatdan I'm ambivalent. If we merge it, there is going to be a lot of developer pain because of frequent flakes and reruns. But I also don't see any incentive to fix podman-remote, so if we don't merge it I can see us going another six months without testing and with more bugs creeping into podman-remote.

I lean toward merging, I just want everyone to know that it's going to be painful if we do. (Until the flake bug is fixed).

rhatdan · 2020-08-04T12:56:41Z

/lgtm

openshift-ci-robot requested review from mrunalp and vrothberg July 28, 2020 14:39

edsantiago force-pushed the reenable_remote_system_tests branch 4 times, most recently from eb216a0 to f27d71e Compare July 29, 2020 16:56

edsantiago force-pushed the reenable_remote_system_tests branch 3 times, most recently from 63e9478 to d7bcb71 Compare July 30, 2020 23:44

edsantiago force-pushed the reenable_remote_system_tests branch from d7bcb71 to b51cd63 Compare July 31, 2020 01:26

edsantiago mentioned this pull request Aug 3, 2020

CI flake: podman-remote: no output from container (and "does not exist in database"?) #7195

Closed

edsantiago force-pushed the reenable_remote_system_tests branch from b51cd63 to a4fcf09 Compare August 3, 2020 15:38

vrothberg reviewed Aug 4, 2020

View reviewed changes

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2020

openshift-ci-robot assigned rhatdan Aug 4, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2020

openshift-merge-robot merged commit 6aed107 into containers:master Aug 4, 2020

edsantiago mentioned this pull request Aug 4, 2020

Handle podman-remote run --rm #7224

Merged

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reenable remote system tests #7111

Reenable remote system tests #7111

edsantiago commented Jul 28, 2020

edsantiago commented Jul 29, 2020

edsantiago commented Jul 29, 2020

cevich commented Jul 30, 2020

cevich commented Jul 30, 2020

edsantiago commented Jul 31, 2020

edsantiago commented Jul 31, 2020

cevich commented Jul 31, 2020

edsantiago commented Aug 3, 2020 •

edited

Loading

edsantiago commented Aug 3, 2020

vrothberg left a comment

vrothberg commented Aug 4, 2020

rhatdan commented Aug 4, 2020

openshift-ci-robot commented Aug 4, 2020

edsantiago commented Aug 4, 2020

rhatdan commented Aug 4, 2020

Reenable remote system tests #7111

Reenable remote system tests #7111

Conversation

edsantiago commented Jul 28, 2020

edsantiago commented Jul 29, 2020

edsantiago commented Jul 29, 2020

cevich commented Jul 30, 2020

cevich commented Jul 30, 2020

edsantiago commented Jul 31, 2020

edsantiago commented Jul 31, 2020

cevich commented Jul 31, 2020

edsantiago commented Aug 3, 2020 • edited Loading

edsantiago commented Aug 3, 2020

vrothberg left a comment

Choose a reason for hiding this comment

vrothberg commented Aug 4, 2020

rhatdan commented Aug 4, 2020

openshift-ci-robot commented Aug 4, 2020

edsantiago commented Aug 4, 2020

rhatdan commented Aug 4, 2020

edsantiago commented Aug 3, 2020 •

edited

Loading