Investigate sporadic CI failures #126

Closed
Furisto opened this issue Jul 8, 2021 · 3 comments

@Furisto
Collaborator

Furisto commented Jul 8, 2021

The CI fails sporadically during the kill command test: the call to the start command fails with " could not be started because it was Stopped". I have not yet been able to reproduce this behavior locally.

This is the output of youki (with additional traces) when running the test successfully.

[INFO src/state.rs:16] 2021-07-08T12:12:31.874083870+02:00 CALLED STATE
[INFO src/delete.rs:26] 2021-07-08T12:12:31.876735623+02:00 CALLED DELETE
[INFO src/create.rs:34] 2021-07-08T12:12:31.880261860+02:00 CALLED CREATE
[INFO src/cgroups/common.rs:155] 2021-07-08T12:12:31.891048269+02:00 cgroup manager V1 will be used
[WARN src/capabilities.rs:27] 2021-07-08T12:12:31.916619716+02:00 CAP_CHECKPOINT_RESTORE doesn't support.
[WARN src/capabilities.rs:27] 2021-07-08T12:12:31.916699714+02:00 CAP_PERFMON doesn't support.
[WARN src/capabilities.rs:27] 2021-07-08T12:12:31.916721314+02:00 CAP_BPF doesn't support.
[INFO src/state.rs:16] 2021-07-08T12:12:31.984294516+02:00 CALLED STATE
[INFO src/kill.rs:20] 2021-07-08T12:12:31.994463136+02:00 CALLED KILL
[INFO src/state.rs:16] 2021-07-08T12:12:32.003495876+02:00 CALLED STATE
[INFO src/delete.rs:26] 2021-07-08T12:12:32.007922697+02:00 CALLED DELETE
[INFO src/cgroups/common.rs:155] 2021-07-08T12:12:32.020250779+02:00 cgroup manager V1 will be used
[INFO src/state.rs:16] 2021-07-08T12:12:32.051025233+02:00 CALLED STATE
[INFO src/create.rs:34] 2021-07-08T12:12:32.054227576+02:00 CALLED CREATE
[INFO src/cgroups/common.rs:155] 2021-07-08T12:12:32.063796607+02:00 cgroup manager V1 will be used
[WARN src/capabilities.rs:27] 2021-07-08T12:12:32.116028281+02:00 CAP_PERFMON doesn't support.
[WARN src/capabilities.rs:27] 2021-07-08T12:12:32.116231877+02:00 CAP_BPF doesn't support.
[WARN src/capabilities.rs:27] 2021-07-08T12:12:32.116306276+02:00 CAP_CHECKPOINT_RESTORE doesn't support.
[INFO src/start.rs:19] 2021-07-08T12:12:32.145331061+02:00 CALLED START
[INFO src/state.rs:16] 2021-07-08T12:12:32.148588804+02:00 CALLED STATE
[INFO src/kill.rs:20] 2021-07-08T12:12:32.151353455+02:00 CALLED KILL
[INFO src/state.rs:16] 2021-07-08T12:12:32.153879310+02:00 CALLED STATE
[INFO src/delete.rs:26] 2021-07-08T12:12:32.156782258+02:00 CALLED DELETE
[INFO src/cgroups/common.rs:155] 2021-07-08T12:12:32.165717500+02:00 cgroup manager V1 will be used
[INFO src/state.rs:16] 2021-07-08T12:12:32.190686057+02:00 CALLED STATE
[INFO src/create.rs:34] 2021-07-08T12:12:32.193853601+02:00 CALLED CREATE
[INFO src/cgroups/common.rs:155] 2021-07-08T12:12:32.203325533+02:00 cgroup manager V1 will be used
[WARN src/capabilities.rs:27] 2021-07-08T12:12:32.251435881+02:00 CAP_BPF doesn't support.
[WARN src/capabilities.rs:27] 2021-07-08T12:12:32.251517179+02:00 CAP_CHECKPOINT_RESTORE doesn't support.
[WARN src/capabilities.rs:27] 2021-07-08T12:12:32.251542679+02:00 CAP_PERFMON doesn't support.
[INFO src/start.rs:19] 2021-07-08T12:12:32.277765314+02:00 CALLED START
[INFO src/state.rs:16] 2021-07-08T12:12:32.280738961+02:00 CALLED STATE
[INFO src/kill.rs:20] 2021-07-08T12:12:32.283505912+02:00 CALLED KILL
[INFO src/state.rs:16] 2021-07-08T12:12:32.286962351+02:00 CALLED STATE
[INFO src/delete.rs:26] 2021-07-08T12:12:32.289950498+02:00 CALLED DELETE
[INFO src/cgroups/common.rs:155] 2021-07-08T12:12:32.299679825+02:00 cgroup manager V1 will be used
[INFO src/state.rs:16] 2021-07-08T12:12:32.347732274+02:00 CALLED STATE

This is where we are checking the state.

pub fn refresh_status(&mut self) -> Result<Self> {
    let new_status = match self.pid() {
        Some(pid) => {
            // Note that Process::new does not spawn a new process
            // but instead creates a new Process structure, and fill
            // it with information about the process with given pid
            if let Ok(proc) = Process::new(pid.as_raw()) {
                use procfs::process::ProcState;
                match proc.stat.state().unwrap() {
                    ProcState::Zombie | ProcState::Dead => ContainerStatus::Stopped,
                    _ => match self.status() {
                        ContainerStatus::Creating | ContainerStatus::Created => self.status(),
                        _ => ContainerStatus::Running,
                    },
                }
            } else {
                ContainerStatus::Stopped
            }
        }
        None => ContainerStatus::Stopped,
    };
    Ok(self.update_status(new_status))
}

It appears that the container process no longer exists at that point. The question is why we see this behavior only in the kill tests.
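
To make the suspected failure path easier to reason about, here is a minimal, dependency-free sketch of the decision logic in refresh_status. It is a simplified model, not the actual youki code: the Status and ProcState enums, next_status, and can_start are hypothetical stand-ins, and the rule that start only proceeds from the Created state is an assumption inferred from the error message rather than taken from the source.

```rust
// Simplified model of the status decision in refresh_status.
// All names below are illustrative stand-ins, not youki's real types.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Creating,
    Created,
    Running,
    Stopped,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcState {
    Running,
    Sleeping,
    Zombie,
    Dead,
}

/// `observed` is `None` when no pid is recorded or when looking up the
/// process fails (for example because it has already exited and been reaped).
fn next_status(current: Status, observed: Option<ProcState>) -> Status {
    match observed {
        // A zombie or dead process counts as a stopped container.
        Some(ProcState::Zombie) | Some(ProcState::Dead) => Status::Stopped,
        // A live process keeps Creating/Created; any other status becomes Running.
        Some(_) => match current {
            Status::Creating | Status::Created => current,
            _ => Status::Running,
        },
        // No process information at all also maps to Stopped.
        None => Status::Stopped,
    }
}

/// Assumption: start is only allowed from the Created state.
fn can_start(status: Status) -> bool {
    status == Status::Created
}

fn main() {
    // Init process still alive after create: start is allowed.
    let alive = next_status(Status::Created, Some(ProcState::Sleeping));
    println!("alive:  {:?}, can start: {}", alive, can_start(alive));

    // Init process gone before start: the container is reported as Stopped.
    let gone = next_status(Status::Created, None);
    println!("gone:   {:?}, can start: {}", gone, can_start(gone));

    // Init process exited but not yet reaped: also Stopped.
    let zombie = next_status(Status::Created, Some(ProcState::Zombie));
    println!("zombie: {:?}, can start: {}", zombie, can_start(zombie));
}
```

Under these assumptions, a container that was Created is reported as Stopped as soon as its process is missing, a zombie, or dead when start refreshes the status, which would match the error seen in CI.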

@utam0k
Member

utam0k commented Aug 9, 2021

It doesn't happen much anymore...

@utam0k
Member

utam0k commented Aug 9, 2021

How about we wait for this issue to be implemented?
#56

@utam0k
Member

utam0k commented Aug 31, 2021

I haven't encountered it at all lately, so I'll close this for now.

utam0k closed this as completed Aug 31, 2021