Add comments to process module and minor refactoring #64

Merged 2 commits on Jun 5, 2021
16 changes: 16 additions & 0 deletions docs/doc-draft.md
@@ -11,6 +11,7 @@ These are references to various documentations and specifications, which can be
- [Unix Sockets man page](https://man7.org/linux/man-pages/man7/unix.7.html) : Useful to understand sockets
- [prctl man page](https://man7.org/linux/man-pages/man2/prctl.2.html) : Process control man pages
- [OCI Linux spec](https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md) : Linux specific section of OCI Spec
- [pipe2 man page](https://man7.org/linux/man-pages/man2/pipe.2.html) : definition and usage of pipe2

---

@@ -60,5 +61,20 @@ One thing to note is that in the end, container is just another process in Linux
When given the create command, Youki loads the specification, configuration, sockets etc.,
forks the process into parent and child (C1), forks the child process again (C2), applies the limits, namespaces etc. to the child of the child (C2), and runs the command/program in C2. After the command/program finishes, C2 returns. C1 waits for C2 to exit, after which it also exits.

### Process

This handles creation of the process, and thus of the container process. The hierarchy is:
main youki process -> intermediate child process (C1) -> init process (C2)

where -> indicates a fork.

The main youki process sets up a pipe, forks the child process, and waits for the child to send a message and the pid of the init process over that pipe. The child process sets up another pipe for the init process and forks the init process. The init process then notifies the child process that it is ready, and the child in turn notifies the main youki process that the init process has been forked and reports its pid.
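
The handshake described above can be sketched without fork or pipes. The following is a minimal simulation, not youki's implementation: threads stand in for the three processes and `std::sync::mpsc` channels stand in for the two pipes. `run_handshake`, `fake_init_pid`, and the `InitReady` variant are names assumed for illustration; only `ChildReady` appears in this diff.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages exchanged during the handshake.
/// `InitReady` and the pid payload are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
enum Message {
    InitReady,
    ChildReady { init_pid: i32 },
}

/// Simulates main -> child (C1) -> init (C2) and returns the pid that the
/// main side learns. `fake_init_pid` stands in for the pid returned by fork.
fn run_handshake(fake_init_pid: i32) -> i32 {
    // "pipe" 1: child (C1) reports to the main process
    let (to_main, from_child) = mpsc::channel::<Message>();

    let child = thread::spawn(move || {
        // "pipe" 2: init (C2) reports to the child (C1)
        let (to_child, from_init) = mpsc::channel::<Message>();
        let init = thread::spawn(move || {
            // init tells the child that it is ready
            to_child.send(Message::InitReady).unwrap();
        });
        // the child waits for init, then notifies main and passes on the pid
        assert_eq!(from_init.recv().unwrap(), Message::InitReady);
        to_main
            .send(Message::ChildReady { init_pid: fake_init_pid })
            .unwrap();
        init.join().unwrap();
    });

    // the main process blocks until the child reports back
    let pid = match from_child.recv().unwrap() {
        Message::ChildReady { init_pid } => init_pid,
        other => panic!("unexpected message {:?}", other),
    };
    child.join().unwrap();
    pid
}
```

Calling `run_handshake(4242)` returns `4242`, mirroring how the main process ends up holding the init pid even though it never forked init itself.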

- [mio Token definition](https://docs.rs/mio/0.7.11/mio/struct.Token.html)
- [oom-score-adj](https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9)
- [unshare man page](https://man7.org/linux/man-pages/man1/unshare.1.html)
- [user-namespace man page](https://man7.org/linux/man-pages/man7/user_namespaces.7.html)
- [wait man page](https://man7.org/linux/man-pages/man3/wait.3p.html)

[oci runtime specification]: https://github.com/opencontainers/runtime-spec/blob/master/runtime.md
[runc man pages]: https://github.com/opencontainers/runc/blob/master/man/runc.8.md
49 changes: 41 additions & 8 deletions src/process/child.rs
@@ -1,23 +1,32 @@
use std::io::Write;
use std::{io::Read, time::Duration};

use super::{MAX_EVENTS, WAIT_FOR_INIT};
use anyhow::{bail, Result};
use mio::unix::pipe;
use mio::unix::pipe::Receiver;
use mio::unix::pipe::Sender;
use mio::{Events, Interest, Poll, Token};
use nix::unistd::Pid;
use std::io::Read;
use std::io::Write;

use crate::process::message::Message;

// Token is used to identify which socket generated an event
const CHILD: Token = Token(1);

/// Contains sending end of pipe for parent process, receiving end of pipe
/// for the init process and poller for that
pub struct ChildProcess {
sender_for_parent: Sender,
receiver: Option<Receiver>,
poll: Option<Poll>,
}

// Note: The original youki process first forks into 'parent' (P) and 'child' (C1) processes,
Review comment (Member): 👍
// of which this represents the child (C1) process. C1 then forks again into a parent process, which is C1 itself,
// and a child (C2) process. C2 is called the init process, as it will run the command of the container. But from
// a process point of view, the init process is a child of the child process, which is a child of the original youki process.
impl ChildProcess {
/// create a new Child process structure
pub fn new(sender_for_parent: Sender) -> Result<Self> {
Ok(Self {
sender_for_parent,
@@ -26,34 +35,48 @@ impl ChildProcess {
})
}

pub fn setup_uds(&mut self) -> Result<Sender> {
/// sets up the pipe for the init process
pub fn setup_pipe(&mut self) -> Result<Sender> {
// create a new pipe
let (sender, mut receiver) = pipe::new()?;
// create a new poll, and register the receiving end of the pipe with it
// This will poll for read events, so when data is written to the sending end of the pipe,
// the receiving end will be readable and poll will notify
let poll = Poll::new()?;
poll.registry()
.register(&mut receiver, CHILD, Interest::READABLE)?;

self.receiver = Some(receiver);
self.poll = Some(poll);
Ok(sender)
}

pub fn ready(&mut self, init_pid: Pid) -> Result<()> {
/// Notify the parent process that the child process has forked the init process
pub fn notify_parent(&mut self, init_pid: Pid) -> Result<()> {
log::debug!(
"child send to parent {:?}",
(Message::ChildReady as u8).to_be_bytes()
);
// write ChildReady message to the pipe to parent
self.write_message_for_parent(Message::ChildReady)?;
// write the pid of the init process, which is forked by the child process, to the pipe;
// Pid::as_raw() returns the underlying libc::pid_t, which is i32 on Linux
self.sender_for_parent
.write_all(&(init_pid.as_raw()).to_be_bytes())?;
Ok(())
}

/// writes given message to pipe for the parent
#[inline]
fn write_message_for_parent(&mut self, msg: Message) -> Result<()> {
self.sender_for_parent
.write_all(&(msg as u8).to_be_bytes())?;
Ok(())
}

/// Wait for the init process to be ready
pub fn wait_for_init_ready(&mut self) -> Result<()> {
// make sure pipe for init process is set up
let receiver = self
.receiver
.as_mut()
@@ -63,10 +86,16 @@ impl ChildProcess {
.as_mut()
.expect("Complete the setup of uds in advance.");

let mut events = Events::with_capacity(128);
poll.poll(&mut events, Some(Duration::from_millis(1000)))?;
// Create collection with capacity to store up to MAX_EVENTS events
let mut events = Events::with_capacity(MAX_EVENTS);
// poll the receiving end of the pipe for an event, waiting for at most WAIT_FOR_INIT
poll.poll(&mut events, Some(WAIT_FOR_INIT))?;
for event in events.iter() {
// check if the event token is CHILD
// note that this does not assign anything to CHILD, but instead compares CHILD and event.token()
// check http://patshaughnessy.net/2018/1/18/learning-rust-if-let-vs--match for a bit more detailed explanation
if let CHILD = event.token() {
// read message from the init process
let mut buf = [0; 1];
receiver.read_exact(&mut buf)?;
match Message::from(u8::from_be_bytes(buf)) {
@@ -77,6 +106,10 @@ impl ChildProcess {
unreachable!()
}
}
bail!("unexpected message.")
// should not reach here, as there should be a ready event from init within WAIT_FOR_INIT duration
unreachable!(
"No message received from init process within {} seconds",
WAIT_FOR_INIT.as_secs()
);
}
}
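
The comments on `if let CHILD = event.token()` rely on a Rust rule that is easy to miss: a constant in pattern position is compared against the value, whereas a fresh lowercase identifier would bind it and match everything. A small sketch of that difference, using a local `Token` type as a stand-in for `mio::Token` (the helper names are hypothetical):

```rust
/// Local stand-in for mio::Token; constants used as patterns need
/// PartialEq and Eq to be derived on their type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token(usize);

const CHILD: Token = Token(1);

/// CHILD is a constant, so the pattern compares `token` with it.
fn is_child_event(token: Token) -> bool {
    if let CHILD = token {
        true
    } else {
        false
    }
}

/// `_any` is a new binding, not a constant, so this pattern matches
/// every token. The compiler warns about this; the warning is silenced
/// here only to show the contrast.
#[allow(irrefutable_let_patterns)]
fn always_matches(token: Token) -> bool {
    if let _any = token {
        true
    } else {
        false
    }
}
```

`is_child_event(Token(1))` is true and `is_child_event(Token(0))` is false, while `always_matches` is true for any input. This is why the constant is written in upper case in the source: the naming convention makes the comparison readable at the use site.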
38 changes: 30 additions & 8 deletions src/process/fork.rs
@@ -15,52 +15,64 @@ use nix::unistd;
use crate::cgroups::common::CgroupManager;
use crate::container::ContainerStatus;
use crate::process::{child, init, parent, Process};
use crate::utils;
use crate::{cond::Cond, container::Container};

/// Function to perform the first fork in order to run the container process
pub fn fork_first<P: AsRef<Path>>(
pid_file: Option<P>,
is_userns: bool,
linux: &oci_spec::Linux,
container: &Container,
cmanager: Box<dyn CgroupManager>,
) -> Result<Process> {
// create a new pipe
let ccond = Cond::new()?;

// create new parent process structure
let (mut parent, sender_for_parent) = parent::ParentProcess::new()?;
// create a new child process structure with sending end of parent process
let child = child::ChildProcess::new(sender_for_parent)?;

unsafe {
// fork the process
match unistd::fork()? {
// in the child process
unistd::ForkResult::Child => {
utils::set_name("rc-user")?;

// if an out-of-memory score adjustment is set in the specification,
// set the score value for the current process
// check https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9 for some more information
if let Some(ref r) = linux.resources {
if let Some(adj) = r.oom_score_adj {
let mut f = fs::File::create("/proc/self/oom_score_adj")?;
f.write_all(adj.to_string().as_bytes())?;
}
}

// if new user is specified in specification, this will be true
// and new namespace will be created, check https://man7.org/linux/man-pages/man7/user_namespaces.7.html
// for more information
if is_userns {
sched::unshare(sched::CloneFlags::CLONE_NEWUSER)?;
}

ccond.notify()?;

Ok(Process::Child(child))
}
// in the parent process
unistd::ForkResult::Parent { child } => {
ccond.wait()?;

// apply the control group to the child process
cmanager.apply(&linux.resources.as_ref().unwrap(), child)?;

// wait for child to fork init process and report back its pid
let init_pid = parent.wait_for_child_ready()?;
// update status and pid of the container process
container
.update_status(ContainerStatus::Created)?
.set_pid(init_pid)
.save()?;

// if file to write the pid to is specified, write pid of the child
if let Some(pid_file) = pid_file {
fs::write(&pid_file, format!("{}", child))?;
}
@@ -70,21 +82,31 @@ pub fn fork_first<P: AsRef<Path>>(
}
}

/// Function to perform the second fork, which will spawn the actual container process
pub fn fork_init(mut child_process: ChildProcess) -> Result<Process> {
let sender_for_child = child_process.setup_uds()?;
// set up the pipe for the init process
let sender_for_child = child_process.setup_pipe()?;
unsafe {
// fork the process into the current process (C1) (which is the child of fork_first) and the init process
match unistd::fork()? {
// if it is child process, create new InitProcess structure and return
unistd::ForkResult::Child => Ok(Process::Init(InitProcess::new(sender_for_child))),
// in the forking process C1
unistd::ForkResult::Parent { child } => {
// wait for init process to be ready
child_process.wait_for_init_ready()?;
child_process.ready(child)?;
// notify the parent process (original youki process) that init process is forked and ready
child_process.notify_parent(child)?;

// wait for the init process, which is container process, to change state
// check https://man7.org/linux/man-pages/man3/wait.3p.html for more information
match waitpid(child, None)? {
// if normally exited
WaitStatus::Exited(pid, status) => {
// cmanager.remove()?;
log::debug!("exited pid: {:?}, status: {:?}", pid, status);
exit(status);
}
// if terminated by a signal
WaitStatus::Signaled(pid, status, _) => {
log::debug!("signaled pid: {:?}, status: {:?}", pid, status);
exit(0);
8 changes: 8 additions & 0 deletions src/process/init.rs
@@ -4,15 +4,22 @@ use anyhow::Result;
use mio::unix::pipe::Sender;

use crate::process::message::Message;

/// Contains sending end for pipe for the child process
pub struct InitProcess {
sender_for_child: Sender,
}

impl InitProcess {
/// create a new Init process structure
pub fn new(sender_for_child: Sender) -> Self {
Self { sender_for_child }
}

/// Notify that this process is ready
// The child here is in perspective of overall hierarchy
// main youki process -> child process -> init process
// the child here does not mean child of the init process
pub fn ready(&mut self) -> Result<()> {
log::debug!(
"init send to child {:?}",
@@ -22,6 +29,7 @@ impl InitProcess {
Ok(())
}

#[inline]
fn write_message_for_child(&mut self, msg: Message) -> Result<()> {
self.sender_for_child
.write_all(&(msg as u8).to_be_bytes())?;
1 change: 1 addition & 0 deletions src/process/message.rs
@@ -1,3 +1,4 @@
/// Used as wrapper for messages to be sent between child and parent processes
#[derive(Debug)]
pub enum Message {
ChildReady = 0x00,
14 changes: 14 additions & 0 deletions src/process/mod.rs
@@ -1,3 +1,8 @@
//! This provides a thin wrapper around fork syscall,
//! with enums and functions specific to youki implemented.

use std::time::Duration;

pub mod fork;
pub mod message;

@@ -7,8 +12,17 @@ mod parent;

pub use init::InitProcess;

/// Used to describe type of process after fork.
/// Parent and child processes mean same things as in normal fork call
/// InitProcess is specifically used to indicate the process which will run the command of container
pub enum Process {
Parent(parent::ParentProcess),
Child(child::ChildProcess),
Init(init::InitProcess),
}
/// Maximum event capacity of polling
const MAX_EVENTS: usize = 128;
/// Time to wait when polling for message from child process
const WAIT_FOR_CHILD: Duration = Duration::from_secs(5);
/// Time to wait when polling for message from init process
const WAIT_FOR_INIT: Duration = Duration::from_millis(1000);
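
`Poll::poll` with a timeout returns without any events when nothing arrives in time, so the caller has to handle the empty case. As a rough std-only analogue (an assumption for illustration, not what youki does), `recv_timeout` on a channel separates a timeout from a closed sender. `wait_for_ready` is a hypothetical helper, and the channel stands in for the pipe:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Same value as in the diff, repeated here so the sketch is self-contained.
const WAIT_FOR_INIT: Duration = Duration::from_millis(1000);

/// Waits for one message byte, reporting a timeout or a closed sender
/// as an error string instead of panicking.
fn wait_for_ready(rx: &Receiver<u8>, timeout: Duration) -> Result<u8, String> {
    match rx.recv_timeout(timeout) {
        // a message arrived in time
        Ok(byte) => Ok(byte),
        // nothing arrived before the deadline
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("no message within {} ms", timeout.as_millis()))
        }
        // the sending side went away without sending anything
        Err(RecvTimeoutError::Disconnected) => Err("sender closed".to_string()),
    }
}
```

A caller would pass `WAIT_FOR_INIT` as the timeout. Reporting the timeout as an error is one possible design; the diff instead treats a missing message as unreachable. Note also that formatting with `as_millis()` avoids the truncation that `as_secs()` would apply to sub-second durations.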
38 changes: 32 additions & 6 deletions src/process/parent.rs
@@ -1,46 +1,72 @@
use std::{io::Read, time::Duration};
use std::io::Read;

use super::{MAX_EVENTS, WAIT_FOR_CHILD};
use crate::process::message::Message;
use anyhow::{bail, Result};
use mio::unix::pipe;
use mio::unix::pipe::{Receiver, Sender};
use mio::{Events, Interest, Poll, Token};

use crate::process::message::Message;

// Token is used to identify which socket generated an event
const PARENT: Token = Token(0);

/// Contains receiving end of pipe to child process and a poller for that.
pub struct ParentProcess {
receiver: Receiver,
poll: Poll,
}

// Poll is used to register and listen for various events
// by registering it with an event source such as receiving end of a pipe
impl ParentProcess {
/// Create new Parent process structure
pub fn new() -> Result<(Self, Sender)> {
// create a new pipe
let (sender, mut receiver) = pipe::new()?;
// create a new poll, and register the receiving end of the pipe with it
// This will poll for read events, so when data is written to the sending end of the pipe,
// the receiving end will be readable and poll will notify
let poll = Poll::new()?;
poll.registry()
.register(&mut receiver, PARENT, Interest::READABLE)?;
Ok((Self { receiver, poll }, sender))
}

/// Waits for associated child process to send ready message
/// and return the pid of init process which is forked by child process
pub fn wait_for_child_ready(&mut self) -> Result<i32> {
let mut events = Events::with_capacity(128);
self.poll.poll(&mut events, Some(Duration::from_secs(5)))?;
// Create collection with capacity to store up to MAX_EVENTS events
let mut events = Events::with_capacity(MAX_EVENTS);

// poll the receiving end of the pipe for an event, waiting for at most WAIT_FOR_CHILD
self.poll.poll(&mut events, Some(WAIT_FOR_CHILD))?;
for event in events.iter() {
// check if the event token is PARENT
// note that this does not assign anything to PARENT, but instead compares PARENT and event.token()
// check http://patshaughnessy.net/2018/1/18/learning-rust-if-let-vs--match for a bit more detailed explanation
if let PARENT = event.token() {
// read data from pipe
let mut buf = [0; 1];
self.receiver.read_exact(&mut buf)?;
// convert to Message wrapper
match Message::from(u8::from_be_bytes(buf)) {
Message::ChildReady => {
// read pid of init process forked by child, 4 bytes as the type is i32
let mut buf = [0; 4];
self.receiver.read_exact(&mut buf)?;
return Ok(i32::from_be_bytes(buf));
}
msg => bail!("receive unexpected message {:?} in parent process", msg),
}
} else {
// as the poll is registered with only parent token
unreachable!()
}
}
bail!("unexpected message.")
// should not reach here, as there should be a ready event from child within WAIT_FOR_CHILD duration
unreachable!(
"No message received from child process within {} seconds",
WAIT_FOR_CHILD.as_secs()
);
Review comment (Member) on lines +67 to +70: 👍

}
}
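
The wire format used between child and parent is one tag byte followed by the init pid as four big-endian bytes. Below is a minimal round trip over generic `Read`/`Write` values, assuming only the `ChildReady = 0x00` tag shown in `message.rs`. The function names are hypothetical, and an in-memory buffer stands in for the pipe:

```rust
use std::io::{self, Cursor, Read, Write};

/// Tag byte for the ChildReady message, as shown in message.rs.
const CHILD_READY: u8 = 0x00;

/// Writes the tag byte followed by the pid as 4 big-endian bytes.
fn write_child_ready<W: Write>(mut w: W, init_pid: i32) -> io::Result<()> {
    w.write_all(&CHILD_READY.to_be_bytes())?;
    w.write_all(&init_pid.to_be_bytes())?;
    Ok(())
}

/// Reads the tag byte, checks it, then reads the 4-byte pid.
fn read_child_ready<R: Read>(mut r: R) -> io::Result<i32> {
    let mut tag = [0u8; 1];
    r.read_exact(&mut tag)?;
    if u8::from_be_bytes(tag) != CHILD_READY {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "unexpected message",
        ));
    }
    // the pid is an i32, so exactly 4 bytes follow the tag
    let mut pid = [0u8; 4];
    r.read_exact(&mut pid)?;
    Ok(i32::from_be_bytes(pid))
}

/// Encodes into a buffer and decodes it again.
fn round_trip(init_pid: i32) -> io::Result<i32> {
    let mut wire: Vec<u8> = Vec::new();
    write_child_ready(&mut wire, init_pid)?;
    read_child_ready(Cursor::new(wire))
}
```

Because both sides agree on big-endian order and a fixed width, the reader can use `read_exact` with fixed-size buffers and needs no length prefix. A wrong tag or a truncated stream surfaces as an `io::Error` rather than a misread pid.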