Initial checkpoint support #641

Merged
merged 4 commits into from
Feb 25, 2022


adrianreber
Contributor

@adrianreber adrianreber commented Jan 24, 2022

This is the initial draft for checkpoint support. No restore support yet.

Tests are still missing, and it relies on the not yet officially released Rust CRIU bindings. For now it is just a draft.

It depends on #632

I called the checkpoint command 'checkpointt', with two t's, so that Podman does not think youki already fully supports checkpoint/restore. Once everything works I will rename it.

@adrianreber adrianreber force-pushed the 2022-01-24-checkpointt branch 2 times, most recently from 5c4fca1 to 7ae3c3c Compare January 25, 2022 17:02
@codecov-commenter

Codecov Report

Merging #641 (7ae3c3c) into main (1b810d4) will decrease coverage by 0.48%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #641      +/-   ##
==========================================
- Coverage   69.69%   69.21%   -0.49%     
==========================================
  Files          84       85       +1     
  Lines       10983    11059      +76     
==========================================
- Hits         7655     7654       -1     
- Misses       3328     3405      +77     

@adrianreber
Contributor Author

Hmm, the test succeeds locally with runc. Need to figure out why it does not work in CI.

@adrianreber adrianreber force-pushed the 2022-01-24-checkpointt branch 6 times, most recently from 78cde5e to e9b0ed9 Compare January 26, 2022 09:59
@adrianreber
Contributor Author

Tests are still failing as I only tested on a cgroup v2 system. v1 needs some additional code.

@adrianreber adrianreber force-pushed the 2022-01-24-checkpointt branch 3 times, most recently from 1dd0f57 to 63105b0 Compare January 26, 2022 13:30
}

let directory = std::fs::File::open(&opts.image_path)?;
criu.set_images_dir_fd(directory.as_raw_fd());

@rst0git rst0git Jan 26, 2022

Collaborator

@Furisto Furisto Feb 15, 2022

Do we need to create the directory?

Contributor Author

We do not need to create it, but we could if it does not exist. I think in Podman we create it to make sure it gets created with 0600. We could do it here. Or not. I have no strong opinion. If it does not exist, the user will see an error. By creating the directory we could avoid that error. On the other hand, if it is not created automatically, the user has to make a conscious decision about where the checkpoint directory is. I have a couple of checkpoint directories scattered around my development system because the directory is created automatically. Both options have advantages and disadvantages.

We could leave it for now and if users are complaining then it can still be added later. Or if one of the container engines needs it, it can also be added later.
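
For reference, the "create it if missing" option discussed above could look roughly like the following. This is only a sketch under assumptions: `open_image_dir` and `image_dir_fd` are hypothetical helpers, not code from this PR, and the mode is 0o700 rather than the 0600 mentioned above because a directory needs the execute bit to be traversable. The effective mode is still subject to the process umask.

```rust
use std::fs::{DirBuilder, File};
use std::io;
use std::os::unix::fs::DirBuilderExt;
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;

/// Hypothetical helper: open the checkpoint image directory.
/// When `create_if_missing` is false, a missing directory is reported
/// as an error, which matches the behaviour the PR keeps for now.
pub fn open_image_dir(image_path: &Path, create_if_missing: bool) -> io::Result<File> {
    if create_if_missing {
        // recursive(true) also makes this a no-op if the directory exists.
        DirBuilder::new()
            .recursive(true)
            .mode(0o700) // assumption: owner-only access
            .create(image_path)?;
    }
    // Opening a directory read-only yields a descriptor that can be
    // handed to something like `criu.set_images_dir_fd(...)`.
    File::open(image_path)
}

/// Hypothetical helper: the raw descriptor stays valid only while `dir` is alive.
pub fn image_dir_fd(dir: &File) -> RawFd {
    dir.as_raw_fd()
}
```

Keeping the `File` alive until the checkpoint has finished matters in either variant, because dropping it closes the descriptor.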

@adrianreber
Contributor Author

Ready for review. All tests are successful.

@utam0k
Member

utam0k commented Feb 15, 2022

Thanks for your PR. But I don't have the time to review this PR on my weekday. So I'll check on next holiday, please wait just a moment 🙏

@adrianreber
Contributor Author

Thanks for your PR. But I don't have the time to review this PR on my weekday. So I'll check on next holiday, please wait just a moment 🙏

No problem.

@Furisto
Collaborator

Furisto commented Feb 15, 2022

Did not manage to go through everything today, so feel free to wait until I am finished.

@adrianreber
Contributor Author

Changed as suggested by @Furisto

pub fn checkpoint(args: Checkpoint, root_path: PathBuf) -> Result<()> {
log::debug!("start checkpointing container {}", args.container_id);
let mut container = load_container(root_path, &args.container_id)?;
let opts = libcontainer::container::CheckpointOptions {
Member

@utam0k utam0k Feb 19, 2022

Why don't you implement From<Checkpoint> for CheckpointOptions?

Contributor Author

I don't understand this question. Maybe my rust knowledge is not good enough. Can you be more specific about what needs to change here?

Member

@adrianreber I'm sorry for my inadequate explanation. Why don't you implement this trait? I feel it would be more Rust-like if the conversion from Checkpoint to CheckpointOptions could be done using from(). WDYT?
https://doc.rust-lang.org/std/convert/trait.From.html

Contributor Author

Ah, okay. I tried it, but I am not sure where to put it. If I try to implement it in the youki crate I get:

impl doesn't use only types from inside the current crate

If I add it to the liboci_cli crate I need to access information from libcontainer and also the other way around. Not sure if that is desired or even possible.

Any recommendations where I need to add the corresponding implementation?
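
The error quoted above comes from Rust's orphan rule: a crate may implement a trait only if the trait or at least one of the types involved is local to it. Seen from the youki crate, `From`, `Checkpoint` and `CheckpointOptions` are all foreign. Two common ways around this are sketched below, purely as an illustration. The structs are stand-ins defined in one file (so the orphan rule does not actually apply here), `work_path` and `leave_running` are assumed fields, and `checkpoint_options` and `Options` are hypothetical names.

```rust
use std::path::PathBuf;

// Stand-in for the CLI argument struct (liboci_cli::Checkpoint in the PR).
// Only `container_id` and `image_path` appear in the discussion; the other
// fields are illustrative assumptions.
pub struct Checkpoint {
    pub container_id: String,
    pub image_path: PathBuf,
    pub work_path: Option<PathBuf>,
    pub leave_running: bool,
}

// Stand-in for libcontainer's CheckpointOptions, with the same caveat.
pub struct CheckpointOptions {
    pub image_path: PathBuf,
    pub work_path: Option<PathBuf>,
    pub leave_running: bool,
}

// Option 1: a plain conversion function. Functions are not subject to the
// orphan rule, so this can live in the binary crate.
pub fn checkpoint_options(args: &Checkpoint) -> CheckpointOptions {
    CheckpointOptions {
        image_path: args.image_path.clone(),
        work_path: args.work_path.clone(),
        leave_running: args.leave_running,
    }
}

// Option 2: a local newtype. `Options` belongs to the crate that defines it,
// so `impl From<Checkpoint> for Options` is allowed there.
pub struct Options(pub CheckpointOptions);

impl From<Checkpoint> for Options {
    fn from(args: Checkpoint) -> Self {
        Options(CheckpointOptions {
            image_path: args.image_path,
            work_path: args.work_path,
            leave_running: args.leave_running,
        })
    }
}
```

A third route would be to implement `From` inside one of the library crates, but that requires one library to depend on the other, which is the coupling questioned above.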

@utam0k
Member

utam0k commented Feb 19, 2022

Great idea. I think we can represent the WIP by adding it to the feature table in the README.md, WDYT?

I called the checkpoint command 'checkpointt' with two ts so that Podman does not think youki already fully supports checkpoint/restore. Once everything works I will rename it.

@@ -36,9 +36,14 @@ jobs:
uses: Swatinem/rust-cache@v1
- run: sudo apt-get -y update
- run: sudo apt-get install -y pkg-config libsystemd-dev libdbus-glib-1-dev libelf-dev libseccomp-dev
- name: Install runc 1.1.0
Collaborator

Why do we need to install this version of runc?

Contributor Author

runc 1.1.0 changed how it handles --work-dir (or --work-path). Before 1.1.0 runc would create a directory called criu-work if the user does not specify a working directory. CRIU's default behaviour is to put the log and statistics files in the image path if work dir is not set. With runc 1.1.0 runc behaves like CRIU. That is also what I did for this PR and therefore the tests do not pass with older versions of runc.
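
The fallback described above (no work dir given, so logs and statistics end up next to the images) can be written as a one-line rule. This is a minimal sketch, assuming the work dir is carried as an `Option`; `effective_work_dir` is a hypothetical name and not part of this PR.

```rust
use std::path::Path;

/// Hypothetical helper: pick the directory for CRIU log and statistics files.
/// Falls back to the image path when no work dir was given, which is the
/// CRIU default and the runc >= 1.1.0 behaviour described above.
pub fn effective_work_dir<'a>(image_path: &'a Path, work_path: Option<&'a Path>) -> &'a Path {
    work_path.unwrap_or(image_path)
}
```

Older runc versions would instead have resolved the `None` case to a separate criu-work directory, which is why the tests depend on the runc version.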

@adrianreber
Contributor Author

Great idea. I think we can represent the WIP by adding it to the feature table in the README.md, WDYT?

I called the checkpoint command 'checkpointt' with two ts so that Podman does not think youki already fully supports checkpoint/restore. Once everything works I will rename it.

In this PR or in another PR?

@utam0k
Member

utam0k commented Feb 23, 2022

Great idea. I think we can represent the WIP by adding it to the feature table in the README.md, WDYT?

I called the checkpoint command 'checkpointt' with two ts so that Podman does not think youki already fully supports checkpoint/restore. Once everything works I will rename it.

In this PR or in another PR?

@adrianreber Could you please do it in this PR?

@adrianreber
Contributor Author

Did most of the suggested changes (besides the one with impl From<) and also added a line to README.md.

0..2 does not include 2. Change it to 0..3 to include 2.

Signed-off-by: Adrian Reber <areber@redhat.com>
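
The off-by-one fixed by the first commit message above follows from Rust's range syntax: `a..b` is half-open and stops before `b`, while `a..=b` includes it. A small sketch, with hypothetical helper names chosen only for illustration:

```rust
/// Half-open range: `end` itself is not produced.
pub fn half_open(start: u32, end: u32) -> Vec<u32> {
    (start..end).collect()
}

/// Inclusive range: `end` is the last value produced.
pub fn inclusive(start: u32, end: u32) -> Vec<u32> {
    (start..=end).collect()
}
```

With these, `half_open(0, 2)` gives `[0, 1]`, whereas `half_open(0, 3)` and `inclusive(0, 2)` both give `[0, 1, 2]`.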
This adds the first code to checkpoint a container. The checkpoint
command is named 'checkpointt' (with two 't's at the end) so that
container engines like Podman do not try to use this unfinished
checkpoint/restore implementation.

For Podman it is still necessary, at a minimum, to tell CRIU that the
network namespace is external, and restoring needs special handling to
support '--console-socket'.

Signed-off-by: Adrian Reber <areber@redhat.com>
If stdin is not pointing to /dev/null checkpointing fails for now. Just
point it to /dev/null.

Signed-off-by: Adrian Reber <areber@redhat.com>
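
The commit message above does not say where stdin gets redirected. One way to start a process with stdin pointing at /dev/null from Rust is `Stdio::null()`. The following is a sketch only; `run_with_null_stdin` is a hypothetical helper and not the code from this PR.

```rust
use std::io;
use std::process::{Command, Output, Stdio};

/// Hypothetical helper: run `program` with stdin connected to /dev/null
/// and capture its stdout and stderr.
pub fn run_with_null_stdin(program: &str, args: &[&str]) -> io::Result<Output> {
    Command::new(program)
        .args(args)
        // Without this, `output()` would give the child a pipe for stdin.
        .stdin(Stdio::null())
        .output()
}
```

A child started this way sees end-of-file immediately when it reads stdin, so for example `cat` exits at once with empty output.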
This still uses the subcommand 'checkpointt' until it works in
combination with Podman.

Signed-off-by: Adrian Reber <areber@redhat.com>
Member

@utam0k utam0k left a comment

@adrianreber LGTM 💯
Thanks!!!

@utam0k utam0k merged commit 6c60abd into youki-dev:main Feb 25, 2022