Support checkpoint and restore #142

Open
Furisto opened this issue Jul 17, 2021 · 13 comments

@Furisto
Collaborator

Furisto commented Jul 17, 2021

runc supports checkpoint and restore, and we should support these operations as well. There does not seem to be a crate for interacting with criu, so we will probably have to write one ourselves. The Go implementation would be a good starting point. If anyone has more information on this topic, it would be appreciated.

@Furisto Furisto self-assigned this Jul 17, 2021
@lizhemingi
Contributor

Is there anything I can help with?

@utam0k
Member

utam0k commented Jul 19, 2021

@Furisto Have you started implementing this yet?

@Furisto
Collaborator Author

Furisto commented Jul 19, 2021

I have started with the implementation. Will let you know if I need support or if we can divide it up.

@Furisto
Collaborator Author

Furisto commented Jul 23, 2021

Hey @duduainankai I could use your help. What I have done so far:

  • I have generated the code for the criu protobuf messages
  • I can start criu in swrk mode
  • I can send a message to criu to dump a simple process
  • The message is processed and the process successfully dumped according to the criu logs. The image files are written to the output folder.
  • I am waiting to receive a response from criu to tell me everything went well
    ... and nothing happens; I never receive a response

I am following the steps outlined here and in runc. Maybe you have an idea?
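
A hang like the one described above can be easier to diagnose if the receiving side gives up after a timeout instead of blocking forever. The following is a minimal sketch, not the youki or runc implementation. It uses `UnixDatagram::pair()` from the Rust standard library as a stand-in for the socket pair shared with criu (runc, as far as I know, uses a `SOCK_SEQPACKET` socket pair for swrk mode), and the payloads are placeholder bytes rather than real criu protobuf messages. `request_with_timeout` is a hypothetical helper name.

```rust
use std::io;
use std::os::unix::net::UnixDatagram;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: send one request and wait at most `timeout` for a reply.
/// Returns Ok(None) if the peer does not answer within the timeout.
fn request_with_timeout(
    sock: &UnixDatagram,
    request: &[u8],
    timeout: Duration,
) -> io::Result<Option<Vec<u8>>> {
    sock.set_read_timeout(Some(timeout))?;
    sock.send(request)?;

    let mut buf = vec![0u8; 4096];
    match sock.recv(&mut buf) {
        Ok(n) => {
            buf.truncate(n);
            Ok(Some(buf))
        }
        // A timed-out read surfaces as WouldBlock or TimedOut depending on the platform.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Ok(None)
        }
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Case 1: the peer stays silent, so the call gives up after the timeout.
    let (client, _silent_peer) = UnixDatagram::pair()?;
    let reply = request_with_timeout(&client, b"dump", Duration::from_millis(200))?;
    println!("silent peer: {:?}", reply);

    // Case 2: the peer reads the request and answers it.
    let (client, peer) = UnixDatagram::pair()?;
    let handle = thread::spawn(move || -> io::Result<()> {
        let mut buf = [0u8; 4096];
        peer.recv(&mut buf)?; // placeholder for decoding the request
        peer.send(b"ok")?; // placeholder for an encoded response
        Ok(())
    });
    let reply = request_with_timeout(&client, b"dump", Duration::from_secs(2))?;
    println!("responsive peer: {:?}", reply);
    handle.join().expect("peer thread panicked")?;
    Ok(())
}
```

With a timeout in place, the caller can at least log that the dump finished on the criu side but no response arrived, which separates "criu never answered" from "the answer was sent on a different descriptor or in a different framing".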

@lizhemingi
Contributor

> Hey @duduainankai I could use your help. What I have done so far:
>
>   • I have generated the code for the criu protobuf messages
>   • I can start criu in swrk mode
>   • I can send a message to criu to dump a simple process
>   • The message is processed and the process successfully dumped according to the criu logs. The image files are written to the output folder.
>   • I am waiting to receive a response from criu to tell me everything went well
>     ... and nothing happens; I never receive a response
>
> I am following the steps outlined here and in runc. Maybe you have an idea?

Got it. I will check what you have listed and see what I can do. @Furisto

@adrianreber
Contributor

I started a discussion about a CRIU Rust interface here: checkpoint-restore/criu#1722

@Furisto
Collaborator Author

Furisto commented Jan 16, 2022

@adrianreber Thanks! Will take a look.

@adrianreber
Contributor

I tried to checkpoint a container and it almost works. The main problem currently is that -d/--detach, which does a setsid(), is missing. Are there any plans to implement --detach soon? With --detach, checkpointing should be possible pretty easily.

@adrianreber
Contributor

Checkpointing works now if I call setsid() after starting the container with stdin, stdout, and stderr redirected to /dev/null.

It is also important that youki re-opens /dev/null inside the container, like I did for crun: containers/crun@bbb1fa9

This is also needed: #623
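
The combination described above (standard streams on /dev/null plus a new session via setsid()) can be sketched for an ordinary child process as follows. This is only an illustration of the idea, not how youki starts containers or implements --detach. `spawn_detached` and `session_id` are hypothetical helper names, `setsid` is declared by hand to avoid an external crate, and the check reads the Linux-specific `/proc/<pid>/stat` file.

```rust
use std::fs;
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, Stdio};

extern "C" {
    // POSIX setsid(2), provided by the C library that std already links against.
    fn setsid() -> i32;
}

/// Hypothetical helper: start `program` with stdin, stdout, and stderr on
/// /dev/null and make it the leader of a new session.
fn spawn_detached(program: &str, args: &[&str]) -> io::Result<Child> {
    let mut cmd = Command::new(program);
    cmd.args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null());

    // Runs in the forked child just before exec.
    let new_session = || {
        if unsafe { setsid() } == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    };
    // pre_exec is unsafe because the closure runs between fork and exec.
    unsafe {
        cmd.pre_exec(new_session);
    }
    cmd.spawn()
}

/// Hypothetical helper: read the session ID of `pid` from /proc/<pid>/stat (Linux only).
fn session_id(pid: u32) -> io::Result<u32> {
    let stat = fs::read_to_string(format!("/proc/{}/stat", pid))?;
    // Fields after the closing ')' of the command name: state, ppid, pgrp, session, ...
    let after_comm = stat.rsplit(')').next().unwrap_or("");
    after_comm
        .split_whitespace()
        .nth(3)
        .and_then(|s| s.parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "unexpected /proc stat format"))
}

fn main() -> io::Result<()> {
    let mut child = spawn_detached("sleep", &["30"])?;
    let pid = child.id();
    // A session leader has a session ID equal to its own PID.
    println!("pid {} is in session {}", pid, session_id(pid)?);
    child.kill()?;
    child.wait()?;
    Ok(())
}
```

The sketch does not cover re-opening /dev/null inside the container, which is mentioned above as a separate requirement.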

@utam0k
Member

utam0k commented Jan 21, 2022

@adrianreber
Awesome!

> Are there any plans to implement --detach soon?

Oh, I forgot about it 😭 Can I ask you to create an issue?

@YJDoc2
Collaborator

YJDoc2 commented Nov 14, 2022

Sorry to ping the issue subscribers here, but is this still relevant? Two of the things mentioned above have been merged/closed, and the first issue has a related PR which has been merged. If I recall correctly, there are also some integration tests for the checkpoint-restore functionality. If this support is done, can we close this issue? If not, what else might be needed for it to work?

@adrianreber
Contributor

I only implemented the checkpoint part. I have not yet found the time to implement the restore part of the code. Once restore is implemented, this can be closed.
