-
Notifications
You must be signed in to change notification settings - Fork 948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: support checkpoint #1740
feature: support checkpoint #1740
Conversation
----------------------------------------------------------------------
FAIL: /go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:35: APIContainerCheckpointSuite.TestCheckpointCreateAPI
/go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:48:
c.Assert(err, check.IsNil)
... value *url.Error = &url.Error{Op:"Post", URL:"http://d/v1.24/containers/TestCheckpointCreateAPI/checkpoints", Err:(*errors.errorString)(0xc42004a060)} ("Post http://d/v1.24/containers/TestCheckpointCreateAPI/checkpoints: EOF")
----------------------------------------------------------------------
FAIL: /go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:95: APIContainerCheckpointSuite.TestCheckpointDelAPI
/go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:107:
c.Assert(err, check.IsNil)
... value *url.Error = &url.Error{Op:"Post", URL:"http://d/v1.24/containers/TestCheckpointDelAPI/checkpoints", Err:(*errors.errorString)(0xc42004a060)} ("Post http://d/v1.24/containers/TestCheckpointDelAPI/checkpoints: EOF")
---------------------------------------------------------------------- |
It's weird that all failed tests were passed on my machine. |
Oh, I Got it, it miss criu on test machine, it should be test first. |
Codecov Report
@@ Coverage Diff @@
## master #1740 +/- ##
==========================================
+ Coverage 58.64% 63.44% +4.79%
==========================================
Files 206 207 +1
Lines 15840 16035 +195
==========================================
+ Hits 9289 10173 +884
+ Misses 5413 4588 -825
- Partials 1138 1274 +136
|
apis/server/router.go
Outdated
@@ -33,6 +33,9 @@ func initRoute(s *Server) http.Handler { | |||
s.addRoute(r, http.MethodPost, "/daemon/update", s.updateDaemon) | |||
|
|||
// container | |||
s.addRoute(r, http.MethodPost, "/containers/{name:.*}/checkpoints", s.createContainerCheckpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could add the withCancelHandler
to wrap s.createContainerCheckpoint
, just in case that client closes connection without patience~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
cli/checkpoint.go
Outdated
apiClient := cc.cli.Client() | ||
|
||
if err := apiClient.ContainerCheckpointCreate(ctx, args[0], &types.CheckpointCreateOptions{ | ||
CheckpointID: args[1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add checkpoint dir
here so that client could config checkpoint direction themselves? Because client may want to migrate container to another machine, they would like to send whole checkpoint direction to another machine~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should consider it, but It will be done later.
cli/checkpoint.go
Outdated
func (cc *CheckpointDelCommand) Init(c *Cli) { | ||
cc.cli = c | ||
cc.cmd = &cobra.Command{ | ||
Use: "delete [OPTIONS] CONTAINER CHECKPOINT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use pouch checkpoint rm
instead of pouch checkpoint delete
here to stay the same with removing containers/images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checkpint is a independent function, why we should keep same with a command name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the rm
can save time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am doing a bash complement for all pouch command, do it save your time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you share your bash complement script? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing... not done, it will pushed as soon as it finished.
daemon/mgr/container.go
Outdated
if err := mgr.Client.CreateCheckpoint(ctx, c.ID, dir, options.Exit); err != nil { | ||
return err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checkpointing containers successfully, we should set this container's status to checkpointed
. And this status is different from paused
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chekpoint is not a status.
@allencloud , of course, we should do proposal first, consider for my time, I will finish this as soon as possible. |
Do we need to install additional packages related to criu when experiencing this feature? @Ace-Tang We need to consider more about the doc parts. |
Do we need to add more details about the document part on how to install CRIU? @Ace-Tang |
I do not think the CRIU installtion doc should stop the pr from reviewing, by the way, you can suggest me to add a doc @allencloud . Since the pr is hold so long, can someone do the review for the code? thanks, @alibaba/pouch |
apis/swagger.yml
Outdated
@@ -3582,6 +3582,17 @@ definitions: | |||
items: | |||
type: "string" | |||
|
|||
CheckpointCreateOptions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add more API side details for all these APIs.
like:
/images/create:
post:
summary: "Create an image by pulling from a registry or importing from an existing source file"
consumes:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already done, @allencloud
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need more API routes description in swagger for all these APIs.
daemon/mgr/container_checkpoint.go
Outdated
return err | ||
} | ||
|
||
if err := mgr.Client.CreateCheckpoint(ctx, c.ID, dir, options.Exit); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If options.Exit
is true
, the container would be stopped by checkpoint process. Do we need to add a markStoppedAndRelease
here to set stopped
status of this container and release some resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am not consider that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiechengsheng , I re-think the problem, you may not correct for this, markStoppedAndRelease
will called as soon as container task exit, it is running in a goroutine, called by received containerd exit channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but could you tell me which goroutine runs this markStoppedAndRelease
function? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
// TestCreate tests create checkpoint | ||
func (suite *PouchCheckpointSuite) TestCheckpointCreate(c *check.C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add one test case TestCheckpointCreateWithExitFlag
to checkpoint and stop one container, then check the stopped
status and the checkpoint of this stopped container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already add.
test/api_checkpoint_test.go
Outdated
} | ||
|
||
// TestCheckpointCreateAPI tests checkpoint create api | ||
func (suite *APIContainerCheckpointSuite) TestCheckpointCreateAPI(c *check.C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add one test case TestCheckpointCreateAPIWithExitFlag
to checkpoint and stop one container, then check the stopped status and the checkpoint of this stopped container?
cli/checkpoint.go
Outdated
func (cc *CheckpointDelCommand) Init(c *Cli) { | ||
cc.cli = c | ||
cc.cmd = &cobra.Command{ | ||
Use: "delete [OPTIONS] CONTAINER CHECKPOINT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the rm
can save time.
if !strings.HasPrefix(req.URL.Path, expectedURL) { | ||
return nil, fmt.Errorf("expected URL '%s', got '%s'", expectedURL, req.URL) | ||
} | ||
if req.Header.Get("Content-Type") == "application/json" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the Content-Type
doesn't equal application/json
, the case should fail, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-think this problem, maybe it also can accepted non-json type? @fuweid
ctrd/container.go
Outdated
"github.com/containerd/containerd/leases" | ||
"github.com/containerd/containerd/linux/runctypes" | ||
"github.com/containerd/containerd/oci" | ||
"github.com/docker/docker/pkg/stdcopy" | ||
"github.com/opencontainers/image-spec/specs-go/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use imagespec
instead of v1
? v1
doesn't contain any useful information.
ctrd/container.go
Outdated
return fmt.Errorf("failed to checkpoint: %s", err) | ||
} | ||
defer client.ImageService().Delete(ctx, checkpoint.Name()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we split it into one function? for example, the named getCheckpointImageDescriptor
function handled cpDesc
.
daemon/mgr/container_checkpoint.go
Outdated
if id == "" { | ||
return "", fmt.Errorf("checkpoint can not be empty") | ||
} | ||
dir := filepath.Join(mgr.Config.HomeDir, "containerd/state", "io.containerd.runtime.v1.linux", mgr.Config.Namespace, container, "checkpoint", id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we make it as function?
LGTM, except that draft a simple example describing how to install criu utils and create/list/delete checkpoints, how to quickly restore a container using a checkpoint. |
LGTM @allencloud please take a look. |
return err | ||
} | ||
|
||
rw.WriteHeader(http.StatusOK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing this to status code of StatusCreated(201)?
@@ -3559,6 +3559,17 @@ definitions: | |||
Width: | |||
type: "integer" | |||
|
|||
ContainerStartOptions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not seen any API routes definition. Maybe in a follow-up pull request?
case err != nil && os.IsNotExist(err): | ||
return checkpointDir, os.MkdirAll(checkpointDir, 0700) | ||
case err != nil: | ||
return "", fmt.Errorf("failed to create checkpoint %s", checkpointID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not wrap the err
into the returned error?
) | ||
|
||
// getCheckpointDir gets container checkpoint directory. | ||
func (mgr *ContainerManager) getCheckpointDir(container, prefixDir, checkpointID string, create bool) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the implementation of this function, I am wondering if we need to add more dedicated error instance when failed. Since these errors in errtypes
packages could be used to construct the status codes for API layers. WDYT?
} | ||
|
||
if c.State.Status != types.StatusRunning { | ||
return fmt.Errorf("can not checkpoint from a %s container", c.State.Status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.Wrapf(errtypes.ErrConflict, "can not checkpoint from a %s container", c.State.Status)
Now we have no ErrConflict
, maybe we need to add one.
implementing checkpoint functionality, it will checkpointing the address space and state of the entire process tree to a collection of “image” files. This functionality need a external software criu. support pouch checkpoint create, list and delete. implement checkpoint in #1723 Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Ace-Tang <aceapril@126.com>
LGTM |
} | ||
|
||
resp, err := client.post(ctx, "/containers/"+name+"/start", q, nil, nil) | ||
func (client *APIClient) ContainerStart(ctx context.Context, name string, options types.ContainerStartOptions) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid this change has modified the API, we need to revert this part.
support checkpoint create,list,delete
Signed-off-by: Ace-Tang aceapril@126.com
Ⅰ. Describe what this PR did
support funtion
pouch checkpoint create
create a checkpoint from a running container using criu.
pouch checkpoint ls
list a container all checkpoints.
pouch checkpoint rm
delete a container checkpoint.
pouch start --checkpoint
restore a container from a checkpoint image.
support checkpoint in #1723.
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how you did it
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews