Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: support checkpoint #1740

Merged
merged 3 commits into from
Aug 10, 2018
Merged

feature: support checkpoint #1740

merged 3 commits into from
Aug 10, 2018

Conversation

Ace-Tang
Copy link
Contributor

@Ace-Tang Ace-Tang commented Jul 16, 2018

support checkpoint create,list,delete

Signed-off-by: Ace-Tang aceapril@126.com

Ⅰ. Describe what this PR did

support funtion

  1. pouch checkpoint create
    create a checkpoint from a running container using criu.

  2. pouch checkpoint ls
    list a container all checkpoints.

  3. pouch checkpoint rm
    delete a container checkpoint.

  4. pouch start --checkpoint
    restore a container from a checkpoint image.

support checkpoint in #1723.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how you did it

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

@chuanchang
Copy link
Contributor

chuanchang commented Jul 16, 2018

----------------------------------------------------------------------
FAIL: /go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:35: APIContainerCheckpointSuite.TestCheckpointCreateAPI
/go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:48:
    c.Assert(err, check.IsNil)
... value *url.Error = &url.Error{Op:"Post", URL:"http://d/v1.24/containers/TestCheckpointCreateAPI/checkpoints", Err:(*errors.errorString)(0xc42004a060)} ("Post http://d/v1.24/containers/TestCheckpointCreateAPI/checkpoints: EOF")
----------------------------------------------------------------------
FAIL: /go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:95: APIContainerCheckpointSuite.TestCheckpointDelAPI
/go/src/github.com/alibaba/pouch/test/api_checkpoint_test.go:107:
    c.Assert(err, check.IsNil)
... value *url.Error = &url.Error{Op:"Post", URL:"http://d/v1.24/containers/TestCheckpointDelAPI/checkpoints", Err:(*errors.errorString)(0xc42004a060)} ("Post http://d/v1.24/containers/TestCheckpointDelAPI/checkpoints: EOF")
----------------------------------------------------------------------

@Ace-Tang
Copy link
Contributor Author

It's weird that all failed tests were passed on my machine.

@Ace-Tang
Copy link
Contributor Author

Oh, I Got it, it miss criu on test machine, it should be test first.

@codecov-io
Copy link

codecov-io commented Jul 16, 2018

Codecov Report

Merging #1740 into master will increase coverage by 4.79%.
The diff coverage is 31.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1740      +/-   ##
==========================================
+ Coverage   58.64%   63.44%   +4.79%     
==========================================
  Files         206      207       +1     
  Lines       15840    16035     +195     
==========================================
+ Hits         9289    10173     +884     
+ Misses       5413     4588     -825     
- Partials     1138     1274     +136
Flag Coverage Δ
#criv1alpha1test 32.88% <11%> (-0.32%) ⬇️
#criv1alpha2test 33.41% <11%> (?)
#integrationtest 37.74% <25.22%> (-0.2%) ⬇️
#unittest 23.46% <4.58%> (-0.27%) ⬇️
Impacted Files Coverage Δ
cri/v1alpha1/cri.go 65.36% <100%> (ø) ⬆️
apis/server/router.go 91.6% <100%> (+0.17%) ⬆️
cri/v1alpha2/cri.go 66.02% <100%> (+66.02%) ⬆️
ctrd/container.go 43.5% <18.07%> (-7.23%) ⬇️
daemon/mgr/container_checkpoint.go 22.78% <22.78%> (ø)
daemon/mgr/container.go 55.28% <53.33%> (-0.24%) ⬇️
apis/server/container_bridge.go 78.84% <61.76%> (-2.74%) ⬇️
apis/server/utils.go 61.9% <0%> (-4.77%) ⬇️
daemon/logger/jsonfile/utils.go 71.54% <0%> (-1.63%) ⬇️
... and 12 more

@allencloud
Copy link
Collaborator

Please add a proposal in the issue list first. This is a big big feature in PouchContainer, we must to explain the feature demand much more in issue #1723. Thanks a lot.

After maintainers review you proposal well, we start the code review part. Thanks for your cooperation. @Ace-Tang 🌹 ❤️

@@ -33,6 +33,9 @@ func initRoute(s *Server) http.Handler {
s.addRoute(r, http.MethodPost, "/daemon/update", s.updateDaemon)

// container
s.addRoute(r, http.MethodPost, "/containers/{name:.*}/checkpoints", s.createContainerCheckpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add the withCancelHandler to wrap s.createContainerCheckpoint, just in case that client closes connection without patience~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

apiClient := cc.cli.Client()

if err := apiClient.ContainerCheckpointCreate(ctx, args[0], &types.CheckpointCreateOptions{
CheckpointID: args[1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add checkpoint dir here so that client could config checkpoint direction themselves? Because client may want to migrate container to another machine, they would like to send whole checkpoint direction to another machine~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should consider it, but It will be done later.

func (cc *CheckpointDelCommand) Init(c *Cli) {
cc.cli = c
cc.cmd = &cobra.Command{
Use: "delete [OPTIONS] CONTAINER CHECKPOINT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use pouch checkpoint rm instead of pouch checkpoint delete here to stay the same with removing containers/images?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checkpint is a independent function, why we should keep same with a command name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the rm can save time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing a bash complement for all pouch command, do it save your time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you share your bash complement script? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing... not done, it will pushed as soon as it finished.

if err := mgr.Client.CreateCheckpoint(ctx, c.ID, dir, options.Exit); err != nil {
return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checkpointing containers successfully, we should set this container's status to checkpointed. And this status is different from paused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chekpoint is not a status.

@Ace-Tang
Copy link
Contributor Author

@allencloud , of course, we should do proposal first, consider for my time, I will finish this as soon as possible.

@allencloud
Copy link
Collaborator

Do we need to install additional packages related to criu when experiencing this feature? @Ace-Tang

We need to consider more about the doc parts.

@allencloud
Copy link
Collaborator

Do we need to add more details about the document part on how to install CRIU? @Ace-Tang

@Ace-Tang
Copy link
Contributor Author

I do not think the CRIU installtion doc should stop the pr from reviewing, by the way, you can suggest me to add a doc @allencloud . Since the pr is hold so long, can someone do the review for the code? thanks, @alibaba/pouch

apis/swagger.yml Outdated
@@ -3582,6 +3582,17 @@ definitions:
items:
type: "string"

CheckpointCreateOptions:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add more API side details for all these APIs.

like:

  /images/create:
    post:
      summary: "Create an image by pulling from a registry or importing from an existing source file"
      consumes:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done, @allencloud

Copy link
Collaborator

@allencloud allencloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need more API routes description in swagger for all these APIs.

return err
}

if err := mgr.Client.CreateCheckpoint(ctx, c.ID, dir, options.Exit); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If options.Exit is true, the container would be stopped by checkpoint process. Do we need to add a markStoppedAndRelease here to set stopped status of this container and release some resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am not consider that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiechengsheng , I re-think the problem, you may not correct for this, markStoppedAndRelease will called as soon as container task exit, it is running in a goroutine, called by received containerd exit channel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but could you tell me which goroutine runs this markStoppedAndRelease function? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// TestCreate tests create checkpoint
func (suite *PouchCheckpointSuite) TestCheckpointCreate(c *check.C) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add one test case TestCheckpointCreateWithExitFlag to checkpoint and stop one container, then check the stopped status and the checkpoint of this stopped container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already add.

}

// TestCheckpointCreateAPI tests checkpoint create api
func (suite *APIContainerCheckpointSuite) TestCheckpointCreateAPI(c *check.C) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add one test case TestCheckpointCreateAPIWithExitFlag to checkpoint and stop one container, then check the stopped status and the checkpoint of this stopped container?

func (cc *CheckpointDelCommand) Init(c *Cli) {
cc.cli = c
cc.cmd = &cobra.Command{
Use: "delete [OPTIONS] CONTAINER CHECKPOINT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the rm can save time.

if !strings.HasPrefix(req.URL.Path, expectedURL) {
return nil, fmt.Errorf("expected URL '%s', got '%s'", expectedURL, req.URL)
}
if req.Header.Get("Content-Type") == "application/json" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the Content-Type doesn't equal application/json, the case should fail, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-think this problem, maybe it also can accepted non-json type? @fuweid

"github.com/containerd/containerd/leases"
"github.com/containerd/containerd/linux/runctypes"
"github.com/containerd/containerd/oci"
"github.com/docker/docker/pkg/stdcopy"
"github.com/opencontainers/image-spec/specs-go/v1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use imagespec instead of v1 ? v1 doesn't contain any useful information.

return fmt.Errorf("failed to checkpoint: %s", err)
}
defer client.ImageService().Delete(ctx, checkpoint.Name())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we split it into one function? for example, the named getCheckpointImageDescriptor function handled cpDesc.

if id == "" {
return "", fmt.Errorf("checkpoint can not be empty")
}
dir := filepath.Join(mgr.Config.HomeDir, "containerd/state", "io.containerd.runtime.v1.linux", mgr.Config.Namespace, container, "checkpoint", id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we make it as function?

@yyb196
Copy link
Collaborator

yyb196 commented Aug 8, 2018

LGTM, except that draft a simple example describing how to install criu utils and create/list/delete checkpoints, how to quickly restore a container using a checkpoint.

@pouchrobot pouchrobot added LGTM one maintainer or community participant agrees to merge the pull reuqest. conflict/needs-rebase and removed LGTM one maintainer or community participant agrees to merge the pull reuqest. conflict/needs-rebase labels Aug 8, 2018
@fuweid
Copy link
Contributor

fuweid commented Aug 10, 2018

LGTM @allencloud please take a look.

return err
}

rw.WriteHeader(http.StatusOK)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing this to status code of StatusCreated(201)?

@@ -3559,6 +3559,17 @@ definitions:
Width:
type: "integer"

ContainerStartOptions:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not seen any API routes definition. Maybe in a follow-up pull request?

case err != nil && os.IsNotExist(err):
return checkpointDir, os.MkdirAll(checkpointDir, 0700)
case err != nil:
return "", fmt.Errorf("failed to create checkpoint %s", checkpointID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not wrap the err into the returned error?

)

// getCheckpointDir gets container checkpoint directory.
func (mgr *ContainerManager) getCheckpointDir(container, prefixDir, checkpointID string, create bool) (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the implementation of this function, I am wondering if we need to add more dedicated error instance when failed. Since these errors in errtypes packages could be used to construct the status codes for API layers. WDYT?

}

if c.State.Status != types.StatusRunning {
return fmt.Errorf("can not checkpoint from a %s container", c.State.Status)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return errors.Wrapf(errtypes.ErrConflict, "can not checkpoint from a %s container", c.State.Status)
Now we have no ErrConflict, maybe we need to add one.

implementing checkpoint functionality, it will checkpointing the
address space and state of the entire process tree to a collection
of “image” files. This functionality need a external software criu.

support pouch checkpoint create, list and delete.

implement checkpoint in #1723

Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Ace-Tang <aceapril@126.com>
Signed-off-by: Ace-Tang <aceapril@126.com>
@allencloud
Copy link
Collaborator

LGTM

@allencloud allencloud merged commit 3d16d73 into AliyunContainerService:master Aug 10, 2018
@pouchrobot pouchrobot added the LGTM one maintainer or community participant agrees to merge the pull reuqest. label Aug 10, 2018
}

resp, err := client.post(ctx, "/containers/"+name+"/start", q, nil, nil)
func (client *APIClient) ContainerStart(ctx context.Context, name string, options types.ContainerStartOptions) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid this change has modified the API, we need to revert this part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature LGTM one maintainer or community participant agrees to merge the pull reuqest. size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants