feature: stats of cri manager #1431

Merged
fuweid merged 1 commit into AliyunContainerService:master on Jul 9, 2018

Conversation

Contributor

@starnop starnop commented May 29, 2018

Signed-off-by: Starnop <starnop@163.com>

Ⅰ. Describe what this PR did

This PR implements ContainerStats(), ListContainerStats(), and ImageFsInfo() in the CRI manager.

These interfaces retrieve container metrics and image file system information.

Ⅱ. Does this pull request fix one issue?

None

Ⅲ. Describe how you did it

Ⅳ. Describe how to verify it

Use the crictl tool:

# crictl stats
DEBU[0000] ListContainerStatsRequest: &ListContainerStatsRequest{Filter:&ContainerStatsFilter{Id:,PodSandboxId:,LabelSelector:map[string]string{},},} 

......

CONTAINER           CPU %               MEM                 DISK                INODES
0bbc13aa25d81       0.59                30.14MB             45.06kB             14
15eef79448135       0.00                286.7kB             24.58kB             9
2e257f09a2260       0.09                9.581MB             24.58kB             6
476ae4d251fe6       0.00                290.8kB             24.58kB             9
574b07cfd946c       0.03                10.21MB             8.192kB             3
587856b374711       0.44                11.49MB             24.58kB             10
6df66169f907a       1.18                37.26MB             61.44kB             19
725a1d78af2b0       0.00                290.8kB             24.58kB             9
8b44f5423bd64       0.02                6.955MB             12.29kB             4
a1f9ccae689ed       0.00                274.4kB             24.58kB             9
a42c9875b9e82       1.13                233.7MB             36.86kB             12
c4321c743307d       0.00                225.3kB             24.58kB             9
cf2dda929ea0b       0.00                286.7kB             24.58kB             9
eb400edd89bb5       0.00                5.947MB             28.67kB             9

Or use curl:

# curl -X GET http://127.0.0.1:10255/stats/summary

Ⅴ. Special notes for reviews

@codecov-io

codecov-io commented May 29, 2018

Codecov Report

Merging #1431 into master will decrease coverage by 1.74%.
The diff coverage is 9.37%.

@@            Coverage Diff             @@
##           master    #1431      +/-   ##
==========================================
- Coverage   41.45%   39.71%   -1.75%     
==========================================
  Files         273      274       +1     
  Lines       17714    18584     +870     
==========================================
+ Hits         7344     7381      +37     
- Misses       9461    10294     +833     
  Partials      909      909
Impacted Files              Coverage Δ
cri/v1alpha2/cri_utils.go   24.57% <0%> (-3.3%) ⬇️
ctrd/snapshot.go            44% <0%> (-17.12%) ⬇️
cri/v1alpha2/cri.go         0% <0%> (ø) ⬆️
cri/v1alpha1/cri.go         0% <0%> (ø) ⬆️
cri/v1alpha1/cri_utils.go   23.88% <0%> (-4.85%) ⬇️
ctrd/wrapper_client.go      77.14% <100%> (ø) ⬆️
ctrd/client_opts.go         50% <100%> (+4.54%) ⬆️
ctrd/client.go              54.93% <100%> (ø) ⬆️
ctrd/container.go           48.79% <11.11%> (-1.21%) ⬇️
daemon/mgr/container.go     35.59% <15%> (-14.8%) ⬇️
... and 7 more

Contributor

@fuweid fuweid left a comment

First Round Review

@@ -127,6 +143,18 @@ func NewCriManager(config *config.Config, ctrMgr mgr.ContainerMgr, imgMgr mgr.Im
		return nil, fmt.Errorf("failed to create sandbox meta store: %v", err)
	}

	imageFSPath := imageFSPath(path.Join(config.HomeDir, "containerd/root"), defaultSnapshotterName)
Contributor

I think we should add a utility function to retrieve the root folder in the project as a next step.

Contributor Author

A separate PR will be submitted to deal with similar things.

for {
	err := s.Sync()
	if err != nil {
		logrus.Errorf("failed to sync snapshot stats: %v", err)
Contributor

I think we should add a TODO for this line, because it may print an error message every 10 seconds.

The TODO is to track the error and report it to a monitor or something similar.

Contributor Author

Done.
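
One possible way to keep a periodic sync loop from repeating the same message every period is to log only when the error state changes. The sketch below is an illustration under that assumption, not the approach taken in this PR; errorTracker, syncLoop, and errUnavailable are hypothetical names, and the rounds parameter exists only so the example terminates.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a hypothetical error used only for the demonstration.
var errUnavailable = errors.New("snapshotter unavailable")

// errorTracker remembers the last observed error message so that a
// periodic loop can report an error once instead of on every round.
type errorTracker struct{ last string }

// changed records err and reports whether it differs from the previous one.
func (t *errorTracker) changed(err error) bool {
	msg := ""
	if err != nil {
		msg = err.Error()
	}
	differs := msg != t.last
	t.last = msg
	return differs
}

// syncLoop calls syncFn once per period for the given number of rounds and
// returns how many messages were logged. A real syncer would loop forever.
func syncLoop(syncFn func() error, period time.Duration, rounds int) int {
	ticker := time.NewTicker(period)
	defer ticker.Stop()

	var tracker errorTracker
	logged := 0
	for i := 0; i < rounds; i++ {
		err := syncFn()
		// Log only on a transition into a new error state.
		if tracker.changed(err) && err != nil {
			fmt.Printf("failed to sync snapshot stats: %v\n", err)
			logged++
		}
		<-ticker.C
	}
	return logged
}

func main() {
	failing := func() error { return errUnavailable }
	n := syncLoop(failing, 10*time.Millisecond, 3)
	fmt.Println("messages logged:", n)
}
```

Reporting to a monitoring system, as the TODO suggests, would replace the fmt.Printf call at the point where the state change is detected.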

	if sn, ok := s.snapshots[key]; ok {
		return sn, nil
	}
	return Snapshot{}, fmt.Errorf("failed to get %q in snapshot store", key)
Contributor

I think we should return a constant ErrNotFound error here; there is no need for a custom error.
Then callers can compare the result of Get against that constant, which makes the code clearer.

Contributor Author

Done.
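
A minimal sketch of the sentinel-error pattern discussed in this thread. The store below is a simplified stand-in, not the code merged in this PR; the field names and the ErrNotFound variable are assumptions. Go errors cannot be true constants, so the sentinel is a package-level variable.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is a hypothetical sentinel error returned for a missing key.
var ErrNotFound = errors.New("not found")

// Snapshot holds illustrative usage figures for one snapshot.
type Snapshot struct {
	Key    string
	Size   uint64
	Inodes uint64
}

// SnapshotStore is a simplified in-memory store guarded by a mutex.
type SnapshotStore struct {
	mu        sync.RWMutex
	snapshots map[string]Snapshot
}

// NewSnapshotStore creates an empty store.
func NewSnapshotStore() *SnapshotStore {
	return &SnapshotStore{snapshots: make(map[string]Snapshot)}
}

// Add inserts or overwrites a snapshot.
func (s *SnapshotStore) Add(sn Snapshot) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.snapshots[sn.Key] = sn
}

// Get returns the snapshot for key, or ErrNotFound if it is absent.
func (s *SnapshotStore) Get(key string) (Snapshot, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if sn, ok := s.snapshots[key]; ok {
		return sn, nil
	}
	return Snapshot{}, ErrNotFound
}

func main() {
	store := NewSnapshotStore()
	store.Add(Snapshot{Key: "a", Size: 4096, Inodes: 3})

	// Callers can branch on the sentinel instead of parsing a message.
	if _, err := store.Get("missing"); errors.Is(err, ErrNotFound) {
		fmt.Println("snapshot not found")
	}
}
```

errors.Is requires Go 1.13 or later; on older toolchains a direct comparison, err == ErrNotFound, serves the same purpose for an unwrapped sentinel.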

		sn.Inodes = uint64(usage.Inodes)
		s.store.Add(sn)
	}
	for _, sn := range s.store.List() {
Contributor

Could you add more information about this part, for example about the delete logic? Why do you only delete the out-of-date snapshots?

Contributor Author

@starnop starnop Jul 2, 2018

The purpose of this operation is to stay in sync with client.SnapshotService.
For example:

When you remove a container, you also need to remove its snapshot. However, SnapshotStore is not notified, so we need to delete snapshots from SnapshotStore that no longer actually exist.
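
The reconciliation described here can be sketched roughly as follows. This is an illustration using plain maps, not the SnapshotStore or snapshot service types from the PR; Usage and syncStore are hypothetical names.

```go
package main

import "fmt"

// Usage is an illustrative record of snapshot disk usage.
type Usage struct {
	Size   uint64
	Inodes uint64
}

// syncStore reconciles a cached view with the live set of snapshots:
// it refreshes every live entry, then drops cached entries whose keys
// no longer exist in the live set (for example after a container removal).
func syncStore(cache, live map[string]Usage) {
	for key, u := range live {
		cache[key] = u
	}
	for key := range cache {
		if _, ok := live[key]; !ok {
			// Deleting during range iteration is safe for Go maps.
			delete(cache, key)
		}
	}
}

func main() {
	cache := map[string]Usage{"removed-container": {Size: 10, Inodes: 1}}
	live := map[string]Usage{"running-container": {Size: 20, Inodes: 2}}

	syncStore(cache, live)
	for key, u := range cache {
		fmt.Printf("%s: %d bytes, %d inodes\n", key, u.Size, u.Inodes)
	}
}
```

Because the cache is never told about removals, comparing it against the live set on every sync is what keeps stale entries from accumulating.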

return nil, fmt.Errorf("ContainerStats Not Implemented Yet")
containerID := r.GetContainerId()

container, err := c.ContainerMgr.Get(ctx, containerID)
Contributor

Could we use a function to get the cs? With such a function, we could reuse it in ListContainerStats. Don't repeat yourself.

Contributor Author

Maybe that is not necessary: getContainerMetrics is already encapsulated, and the only thing repeated is the error handling.

Contributor

I don't think so. A function like generateContainerStatInfo would wrap c.ContainerMgr.Stats and c.getContainerMetrics to avoid repeating the same thing. In the future, you would only need to update the wrapped function.

Or have getContainerMetrics call c.ContainerMgr.Stats itself, WDYT?

Contributor Author

I have to admit it's more elegant. Thank you.
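
A rough sketch of the refactoring suggested in this thread: a single helper fetches the raw stats and converts them, and both the single-container and the list endpoints go through it. All types and names here (rawStats, statsSource, manager, fakeSource) are hypothetical stand-ins, not the CRI manager's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// rawStats stands in for whatever the container manager returns.
type rawStats struct{ CPUNanos, MemBytes uint64 }

// ContainerStats is an illustrative result type.
type ContainerStats struct {
	ID       string
	CPUNanos uint64
	MemBytes uint64
}

// statsSource is a hypothetical stand-in for the container manager.
type statsSource interface {
	Stats(id string) (*rawStats, error)
}

type manager struct{ src statsSource }

// containerStats is the single wrapper: fetch raw stats, handle the
// error once, and convert to the response type.
func (m *manager) containerStats(id string) (*ContainerStats, error) {
	raw, err := m.src.Stats(id)
	if err != nil {
		return nil, fmt.Errorf("failed to get stats of container %q: %v", id, err)
	}
	return &ContainerStats{ID: id, CPUNanos: raw.CPUNanos, MemBytes: raw.MemBytes}, nil
}

// One serves a single-container request.
func (m *manager) One(id string) (*ContainerStats, error) {
	return m.containerStats(id)
}

// List serves a multi-container request through the same wrapper.
func (m *manager) List(ids []string) ([]*ContainerStats, error) {
	result := make([]*ContainerStats, 0, len(ids))
	for _, id := range ids {
		cs, err := m.containerStats(id)
		if err != nil {
			return nil, err
		}
		result = append(result, cs)
	}
	return result, nil
}

// fakeSource is an in-memory statsSource used only for the demonstration.
type fakeSource map[string]rawStats

func (f fakeSource) Stats(id string) (*rawStats, error) {
	s, ok := f[id]
	if !ok {
		return nil, errors.New("no such container")
	}
	return &s, nil
}

func main() {
	m := &manager{src: fakeSource{"c1": {CPUNanos: 100, MemBytes: 2048}}}
	if cs, err := m.One("c1"); err == nil {
		fmt.Printf("%s: cpu=%d mem=%d\n", cs.ID, cs.CPUNanos, cs.MemBytes)
	}
}
```

With this shape, a change to how stats are gathered or converted touches only containerStats.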

timestamp := time.Now().UnixNano()
var usedBytes, inodesUsed uint64
for _, sn := range snapshots {
	// Use the oldest timestamp as the timestamp of imagefs info.
Contributor

Could you add more information about this logic?

Contributor Author

The timestamp is the latest time at which the information was collected, and we have to be consistent. So I chose the oldest timestamp as the timestamp of the imagefs info here.
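
The aggregation under discussion can be sketched as below: usage is summed across snapshots, and the reported timestamp is the oldest per-snapshot collection time, starting from the current time. The Snapshot fields and the aggregate function are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot is an illustrative record; Timestamp is in Unix nanoseconds.
type Snapshot struct {
	Timestamp int64
	Size      uint64
	Inodes    uint64
}

// aggregate sums usage over all snapshots and returns the oldest
// collection timestamp, so the reported time is never newer than any
// of the figures it covers. With no snapshots it returns now.
func aggregate(snapshots []Snapshot, now int64) (timestamp int64, usedBytes, inodesUsed uint64) {
	timestamp = now
	for _, sn := range snapshots {
		if sn.Timestamp < timestamp {
			timestamp = sn.Timestamp
		}
		usedBytes += sn.Size
		inodesUsed += sn.Inodes
	}
	return timestamp, usedBytes, inodesUsed
}

func main() {
	now := time.Now().UnixNano()
	snapshots := []Snapshot{
		{Timestamp: now - int64(5*time.Second), Size: 4096, Inodes: 3},
		{Timestamp: now - int64(20*time.Second), Size: 8192, Inodes: 6},
	}
	ts, used, inodes := aggregate(snapshots, now)
	fmt.Println(time.Unix(0, ts).Format(time.RFC3339), used, inodes)
}
```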

@starnop starnop force-pushed the cri-stats branch 5 times, most recently from e2690e7 to 6584c5d Compare July 2, 2018 06:54

// deviceUUID gets device uuid of a device. The passed in rdev should
// be linux device number.
func deviceUUID(rdev uint64) (string, error) {
	const uuidDir = "/dev/disk/by-uuid"
Contributor

Could we put the const at the beginning of the file?

	if err != nil {
		return 0, err
	}
	stat := info.Sys().(*syscall.Stat_t)
Contributor

https://golang.org/pkg/os/#FileInfo Sys() may be nil. Could we use stat, ok := info.Sys().(*syscall.Stat_t) here? A failed assertion will cause a panic.
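
A minimal sketch of the comma-ok assertion suggested above. The function name deviceOf and the choice to return the Dev field are assumptions, since the full function is not shown in this thread; the example targets Unix-like systems where syscall.Stat_t exists.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// deviceOf returns the device number of the file system containing path.
// It uses the two-value type assertion so that a nil or unexpected value
// from Sys() produces an error instead of a panic.
func deviceOf(path string) (uint64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("unexpected type %T from Sys() for %q", info.Sys(), path)
	}
	// Dev has a platform-dependent integer type, so convert explicitly.
	return uint64(stat.Dev), nil
}

func main() {
	dev, err := deviceOf("/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("device number of /:", dev)
}
```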

@@ -17,10 +19,16 @@ import (
	"github.com/alibaba/pouch/pkg/utils"
	"github.com/go-openapi/strfmt"
Contributor

Please move it into the next import group.

@starnop starnop force-pushed the cri-stats branch 5 times, most recently from cb1cdb3 to 02f17da Compare July 6, 2018 09:31
}

// NewSnapshotsSyncer creates a snapshot syncer.
func NewSnapshotsSyncer(store *SnapshotStore, cli ctrd.APIClient, period time.Duration) *SnapshotsSyncer {
Contributor

NewSnapshotsSyncer does not need to be exported; renaming it to newSnapshotsSyncer would be better.

@YaoZengzeng
Contributor

Except for the minor issue mentioned above, LGTM.

Signed-off-by: Starnop <starnop@163.com>
@fuweid
Contributor

fuweid commented Jul 9, 2018

LGTM

@fuweid fuweid merged commit ebf3817 into AliyunContainerService:master Jul 9, 2018

7 participants