Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix default metricsController wrong args #550

Merged
merged 2 commits into from
May 19, 2019
Merged

Fix default metricsController wrong args #550

merged 2 commits into from
May 19, 2019

Conversation

hougangliu
Copy link
Member

@hougangliu hougangliu commented May 17, 2019

Fixes: #549


This change is Reviewable

@hougangliu
Copy link
Member Author

@johnugeorge
Copy link
Member

Can you explain why is this needed?

@hougangliu
Copy link
Member Author

Can you explain why is this needed?

https://github.com/kubeflow/katib/blob/master/cmd/metricscollector/v1alpha2/Dockerfile#L16 entrypoint has defined "./metricscollector", without this PR, command ./metricscollector ./metricscollector -e experimentName -t ... cannot work

@richardsliu
Copy link
Contributor

This is the v1alpha1 default metric collector: https://github.com/kubeflow/katib/blob/master/manifests/v1alpha1/studyjobcontroller/metricsControllerConfigMap.yaml#L27

How did it work before?

@andreyvelich
Copy link
Member

@richardsliu I think, it works with both way.
You can specify - "./metricscollector" in yaml file, but it makes no sense, because we have Entrypoint in Dockerfile.
In any time Metrics Collector container starts with ./metricscollector.
We can remove this https://github.com/kubeflow/katib/pull/550/files#diff-8793428d3b373ec152fbd4d75134c712L27 from MetricsCollector in v1alpha1, as well.

@hougangliu
Copy link
Member Author

I checked that metricsConlloector images (v1alpha1) in Kubeflow 0.5.1 release, find that its entrypoint is null anyway, that's why old args can work. but I checked v1alpha2 CI build, its entrypoint matches exactly what its Dockerfile specified. Anyway, we can add command in deployment to ignore entrypoint of image.

# docker inspect gcr.io/kubeflow-images-public/katib/metrics-collector:v0.1.2-alpha-156-g4ab3dbd
[
    {
        "Id": "sha256:347a8ab2553215496b71e81f224a0d2c268263f78bd9fa43139b7be6f6c23d0b",
        "RepoTags": [
            "gcr.io/kubeflow-images-public/katib/metrics-collector:v0.1.2-alpha-156-g4ab3dbd"
        ],
        "RepoDigests": [
            "gcr.io/kubeflow-images-public/katib/metrics-collector@sha256:31aa474e977219a46f52c95a75a0d3bc372a27af3144652756ffa7256d55e50e"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2019-03-29T19:09:59.723965161Z",
        "Container": "",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) COPY file:134b1886acc4ade95cf03061d1b5cb03e003a8ebc6a58cd664b1adf501b8b030 in . "
            ],
            "ArgsEscaped": true,
            "Image": "sha256:5840c9e64a96d22fa84328d410e9b6ba6b3e09fa979eddf60ae3b63962b7c962",
            "Volumes": null,
            "WorkingDir": "/app",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
...
# docker inspect gcr.io/kubeflow-ci/katib/v1alpha2/metrics-collector@sha256:729b9ee72911d7c13ba739dc1a5cae2c4cc084ac93418279ecc12a697d4b5ab0
[
    {
        "Id": "sha256:0139def7d24f5b2072aaa5ab7f8a9c2d27ab5cf17ba23c31e97864304abe1c9a",
        "RepoTags": [],
        "RepoDigests": [
            "gcr.io/kubeflow-ci/katib/v1alpha2/metrics-collector@sha256:729b9ee72911d7c13ba739dc1a5cae2c4cc084ac93418279ecc12a697d4b5ab0"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2019-05-18T00:52:31.625122156Z",
        "Container": "af488ea15d7e5cfe8ebd90f9d308e6da299fd0f787e0978f12d6fd0b604f2099",
        "ContainerConfig": {
            "Hostname": "af488ea15d7e",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) ",
                "ENTRYPOINT [\"./metricscollector\"]"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:2ea5d4c8f224e7d7e66351f21e668b69b8f87b263911298c2fcffd58ca0ff102",
            "Volumes": null,
            "WorkingDir": "/app",
            "Entrypoint": [
                "./metricscollector"
            ],
            "OnBuild": null,
            "Labels": {}
        },
...

@gaocegege
Copy link
Member

gaocegege commented May 18, 2019

First, I think we need this PR. Because we should not add ./metricscollector as the argument. We already have ENTRYPOINT ./metricscollector in Dockerfile

The reason why it works for v1alpha1 without this PR, is that the v1alpha1 image is built based on older commits. In older commit like 3d4cd04#diff-864b07fc4cad5d634db8509ba7c9e494L13 , we do not have ENTRYPOINT at that time. Then Docker container's command will be correct.

The field is introduced in 287e503#diff-93a859e37952fbfb40d2778d7ac4b7e1R16 . After this commit, we will not be able to run the metrics collector becuase of a unknown arguments ./metricscollector in YAML. Then Docker container's command will be ./metricscollector ./metricscollector <other arguments>

Thus, it is a PR that we need. Thanks @hougangliu

@gaocegege
Copy link
Member

Did I explain clearly @richardsliu I think it is a problem because of the lack of regression testing

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@hougangliu
Copy link
Member Author

hougangliu commented May 19, 2019

287e503#diff-93a859e37952fbfb40d2778d7ac4b7e1R16 introducing ENTRYPOINT can explain why v1alpha1 default metric collector can work well before this PR (More exactly it can work before #527 merged).
Anyway, now this PR is in need both two versions

@hougangliu
Copy link
Member Author

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hougangliu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v1alpha2 default metricCollector Pod fails to work
6 participants