add bestTrialId to statusJob status #312

Merged
merged 2 commits into kubeflow:master from complete-trial
Jan 3, 2019

hougangliu
Member

@hougangliu hougangliu commented Dec 24, 2018

Fixes: #305


@hougangliu
Member Author

/assign @richardsliu @YujiOshima

Only pkg/api/api.proto is edited by hand; the changes to the other files under pkg/api/ are automatically generated.

@hougangliu
Member Author

After a StudyJob completes, the user can read bestTrialId from the StudyJob status and then call the GetTrial gRPC method to get the hyperparameter details.

apiVersion: kubeflow.org/v1alpha1
kind: StudyJob
metadata:
  ...
  name: random-example
  namespace: kubeflow 
  resourceVersion: "88427"
spec:
  metricsnames:
  - accuracy
  objectivevaluename: Validation-accuracy
  optimizationgoal: 0.99
  optimizationtype: maximize
  owner: crd
  parameterconfigs:
  - feasible:
      max: "0.03"
      min: "0.01"
    name: --lr
    parametertype: double
  - feasible:
      max: "5"
      min: "2"
    name: --num-layers
    parametertype: int
  - feasible:
      list:
      - sgd
      - adam
      - ftrl
    name: --optimizer
    parametertype: categorical
  requestcount: 1
  studyName: random-example
  suggestionSpec:
    requestNumber: 3
    suggestionAlgorithm: random
    suggestionParameters:
    - name: SuggestionCount
      value: "0"
  workerSpec:
    goTemplate:
      rawTemplate: |-
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.WorkerID}}
          namespace: kubeflow
        spec:
          template:
            spec:
              containers:
              - name: {{.WorkerID}}
                image: katib/mxnet-mnist-example
                command:
                - "python"
                - "/mxnet/example/image-classification/train_mnist.py"
                - "--batch-size=64"
                {{- with .HyperParameters}}
                {{- range .}}
                - "{{.Name}}={{.Value}}"
                {{- end}}
                {{- end}}
              restartPolicy: Never
status:
  bestObjectiveValue: 0.980792
  bestTrialId: t0ce0e11284b4b73
  completionTime: 2018-12-24T07:37:15Z
  conditon: Completed
  earlyStoppingParameterId: ""
  lastReconcileTime: 2018-12-24T07:37:15Z
  startTime: 2018-12-24T07:31:17Z
  studyid: y33b9b01aa7bfcc3
  suggestionCount: 1
  suggestionParameterId: vb1d0a73fdfabe2f
  trials:
  - trialid: qae2e83646d61786
    workeridlist:
    - completionTime: 2018-12-24T07:37:14Z
      conditon: Completed
      kind: Job
      objectiveValue: 0.978205
      startTime: 2018-12-24T07:31:17Z
      workerid: p17f9325c8b65f68
  - trialid: jbd34ca57e76036b
    workeridlist:
    - completionTime: 2018-12-24T07:37:14Z
      conditon: Completed
      kind: Job
      objectiveValue: 0.97492
      startTime: 2018-12-24T07:31:17Z
      workerid: y5fd2ce56c1c5074
  - trialid: t0ce0e11284b4b73
    workeridlist:
    - completionTime: 2018-12-24T07:35:13Z
      conditon: Completed
      kind: Job
      objectiveValue: 0.980792
      startTime: 2018-12-24T07:31:17Z
      workerid: h8acd2157fd748ae
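
To make the relationship between `bestTrialId` and the `trials` list concrete, here is a minimal Python sketch that recomputes the best trial from the status shown above. It is only an illustration, not the controller's actual logic: the `best_trial` helper is hypothetical, the status is hand-copied into plain dicts (trimmed to the fields used) rather than parsed from the cluster, and it assumes `optimizationtype` is either `maximize` or `minimize`.

```python
# Status values copied from the StudyJob output above. Plain dicts stand in
# for the parsed YAML so the sketch needs no third-party library.
status = {
    "bestObjectiveValue": 0.980792,
    "bestTrialId": "t0ce0e11284b4b73",
    "trials": [
        {"trialid": "qae2e83646d61786",
         "workeridlist": [{"workerid": "p17f9325c8b65f68",
                           "objectiveValue": 0.978205}]},
        {"trialid": "jbd34ca57e76036b",
         "workeridlist": [{"workerid": "y5fd2ce56c1c5074",
                           "objectiveValue": 0.97492}]},
        {"trialid": "t0ce0e11284b4b73",
         "workeridlist": [{"workerid": "h8acd2157fd748ae",
                           "objectiveValue": 0.980792}]},
    ],
}


def best_trial(status, optimization_type="maximize"):
    """Hypothetical helper: return (trial_id, worker_id, objective_value)
    for the worker with the best objective value."""
    pick = max if optimization_type == "maximize" else min
    candidates = [
        (worker["objectiveValue"], trial["trialid"], worker["workerid"])
        for trial in status["trials"]
        for worker in trial["workeridlist"]
        if worker.get("objectiveValue") is not None  # skip unfinished workers
    ]
    value, trial_id, worker_id = pick(candidates, key=lambda c: c[0])
    return trial_id, worker_id, value


trial_id, worker_id, value = best_trial(status)
print(trial_id, worker_id, value)
```

For this status the recomputed trial ID matches the reported `bestTrialId`, and that ID is what a client would then pass to GetTrial.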

@hougangliu
Member Author

/retest

@YujiOshima
Contributor

@hougangliu Thanks! This is great! Please regenerate the mocks with the mockgen script.

@YujiOshima
Contributor

@hougangliu It would be better to have not only the best trial ID but also the best worker ID.

@hougangliu
Member Author

@YujiOshima Thanks! I will submit another patch to address your comment after #303 is merged.

@hougangliu
Member Author

/retest

/**
* Get a trial configuration from DB by trial ID
*/
message GetTrialRequest {
Contributor

This request is easy to confuse with GetTrialsRequest above, since the names differ by only one letter and the parameters are the same. Maybe we should rename GetTrialsRequest to ListStudyTrialsRequest? @YujiOshima What do you think?

Member Author

OK, I can change it in another PR.
By the way, should we keep backward compatibility for GetTrialsRequest?

Contributor

Perhaps we should reconsider this when we define our Beta APIs. For now this looks good.
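
For reference, one common way to keep source-level backward compatibility across such a rename is a deprecated alias. The Python sketch below only illustrates that pattern: the classes are stand-ins, not the code generated from api.proto, the `study_id` field is an assumption, and `ListStudyTrialsRequest` is just the name proposed in this thread.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ListStudyTrialsRequest:
    """Stand-in for the renamed request (name proposed in this thread)."""
    study_id: str  # assumed field, for illustration only


def GetTrialsRequest(study_id: str) -> ListStudyTrialsRequest:
    """Deprecated alias so existing callers keep working after the rename."""
    warnings.warn(
        "GetTrialsRequest is deprecated; use ListStudyTrialsRequest",
        DeprecationWarning,
        stacklevel=2,
    )
    return ListStudyTrialsRequest(study_id=study_id)


# Old call sites still get a valid request, plus a deprecation warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    request = GetTrialsRequest("y33b9b01aa7bfcc3")

print(request, len(caught))
```

Since protobuf's binary encoding carries field numbers rather than message names, the serialized request bytes would not change with the rename; the breakage is mainly in generated code and in callers that reference the old name.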

@richardsliu
Contributor

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardsliu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit fae6aa5 into kubeflow:master Jan 3, 2019
@hougangliu hougangliu deleted the complete-trial branch January 3, 2019 07:49