Suggestions pod does not fail when exception is raised #1120

Closed
StefanoFioravanzo opened this issue Apr 3, 2020 · 5 comments

@StefanoFioravanzo
Member

/kind bug

What steps did you take and what happened:
Created a Katib experiment using grid search and two int parameters. The Experiment resource looks like this:

apiVersion: kubeflow.org/v1alpha3
kind: Experiment
metadata:
  labels:
    controller-tools.k8s.io: '1.0'
  name: katib-simple-trial
spec:
  algorithm:
    algorithmName: grid
  parallelTrialCount: 1
  maxFailedTrialCount: 6
  maxTrialCount: 12
  objective:
    additionalMetricNames:
    goal: 100
    objectiveMetricName: result
    type: maximize
  parameters:
  - feasibleSpace:
      max: '50'
      min: '1'
      step: '10'
    name: a
    parameterType: int
  - feasibleSpace:
      max: '1'
      min: '50'
      step: '9'
    name: b
    parameterType: int
  trialTemplate:
    goTemplate:
      rawTemplate: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          template:
            metadata:
              annotations:
                sidecar.istio.io/inject: "false"
            spec:
              restartPolicy: Never
              containers:
                - name: {{.Trial}}
                  image: <myimage>
                  command:
                    - python3 -u -c "<some_command>"

What did you expect to happen:
Since parameter b has min=50 and max=1, I expected submission of the Experiment to be rejected.

What happened instead:
The suggestion pod is created and continuously logs the following error:

ERROR:grpc._server:Exception calling application: Low must be lower than high
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/chocolate_service.py", line 39, in GetSuggestions
    search_space, trials, request.request_number)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/chocolate/base_chocolate_service.py", line 33, in getSuggestions
    int(param.min), int(param.max), 1)
  File "/usr/local/lib/python3.6/site-packages/chocolate/space.py", line 140, in __init__
    assert low < high, "Low must be lower than high"
AssertionError: Low must be lower than high

As a result, the Katib experiment runs indefinitely without ever failing and without producing any trials. The controller logs don't help either, and no events are generated; only the suggestion pod reports these errors.
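
For illustration, the failing check can be reproduced without a cluster. The sketch below is only a stand-in for the assertion shown in the traceback above (chocolate/space.py, line 140): `check_bounds` and the `feasible_spaces` dictionary are hypothetical names, not part of chocolate or Katib. It feeds in the min/max strings from the Experiment, converted with `int()` as in the traceback.

```python
def check_bounds(low, high):
    """Hypothetical stand-in for the assertion shown in the traceback."""
    assert low < high, "Low must be lower than high"
    return low, high


# min/max values copied from the Experiment above; they arrive as strings.
feasible_spaces = {
    "a": {"min": "1", "max": "50"},
    "b": {"min": "50", "max": "1"},
}

results = {}
for name, space in feasible_spaces.items():
    try:
        check_bounds(int(space["min"]), int(space["max"]))
        results[name] = "ok"
    except AssertionError as err:
        # In the report, this exception surfaces only in the suggestion pod logs.
        results[name] = str(err)

print(results)
```

Parameter a passes and parameter b trips the assertion, which matches the message logged by the suggestion pod.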

Environment:

  • Kubeflow version: 1.0
  • Minikube version: 1.2.0 (MiniKF latest)
  • Kubernetes version: 1.14
@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
bug 0.99


@issue-label-bot issue-label-bot bot added the bug label Apr 3, 2020
@gaocegege
Member

We need to add validation for the algorithm settings so that invalid experiments fail fast. I think it is on the 2020 roadmap.
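
A minimal sketch of what such a fail-fast check could look like, assuming the parameters are available as plain dictionaries shaped like the Experiment spec above. `validate_feasible_spaces` is a hypothetical helper, not Katib's actual validation code, and rejecting min == max simply mirrors the strict `low < high` assertion in the traceback; whether equal bounds should be allowed is a separate design choice.

```python
def validate_feasible_spaces(parameters):
    """Return human-readable errors for numeric parameters with bad bounds.

    Hypothetical helper: this is not Katib's real validation logic.
    """
    errors = []
    for param in parameters:
        name = param.get("name", "<unnamed>")
        if param.get("parameterType") not in ("int", "double"):
            continue  # only numeric ranges are checked in this sketch
        space = param.get("feasibleSpace") or {}
        try:
            low, high = float(space["min"]), float(space["max"])
        except (KeyError, TypeError, ValueError):
            errors.append(f"parameter {name!r}: min and max must be numeric")
            continue
        if low >= high:
            errors.append(
                f"parameter {name!r}: min ({space['min']}) "
                f"must be lower than max ({space['max']})"
            )
    return errors


# Parameters copied from the Experiment in this issue.
parameters = [
    {"name": "a", "parameterType": "int",
     "feasibleSpace": {"min": "1", "max": "50", "step": "10"}},
    {"name": "b", "parameterType": "int",
     "feasibleSpace": {"min": "50", "max": "1", "step": "9"}},
]

errors = validate_feasible_spaces(parameters)
print(errors)
```

Running a check like this before the suggestion pod is created would let the Experiment be rejected or marked failed with a clear message, instead of retrying indefinitely.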

@gaocegege
Member

ref #1126

@stale

stale bot commented Nov 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Dec 19, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this as completed Dec 19, 2020