Problem about katib-db #2020

Closed
skgreenstar opened this issue Nov 18, 2022 · 10 comments

Comments

@skgreenstar

skgreenstar commented Nov 18, 2022

image

I am seeing the same symptom. In my case, however, it does not look like a katib-controller problem: everything ran normally and the job completed normally, but no data accumulates in the DB.

For reference, I am connecting to an external MySQL server rather than the MySQL instance that Katib provides. The connection to that DB works, and the observe_logs table is created as expected.

However, no rows are ever written to the DB.

Screenshot 2022-11-16 7:48:10 PM

What's the problem? Is there a solution?

Originally posted by @skgreenstar in #1949 (comment)

@johnugeorge
Member

Can you check the logs of katib-controller and katib-db-manager for any errors?
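A rough aid for that check: the sketch below (not part of Katib) scans saved controller log text for error-level entries. It assumes the logs are JSON objects carrying `level`, `ts`, `logger`, and `msg` keys, as in the katib-controller output pasted in this thread. The function names are hypothetical. It tolerates several objects on one line and strings wrapped across lines, which is how pasted logs often end up.

```python
import json
from datetime import datetime, timezone


def iter_log_entries(text):
    """Yield each JSON log object in text, even when several share a line."""
    # strict=False accepts raw newlines inside strings (wrapped pastes).
    decoder = json.JSONDecoder(strict=False)
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        try:
            entry, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Damaged or truncated entry: resume at the next opening brace.
            nxt = text.find("{", pos + 1)
            if nxt == -1:
                break
            pos = nxt
            continue
        # Keep only objects that look like log records.
        if isinstance(entry, dict) and "level" in entry:
            yield entry


def error_entries(text):
    """Return the log entries whose level is "error"."""
    return [e for e in iter_log_entries(text) if e.get("level") == "error"]


def format_entry(entry):
    """Render one entry with its epoch timestamp converted to UTC."""
    ts = datetime.fromtimestamp(entry.get("ts", 0), tz=timezone.utc)
    return f'{ts.isoformat()} {entry.get("logger", "-")}: {entry.get("msg", "")}'


if __name__ == "__main__":
    sample = (
        '{"level":"info","ts":1668586919.5238388,'
        '"logger":"experiment-controller","msg":"GetOrCreateSuggestion"} '
        '{"level":"error","ts":1668586919.5309176,'
        '"logger":"experiment-controller","msg":"Trial create error"}\n'
        '{"level":"info","ts":1668586919.5496483,'
        '"logger":"trial-controller","msg":"Creating Job"}'
    )
    for entry in error_entries(sample):
        print(format_entry(entry))
```

Run over the saved katib-controller and katib-db-manager output, this narrows a long dump down to the entries worth reading first. Entries logged at `info` level, such as requeue notices, are deliberately left out.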

@skgreenstar
Author

skgreenstar commented Nov 18, 2022

Here are the katib-controller and katib-db-manager logs.

I ran the random example.

katib-db-manager
image

katib-controller
{"level":"info","ts":1668414800.0215728,"logger":"entrypoint","msg":"Config:","experiment-suggestion-name":"default","webhook-port":8443,"metrics-addr":":8080","inject-security-context":false,"enable-grpc-probe-in-suggestion":true,"trial-resources":[{"Group":"batch","Version":"v1","Kind":"Job"},{"Group":"kubeflow.org","Version":"v1","Kind":"TFJob"},{"Group":"kubeflow.org","Version":"v1","Kind":"PyTorchJob"},{"Group":"kubeflow.org","Version":"v1","Kind":"MPIJob"},{"Group":"kubeflow.org","Version":"v1","Kind":"XGBoostJob"},{"Group":"kubeflow.org","Version":"v1","Kind":"MXJob"}]} {"level":"info","ts":1668414800.376007,"logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"} {"level":"info","ts":1668414800.3764246,"logger":"entrypoint","msg":"Registering Components."} {"level":"info","ts":1668414800.3765895,"logger":"entrypoint","msg":"Setting up controller."} {"level":"info","ts":1668414800.3766088,"logger":"experiment-controller","msg":"Using the default suggestion implementation"} {"level":"info","ts":1668414800.3767135,"logger":"experiment-controller","msg":"Experiment controller created"} {"level":"info","ts":1668414800.376745,"logger":"suggestion-controller","msg":"Suggestion controller created"} {"level":"info","ts":1668414800.3768158,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"batch","CRD Version":"v1","CRD Kind":"Job"} {"level":"info","ts":1668414800.3768592,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"TFJob"} {"level":"info","ts":1668414800.3769019,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"PyTorchJob"} {"level":"info","ts":1668414800.3769283,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"MPIJob"} 
{"level":"info","ts":1668414800.3769667,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"XGBoostJob"} {"level":"info","ts":1668414800.3769836,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"MXJob"} {"level":"info","ts":1668414800.3769927,"logger":"trial-controller","msg":"Trial controller created"} {"level":"info","ts":1668414800.3769958,"logger":"entrypoint","msg":"Setting up webhooks."} {"level":"info","ts":1668414800.3771226,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-experiment"} {"level":"info","ts":1668414800.3772378,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-experiment"} {"level":"info","ts":1668414800.3774362,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-pod"} {"level":"info","ts":1668414800.3774776,"logger":"entrypoint","msg":"Starting the Cmd."} {"level":"info","ts":1668414800.3777974,"logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"} {"level":"info","ts":1668414800.3779306,"msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"} {"level":"info","ts":1668414800.378101,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"} {"level":"info","ts":1668414800.378208,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":8443} {"level":"info","ts":1668414800.3782678,"logger":"controller.experiment-controller","msg":"Starting EventSource","source":"kind source: *v1beta1.Experiment"} {"level":"info","ts":1668414800.378303,"logger":"controller.experiment-controller","msg":"Starting EventSource","source":"kind source: *v1beta1.Trial"} {"level":"info","ts":1668414800.378327,"logger":"controller.experiment-controller","msg":"Starting EventSource","source":"kind source: *v1beta1.Suggestion"} 
{"level":"info","ts":1668414800.3783312,"logger":"controller.experiment-controller","msg":"Starting Controller"} {"level":"info","ts":1668414800.378488,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"} {"level":"info","ts":1668414800.3785033,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *v1beta1.Trial"} {"level":"info","ts":1668414800.3785233,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1668414800.3785284,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1668414800.378533,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1668414800.3785377,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1668414800.3785446,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1668414800.3785653,"logger":"controller.trial-controller","msg":"Starting EventSource","source":"kind source: *unstructured.Unstructured"} {"level":"info","ts":1668414800.3785691,"logger":"controller.trial-controller","msg":"Starting Controller"} {"level":"info","ts":1668414800.3785753,"logger":"controller.suggestion-controller","msg":"Starting EventSource","source":"kind source: *v1beta1.Suggestion"} {"level":"info","ts":1668414800.3785987,"logger":"controller.suggestion-controller","msg":"Starting EventSource","source":"kind source: *v1.Deployment"} {"level":"info","ts":1668414800.3786066,"logger":"controller.suggestion-controller","msg":"Starting EventSource","source":"kind source: *v1.Service"} 
{"level":"info","ts":1668414800.3786135,"logger":"controller.suggestion-controller","msg":"Starting EventSource","source":"kind source: *v1.PersistentVolumeClaim"} {"level":"info","ts":1668414800.378618,"logger":"controller.suggestion-controller","msg":"Starting Controller"} {"level":"info","ts":1668414800.6810136,"logger":"controller.suggestion-controller","msg":"Starting workers","worker count":1} {"level":"info","ts":1668414800.68106,"logger":"controller.experiment-controller","msg":"Starting workers","worker count":1} {"level":"info","ts":1668414800.6810784,"logger":"controller.trial-controller","msg":"Starting workers","worker count":1} {"level":"info","ts":1668586651.541223,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow-oko/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586651.541277,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow-oko/random","addCount":3} {"level":"info","ts":1668586651.541285,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow-oko/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586651.5413072,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow-oko/random","namespace":"kubeflow-oko","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586651.5521846,"logger":"experiment-suggestion-client","msg":"Suggestion created","experiment":"kubeflow-oko/random","namespace":"kubeflow-oko","name":"random"} {"level":"info","ts":1668586651.5526276,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow-oko/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586651.5526524,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow-oko/random","addCount":3} 
{"level":"info","ts":1668586651.5526626,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow-oko/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586651.5632856,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow-oko/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586651.563321,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow-oko/random","addCount":3} {"level":"info","ts":1668586651.5633469,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow-oko/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586651.5639825,"logger":"suggestion-controller","msg":"Creating Service","Suggestion":"kubeflow-oko/random","name":"random-random"} {"level":"info","ts":1668586651.7152066,"logger":"suggestion-controller","msg":"Creating Deployment","Suggestion":"kubeflow-oko/random","name":"random-random"} {"level":"info","ts":1668586651.7377508,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow-oko/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586651.7377858,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow-oko/random","addCount":3} {"level":"info","ts":1668586651.7378013,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow-oko/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586651.7484431,"logger":"suggestion-controller","msg":"Update suggestion instance status failed, reconciler requeued","Suggestion":"kubeflow-oko/random","err":"Operation cannot be fulfilled on suggestions.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586683.2643886,"logger":"suggestion-client","msg":"Algorithm 
settings are validated","Suggestion":"kubeflow-oko/random"} {"level":"info","ts":1668586683.2644699,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow-oko/random","Suggestion Requests":3,"Suggestion Count":0} {"level":"info","ts":1668586683.278306,"logger":"suggestion-client","msg":"Getting suggestions","Suggestion":"kubeflow-oko/random","endpoint":"random-random.kubeflow-oko:6789","Number of current request parameters":3,"Number of response parameters":3} {"level":"info","ts":1668586683.3337023,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow-oko/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668586683.333707,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow-oko/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586683.3338342,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow-oko/random","addCount":3} {"level":"info","ts":1668586683.3338442,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow-oko/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586683.3583622,"logger":"experiment-controller","msg":"Created Trials","Experiment":"kubeflow-oko/random","trialNames":["random-xpwk8ltm","random-xrnd77p6","random-6dzx8b5m"]} {"level":"info","ts":1668586683.3781238,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow-oko/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586683.379831,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow-oko/random-xpwk8ltm","kind":"Job","name":"random-xpwk8ltm"} 
{"level":"info","ts":1668586683.3963602,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow-oko/random-xpwk8ltm"} {"level":"info","ts":1668586683.4138367,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow-oko/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586683.4328313,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow-oko/random-xrnd77p6","kind":"Job","name":"random-xrnd77p6"} {"level":"info","ts":1668586683.4395044,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow-oko/random-xrnd77p6"} {"level":"info","ts":1668586683.4521363,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow-oko/random-6dzx8b5m","kind":"Job","name":"random-6dzx8b5m"} {"level":"info","ts":1668586683.4593759,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow-oko/random-6dzx8b5m"} {"level":"info","ts":1668586907.9376683,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586907.9378033,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586907.9378147,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668586907.9378202,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} 
{"level":"info","ts":1668586907.9378366,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow/random","namespace":"kubeflow","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586907.9460092,"logger":"experiment-suggestion-client","msg":"Suggestion created","experiment":"kubeflow/random","namespace":"kubeflow","name":"random"} {"level":"info","ts":1668586907.9461386,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586907.946149,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668586907.946156,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586907.9462576,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586907.9462674,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668586907.946274,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586907.955395,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586907.95542,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668586907.9553444,"logger":"suggestion-controller","msg":"Creating Service","Suggestion":"kubeflow/random","name":"random-random"} 
{"level":"info","ts":1668586907.9554296,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586907.970557,"logger":"suggestion-controller","msg":"Creating Deployment","Suggestion":"kubeflow/random","name":"random-random"} {"level":"info","ts":1668586907.985851,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586907.9858773,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668586907.9858837,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668586919.450472,"logger":"suggestion-client","msg":"Algorithm settings are validated","Suggestion":"kubeflow/random"} {"level":"info","ts":1668586919.4505477,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":0} {"level":"info","ts":1668586919.464237,"logger":"suggestion-client","msg":"Getting suggestions","Suggestion":"kubeflow/random","endpoint":"random-random.kubeflow:6789","Number of current request parameters":3,"Number of response parameters":3} {"level":"info","ts":1668586919.4951923,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668586919.4952202,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668586919.4952269,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} 
{"level":"info","ts":1668586919.4976187,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668586919.5236697,"logger":"experiment-controller","msg":"Created Trials","Experiment":"kubeflow/random","trialNames":["random-mjsvh9sv","random-vxh4z497","random-6r2k8fw6"]} {"level":"info","ts":1668586919.5238223,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":2,"completedCount":0} {"level":"info","ts":1668586919.5238328,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":1} {"level":"info","ts":1668586919.5238388,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"error","ts":1668586919.5309176,"logger":"experiment-controller","msg":"Trial create error","Experiment":"kubeflow/random","Trial name":"random-6r2k8fw6","error":"trials.kubeflow.org \"random-6r2k8fw6\" already 
exists","stacktrace":"github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:335\ngithub.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:281\ngithub.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"info","ts":1668586919.5496483,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-mjsvh9sv","kind":"Job","name":"random-mjsvh9sv"} {"level":"info","ts":1668586919.5563848,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-mjsvh9sv"} {"level":"info","ts":1668586919.5657005,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try 
again"} {"level":"info","ts":1668586919.5788264,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-vxh4z497","kind":"Job","name":"random-vxh4z497"} {"level":"info","ts":1668586919.5843813,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586919.5852973,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-vxh4z497"} {"level":"info","ts":1668586919.6221187,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586919.6246238,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-6r2k8fw6","kind":"Job","name":"random-6r2k8fw6"} {"level":"info","ts":1668586919.6315792,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-6r2k8fw6"} {"level":"info","ts":1668586919.64229,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668586919.661524,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} 
{"level":"info","ts":1668586919.6785293,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668587315.2781813,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668588134.3283863,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668588134.328599,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668588134.3286376,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668588134.3286448,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588134.3286636,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow/random","namespace":"kubeflow","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588134.338889,"logger":"experiment-suggestion-client","msg":"Suggestion created","experiment":"kubeflow/random","namespace":"kubeflow","name":"random"} {"level":"info","ts":1668588134.3389866,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} 
{"level":"info","ts":1668588134.3390079,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668588134.3390143,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588134.3391397,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668588134.3391612,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668588134.3391676,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588134.3469849,"logger":"suggestion-controller","msg":"Creating Service","Suggestion":"kubeflow/random","name":"random-random"} {"level":"info","ts":1668588134.3470345,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668588134.3470647,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668588134.347072,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588134.3617835,"logger":"suggestion-controller","msg":"Creating Deployment","Suggestion":"kubeflow/random","name":"random-random"} {"level":"info","ts":1668588134.3768647,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668588134.3768938,"logger":"experiment-controller","msg":"Reconcile 
Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668588134.3769004,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588165.8743377,"logger":"suggestion-client","msg":"Algorithm settings are validated","Suggestion":"kubeflow/random"} {"level":"info","ts":1668588165.8744602,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":0} {"level":"info","ts":1668588165.8881361,"logger":"suggestion-client","msg":"Getting suggestions","Suggestion":"kubeflow/random","endpoint":"random-random.kubeflow:6789","Number of current request parameters":3,"Number of response parameters":3} {"level":"info","ts":1668588165.913698,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668588165.9137259,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668588165.9137325,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668588165.9137766,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668588165.950391,"logger":"experiment-controller","msg":"Created Trials","Experiment":"kubeflow/random","trialNames":["random-cj92znhc","random-sd6gjd9w","random-bn844xsr"]} {"level":"info","ts":1668588165.9624007,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-cj92znhc","kind":"Job","name":"random-cj92znhc"} {"level":"info","ts":1668588165.9776835,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-cj92znhc"} 
{"level":"info","ts":1668588166.0303087,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-sd6gjd9w","kind":"Job","name":"random-sd6gjd9w"} {"level":"info","ts":1668588166.038241,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-sd6gjd9w"} {"level":"info","ts":1668588166.0579722,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-bn844xsr","kind":"Job","name":"random-bn844xsr"} {"level":"info","ts":1668588166.0651834,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-bn844xsr"} {"level":"info","ts":1668590346.655757,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-bn844xsr","kind":"Job","name":"random-bn844xsr"} {"level":"info","ts":1668590352.2007232,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-cj92znhc","kind":"Job","name":"random-cj92znhc"} {"level":"info","ts":1668591769.3253787,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668593129.5950477,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593129.595195,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593129.5952377,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593129.5952442,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} 
{"level":"info","ts":1668593129.5952787,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow/random","namespace":"kubeflow","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593129.6105905,"logger":"experiment-suggestion-client","msg":"Suggestion created","experiment":"kubeflow/random","namespace":"kubeflow","name":"random"} {"level":"info","ts":1668593129.6106865,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593129.6107092,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593129.6107163,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593129.6107495,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow/random","namespace":"kubeflow","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593129.620079,"logger":"suggestion-controller","msg":"Creating Service","Suggestion":"kubeflow/random","name":"random-random"} {"level":"error","ts":1668593129.6210566,"logger":"experiment-suggestion-client","msg":"CreateSuggestion failed","experiment":"kubeflow/random","instance":"random","error":"suggestions.kubeflow.org \"random\" already 
exists","stacktrace":"github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileSuggestions\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:471\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).createTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:350\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:335\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:281\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1668593129.6266665,"logger":"experiment-controller","msg":"GetOrCreateSuggestion failed","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3,"error":"suggestions.kubeflow.org \"random\" already 
exists","stacktrace":"github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).createTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:350\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:335\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:281\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1668593129.6267118,"logger":"experiment-controller","msg":"Get suggestions error","Experiment":"kubeflow/random","error":"suggestions.kubeflow.org \"random\" already 
exists","stacktrace":"github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:335\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:281\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1668593129.6267288,"logger":"experiment-controller","msg":"Create trials error","Experiment":"kubeflow/random","error":"suggestions.kubeflow.org \"random\" already 
exists","stacktrace":"github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:281\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1668593129.626762,"logger":"experiment-controller","msg":"Reconcile experiment error","Experiment":"kubeflow/random","error":"suggestions.kubeflow.org \"random\" already 
exists","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1668593129.6268165,"logger":"controller.experiment-controller","msg":"Reconciler error","name":"random","namespace":"kubeflow","error":"suggestions.kubeflow.org \"random\" already exists","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"info","ts":1668593129.6269112,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593129.626921,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593129.6269267,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} 
{"level":"info","ts":1668593129.6325796,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593129.6326146,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593129.6326218,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593129.6424057,"logger":"suggestion-controller","msg":"Creating Deployment","Suggestion":"kubeflow/random","name":"random-random"} {"level":"info","ts":1668593129.659416,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593129.6596498,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593129.6596687,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593141.2057698,"logger":"suggestion-client","msg":"Algorithm settings are validated","Suggestion":"kubeflow/random"} {"level":"info","ts":1668593141.2058437,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":0} {"level":"info","ts":1668593141.22259,"logger":"suggestion-client","msg":"Getting suggestions","Suggestion":"kubeflow/random","endpoint":"random-random.kubeflow:6789","Number of current request parameters":3,"Number of response parameters":3} {"level":"info","ts":1668593141.2344902,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} 
{"level":"info","ts":1668593141.2345214,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593141.2345288,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593141.2347133,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668593141.267749,"logger":"experiment-controller","msg":"Created Trials","Experiment":"kubeflow/random","trialNames":["random-klnhv6st","random-kkssrfxp","random-2gvr4jdg"]} {"level":"info","ts":1668593141.2859087,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593141.2918856,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-klnhv6st","kind":"Job","name":"random-klnhv6st"} {"level":"info","ts":1668593141.2982204,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-klnhv6st"} {"level":"info","ts":1668593141.3374305,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-kkssrfxp","kind":"Job","name":"random-kkssrfxp"} {"level":"info","ts":1668593141.344244,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-kkssrfxp"} {"level":"info","ts":1668593141.3607996,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-2gvr4jdg","kind":"Job","name":"random-2gvr4jdg"} {"level":"info","ts":1668593141.3677237,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-2gvr4jdg"} 
{"level":"info","ts":1668593697.1168056,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593697.116917,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593697.1169395,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593697.1169474,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593697.1169627,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow/random","namespace":"kubeflow","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593697.1282291,"logger":"experiment-suggestion-client","msg":"Suggestion created","experiment":"kubeflow/random","namespace":"kubeflow","name":"random"} {"level":"info","ts":1668593697.1283438,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593697.1283782,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593697.1283853,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593697.128587,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} 
{"level":"info","ts":1668593697.1286192,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593697.1286254,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593697.1370792,"logger":"suggestion-controller","msg":"Creating Service","Suggestion":"kubeflow/random","name":"random-random"} {"level":"info","ts":1668593697.1371374,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593697.1371527,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593697.1371622,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593697.1531599,"logger":"suggestion-controller","msg":"Creating Deployment","Suggestion":"kubeflow/random","name":"random-random"} {"level":"info","ts":1668593697.176106,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593697.176142,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593697.176151,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593697.1828392,"logger":"suggestion-controller","msg":"Update suggestion instance status failed, reconciler requeued","Suggestion":"kubeflow/random","err":"Operation cannot be fulfilled on suggestions.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest 
version and try again"} {"level":"info","ts":1668593708.661534,"logger":"suggestion-client","msg":"Algorithm settings are validated","Suggestion":"kubeflow/random"} {"level":"info","ts":1668593708.661672,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":0} {"level":"info","ts":1668593708.674528,"logger":"suggestion-client","msg":"Getting suggestions","Suggestion":"kubeflow/random","endpoint":"random-random.kubeflow:6789","Number of current request parameters":3,"Number of response parameters":3} {"level":"info","ts":1668593708.688999,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/random","Suggestion Requests":3,"Suggestion Count":3} {"level":"info","ts":1668593708.6891828,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":0,"completedCount":0} {"level":"info","ts":1668593708.6892042,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":3} {"level":"info","ts":1668593708.6892111,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} {"level":"info","ts":1668593708.7261095,"logger":"experiment-controller","msg":"Created Trials","Experiment":"kubeflow/random","trialNames":["random-k5kpr6hj","random-46mtvmwd","random-mwzdw8st"]} {"level":"info","ts":1668593708.7262993,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/random","requiredActiveCount":3,"parallelCount":3,"activeCount":2,"completedCount":0} {"level":"info","ts":1668593708.7263136,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/random","addCount":1} {"level":"info","ts":1668593708.7263207,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/random","name":"random","Suggestion Requests":3} 
{"level":"error","ts":1668593708.736903,"logger":"experiment-controller","msg":"Trial create error","Experiment":"kubeflow/random","Trial name":"random-mwzdw8st","error":"trials.kubeflow.org \"random-mwzdw8st\" already exists","stacktrace":"github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileTrials\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:335\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:281\ngh.neting.cc/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"} {"level":"info","ts":1668593708.7594647,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593708.7643652,"logger":"trial-controller","msg":"Creating 
Job","Trial":"kubeflow/random-k5kpr6hj","kind":"Job","name":"random-k5kpr6hj"} {"level":"info","ts":1668593708.7702346,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-k5kpr6hj"} {"level":"info","ts":1668593708.775841,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593708.7933445,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593708.7951496,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-46mtvmwd","kind":"Job","name":"random-46mtvmwd"} {"level":"info","ts":1668593708.8014355,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-46mtvmwd"} {"level":"info","ts":1668593708.8111715,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593708.8274133,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/random-mwzdw8st","kind":"Job","name":"random-mwzdw8st"} {"level":"info","ts":1668593708.8315144,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/random","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"random\": the object has been 
modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668593708.8341055,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/random-mwzdw8st"} {"level":"info","ts":1668595180.8200922,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/jhg","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"jhg\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668595180.8203042,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/jhg","requiredActiveCount":2,"parallelCount":2,"activeCount":0,"completedCount":0} {"level":"info","ts":1668595180.820325,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/jhg","addCount":2} {"level":"info","ts":1668595180.820335,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/jhg","name":"jhg","Suggestion Requests":2} {"level":"info","ts":1668595180.820351,"logger":"experiment-suggestion-client","msg":"Creating Suggestion","experiment":"kubeflow/jhg","namespace":"kubeflow","name":"jhg","Suggestion Requests":2} {"level":"info","ts":1668595180.8274512,"logger":"experiment-suggestion-client","msg":"Suggestion created","experiment":"kubeflow/jhg","namespace":"kubeflow","name":"jhg"} {"level":"info","ts":1668595180.8275928,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/jhg","requiredActiveCount":2,"parallelCount":2,"activeCount":0,"completedCount":0} {"level":"info","ts":1668595180.8276055,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/jhg","addCount":2} {"level":"info","ts":1668595180.8276126,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/jhg","name":"jhg","Suggestion Requests":2} 
{"level":"info","ts":1668595180.8278134,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/jhg","requiredActiveCount":2,"parallelCount":2,"activeCount":0,"completedCount":0} {"level":"info","ts":1668595180.827834,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/jhg","addCount":2} {"level":"info","ts":1668595180.8278399,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/jhg","name":"jhg","Suggestion Requests":2} {"level":"info","ts":1668595180.8358617,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/jhg","requiredActiveCount":2,"parallelCount":2,"activeCount":0,"completedCount":0} {"level":"info","ts":1668595180.8358943,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/jhg","addCount":2} {"level":"info","ts":1668595180.8359017,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/jhg","name":"jhg","Suggestion Requests":2} {"level":"info","ts":1668595180.83595,"logger":"suggestion-controller","msg":"Creating Service","Suggestion":"kubeflow/jhg","name":"jhg-random"} {"level":"info","ts":1668595180.8536804,"logger":"suggestion-controller","msg":"Creating Deployment","Suggestion":"kubeflow/jhg","name":"jhg-random"} {"level":"info","ts":1668595180.8722422,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/jhg","requiredActiveCount":2,"parallelCount":2,"activeCount":0,"completedCount":0} {"level":"info","ts":1668595180.87227,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/jhg","addCount":2} {"level":"info","ts":1668595180.872277,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/jhg","name":"jhg","Suggestion Requests":2} {"level":"info","ts":1668595192.370637,"logger":"suggestion-client","msg":"Algorithm settings are validated","Suggestion":"kubeflow/jhg"} 
{"level":"info","ts":1668595192.3707123,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/jhg","Suggestion Requests":2,"Suggestion Count":0} {"level":"info","ts":1668595192.3803372,"logger":"suggestion-client","msg":"Getting suggestions","Suggestion":"kubeflow/jhg","endpoint":"jhg-random.kubeflow:6789","Number of current request parameters":2,"Number of response parameters":2} {"level":"info","ts":1668595192.4042382,"logger":"experiment-controller","msg":"Statistics","Experiment":"kubeflow/jhg","requiredActiveCount":2,"parallelCount":2,"activeCount":0,"completedCount":0} {"level":"info","ts":1668595192.4042997,"logger":"experiment-controller","msg":"Reconcile Suggestion","Experiment":"kubeflow/jhg","addCount":2} {"level":"info","ts":1668595192.4043071,"logger":"experiment-controller","msg":"GetOrCreateSuggestion","Experiment":"kubeflow/jhg","name":"jhg","Suggestion Requests":2} {"level":"info","ts":1668595192.4044302,"logger":"suggestion-controller","msg":"Sync assignments","Suggestion":"kubeflow/jhg","Suggestion Requests":2,"Suggestion Count":2} {"level":"info","ts":1668595192.4280088,"logger":"experiment-controller","msg":"Created Trials","Experiment":"kubeflow/jhg","trialNames":["jhg-8fxksm7k","jhg-pchlkp2m"]} {"level":"info","ts":1668595192.4462867,"logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":"kubeflow/jhg","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"jhg\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668595192.4546516,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/jhg-8fxksm7k","kind":"TFJob","name":"jhg-8fxksm7k"} {"level":"info","ts":1668595192.4635334,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/jhg-8fxksm7k"} {"level":"info","ts":1668595192.5008855,"logger":"experiment-controller","msg":"Update 
experiment instance status failed, reconciler requeued","Experiment":"kubeflow/jhg","err":"Operation cannot be fulfilled on experiments.kubeflow.org \"jhg\": the object has been modified; please apply your changes to the latest version and try again"} {"level":"info","ts":1668595192.5024803,"logger":"trial-controller","msg":"Creating Job","Trial":"kubeflow/jhg-pchlkp2m","kind":"TFJob","name":"jhg-pchlkp2m"} {"level":"info","ts":1668595192.5107436,"logger":"trial-controller","msg":"Trial status changed to Running","Trial":"kubeflow/jhg-pchlkp2m"}
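
For long controller dumps like the one above, where many JSON entries run together on a single line, a small script can pull out just the error entries. This is a minimal sketch, assuming the log text is available as a string (for example, saved from kubectl logs). The helper names iter_log_entries and errors_only are hypothetical, and the sample holds two entries copied from the log above, the second trimmed of its stacktrace field.

```python
import json


def iter_log_entries(text):
    """Yield each JSON object from a string of back-to-back JSON log entries."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        entry, pos = decoder.raw_decode(text, pos)
        yield entry


def errors_only(text):
    """Return only the entries whose "level" field is "error"."""
    return [e for e in iter_log_entries(text) if e.get("level") == "error"]


# Two entries taken from the controller log in this thread (stacktrace omitted).
sample = (
    '{"level":"info","ts":1668593129.620079,"logger":"suggestion-controller",'
    '"msg":"Creating Service","Suggestion":"kubeflow/random","name":"random-random"} '
    '{"level":"error","ts":1668593129.6268165,"logger":"controller.experiment-controller",'
    '"msg":"Reconciler error","name":"random","namespace":"kubeflow",'
    '"error":"suggestions.kubeflow.org \\"random\\" already exists"}'
)

for entry in errors_only(sample):
    print(entry["logger"], entry["msg"], entry["error"])
```

Filtering this way makes it easier to see that the errors in this dump are "already exists" conflicts raised during reconciliation, which the controller retries.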

@johnugeorge
Member

It looks good. Can you check the logs of the metrics container of the trial pod?

@skgreenstar
Author

skgreenstar commented Nov 18, 2022

I checked the logs of the metrics container of the trial pod.

There was no particular issue; the logs and metric results looked normal.
Even so, the metric data is not being written to the DB.

I tested on two AKS clusters.

The Katib metrics were normal and identical on both.

When I compared the DB populated through the MySQL instance provided by Katib with the DB I configured myself,
the only difference was that no data was written to mine.

Is there a way to check the communication from the trial pod to the DB? (It is strange that the data appears to be reported, yet nothing ends up in the DB.)
Can I check this in the logs, or verify it manually?
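
One part of this path that can be checked by hand is plain TCP reachability. As far as I understand, the metrics collector sidecar reports to the katib-db-manager service rather than writing to MySQL directly, so both hops matter: trial pod to DB manager, and DB manager to MySQL. The sketch below is only a connectivity probe under those assumptions; the function name can_connect is hypothetical, and the host and port you pass in (for example katib-db-manager.kubeflow on 6789, or your own MySQL host on 3306) are assumptions about your setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Running can_connect from a pod in the same namespace as the trial tells you only whether the port is reachable; it does not prove that the report call or the SQL insert succeeded, so the db-manager logs are still worth reading alongside it.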

@johnugeorge
Member

To confirm, are you seeing the success message in the metrics logs once training is completed?

klog.Infof("Metrics reported. :\n%v", olog)
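
A quick way to answer this for a saved log is to search it for that message. The sketch below assumes the metrics container log has been saved to a string. The function name metrics_were_reported and the second sample are hypothetical, while the first sample line is taken from the training output in this thread.

```python
def metrics_were_reported(log_text):
    """Return True if any line contains the collector's success message."""
    # The klog format string starts with "Metrics reported." followed by a newline,
    # so matching that prefix on a single line is enough.
    return any("Metrics reported." in line for line in log_text.splitlines())


# A training line from this thread: no success message present.
training_only = "2022-11-18T10:47:27Z INFO Epoch[0] Validation-accuracy=0.113854"

# Hypothetical log tail that includes the success message.
with_report = training_only + "\nMetrics reported. :\n<observation log would follow here>"

print(metrics_were_reported(training_only))
print(metrics_were_reported(with_report))
```

If the message never appears even after training finishes, the collector probably did not reach the reporting step, which narrows the problem to the collector side rather than the database.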

@skgreenstar
Author

The klog.Infof("Metrics reported. :\n%v", olog) message does not show up in the logs.
ㅠㅠ
What's wrong?

2022-11-18T10:47:18Z INFO start with arguments Namespace(num_classes=10, num_examples=60000, add_stn=False, image_shape='1, 28, 28', network='mlp', num_layers=2, gpus=None, kv_store='device', num_epochs=10, lr=0.02184738980894598, lr_factor=0.1, lr_step_epochs='10', initializer='default', optimizer='ftrl', mom=0.9, wd=0.0001, batch_size=64, disp_batches=100, model_prefix=None, save_period=1, monitor=0, load_epoch=None, top_k=0, loss='', test_io=0, dtype='float32', gc_type='none', gc_threshold=0.5, macrobatch_size=0, warmup_epochs=5, warmup_strategy='linear', profile_worker_suffix='', profile_server_suffix='', use_imagenet_data_augmentation=0)
2022-11-18T10:47:18Z DEBUG Starting new HTTP connection (1): data.mxnet.io:80
2022-11-18T10:47:18Z DEBUG http://data.mxnet.io:80 "GET /data/mnist/train-labels-idx1-ubyte.gz HTTP/1.1" 301 290
2022-11-18T10:47:18Z DEBUG Starting new HTTP connection (1): data.mxnet.io.s3-website-us-west-1.amazonaws.com:80
2022-11-18T10:47:18Z DEBUG http://data.mxnet.io.s3-website-us-west-1.amazonaws.com:80 "GET /data/mnist/train-labels-idx1-ubyte.gz HTTP/1.1" 200 28881
2022-11-18T10:47:18Z INFO downloaded http://data.mxnet.io/data/mnist/train-labels-idx1-ubyte.gz into train-labels-idx1-ubyte.gz successfully
2022-11-18T10:47:18Z DEBUG Starting new HTTP connection (1): data.mxnet.io:80
2022-11-18T10:47:19Z DEBUG http://data.mxnet.io:80 "GET /data/mnist/train-images-idx3-ubyte.gz HTTP/1.1" 301 290
2022-11-18T10:47:19Z DEBUG Starting new HTTP connection (1): data.mxnet.io.s3-website-us-west-1.amazonaws.com:80
2022-11-18T10:47:19Z DEBUG http://data.mxnet.io.s3-website-us-west-1.amazonaws.com:80 "GET /data/mnist/train-images-idx3-ubyte.gz HTTP/1.1" 200 9912422
2022-11-18T10:47:20Z INFO downloaded http://data.mxnet.io/data/mnist/train-images-idx3-ubyte.gz into train-images-idx3-ubyte.gz successfully
2022-11-18T10:47:21Z DEBUG Starting new HTTP connection (1): data.mxnet.io:80
2022-11-18T10:47:21Z DEBUG http://data.mxnet.io:80 "GET /data/mnist/t10k-labels-idx1-ubyte.gz HTTP/1.1" 301 289
2022-11-18T10:47:21Z DEBUG Starting new HTTP connection (1): data.mxnet.io.s3-website-us-west-1.amazonaws.com:80
2022-11-18T10:47:21Z DEBUG http://data.mxnet.io.s3-website-us-west-1.amazonaws.com:80 "GET /data/mnist/t10k-labels-idx1-ubyte.gz HTTP/1.1" 200 4542
2022-11-18T10:47:21Z INFO downloaded http://data.mxnet.io/data/mnist/t10k-labels-idx1-ubyte.gz into t10k-labels-idx1-ubyte.gz successfully
2022-11-18T10:47:21Z DEBUG Starting new HTTP connection (1): data.mxnet.io:80
2022-11-18T10:47:21Z DEBUG http://data.mxnet.io:80 "GET /data/mnist/t10k-images-idx3-ubyte.gz HTTP/1.1" 301 289
2022-11-18T10:47:21Z DEBUG Starting new HTTP connection (1): data.mxnet.io.s3-website-us-west-1.amazonaws.com:80
2022-11-18T10:47:22Z DEBUG http://data.mxnet.io.s3-website-us-west-1.amazonaws.com:80 "GET /data/mnist/t10k-images-idx3-ubyte.gz HTTP/1.1" 200 1648877
2022-11-18T10:47:22Z INFO downloaded http://data.mxnet.io/data/mnist/t10k-images-idx3-ubyte.gz into t10k-images-idx3-ubyte.gz successfully
[10:47:23] ../src/executor/graph_executor.cc:1991: Subgraph backend MKLDNN is activated.
2022-11-18T10:47:23Z INFO Epoch[0] Batch [0-100] Speed: 22117.72 samples/sec accuracy=0.120823
2022-11-18T10:47:24Z INFO Epoch[0] Batch [100-200] Speed: 18615.30 samples/sec accuracy=0.114062
2022-11-18T10:47:24Z INFO Epoch[0] Batch [200-300] Speed: 19706.07 samples/sec accuracy=0.110937
2022-11-18T10:47:24Z INFO Epoch[0] Batch [300-400] Speed: 18194.82 samples/sec accuracy=0.108281
2022-11-18T10:47:25Z INFO Epoch[0] Batch [400-500] Speed: 14671.04 samples/sec accuracy=0.110469
2022-11-18T10:47:25Z INFO Epoch[0] Batch [500-600] Speed: 17969.78 samples/sec accuracy=0.115156
2022-11-18T10:47:26Z INFO Epoch[0] Batch [600-700] Speed: 17839.19 samples/sec accuracy=0.113281
2022-11-18T10:47:26Z INFO Epoch[0] Batch [700-800] Speed: 17449.71 samples/sec accuracy=0.114219
2022-11-18T10:47:26Z INFO Epoch[0] Batch [800-900] Speed: 17674.52 samples/sec accuracy=0.109687
2022-11-18T10:47:26Z INFO Epoch[0] Train-accuracy=0.112373
2022-11-18T10:47:26Z INFO Epoch[0] Time cost=3.407
2022-11-18T10:47:27Z INFO Epoch[0] Validation-accuracy=0.113854
2022-11-18T10:47:27Z INFO Epoch[1] Batch [0-100] Speed: 16581.58 samples/sec accuracy=0.113397
2022-11-18T10:47:28Z INFO Epoch[1] Batch [100-200] Speed: 17419.48 samples/sec accuracy=0.107344
2022-11-18T10:47:28Z INFO Epoch[1] Batch [200-300] Speed: 19009.30 samples/sec accuracy=0.110625
2022-11-18T10:47:28Z INFO Epoch[1] Batch [300-400] Speed: 17902.17 samples/sec accuracy=0.110156
2022-11-18T10:47:29Z INFO Epoch[1] Batch [400-500] Speed: 10735.69 samples/sec accuracy=0.113750
2022-11-18T10:47:29Z INFO Epoch[1] Batch [500-600] Speed: 16978.96 samples/sec accuracy=0.112812
2022-11-18T10:47:30Z INFO Epoch[1] Batch [600-700] Speed: 17491.50 samples/sec accuracy=0.110000
2022-11-18T10:47:30Z INFO Epoch[1] Batch [700-800] Speed: 16610.88 samples/sec accuracy=0.113437
2022-11-18T10:47:30Z INFO Epoch[1] Batch [800-900] Speed: 18042.14 samples/sec accuracy=0.114687
2022-11-18T10:47:30Z INFO Epoch[1] Train-accuracy=0.112373
2022-11-18T10:47:30Z INFO Epoch[1] Time cost=3.664
2022-11-18T10:47:31Z INFO Epoch[1] Validation-accuracy=0.113854
2022-11-18T10:47:31Z INFO Epoch[2] Batch [0-100] Speed: 17646.74 samples/sec accuracy=0.107983
2022-11-18T10:47:32Z INFO Epoch[2] Batch [100-200] Speed: 15857.25 samples/sec accuracy=0.111406
2022-11-18T10:47:32Z INFO Epoch[2] Batch [200-300] Speed: 16038.05 samples/sec accuracy=0.119375
2022-11-18T10:47:32Z INFO Epoch[2] Batch [300-400] Speed: 16631.82 samples/sec accuracy=0.110781
2022-11-18T10:47:33Z INFO Epoch[2] Batch [400-500] Speed: 13834.23 samples/sec accuracy=0.117969
2022-11-18T10:47:33Z INFO Epoch[2] Batch [500-600] Speed: 17591.15 samples/sec accuracy=0.110312
2022-11-18T10:47:34Z INFO Epoch[2] Batch [600-700] Speed: 15246.08 samples/sec accuracy=0.107500
2022-11-18T10:47:34Z INFO Epoch[2] Batch [700-800] Speed: 17705.64 samples/sec accuracy=0.111250
2022-11-18T10:47:34Z INFO Epoch[2] Batch [800-900] Speed: 16142.41 samples/sec accuracy=0.112500
2022-11-18T10:47:35Z INFO Epoch[2] Train-accuracy=0.112390
2022-11-18T10:47:35Z INFO Epoch[2] Time cost=3.720
2022-11-18T10:47:35Z INFO Epoch[2] Validation-accuracy=0.113854
2022-11-18T10:47:35Z INFO Epoch[3] Batch [0-100] Speed: 17662.90 samples/sec accuracy=0.113552
2022-11-18T10:47:36Z INFO Epoch[3] Batch [100-200] Speed: 16966.99 samples/sec accuracy=0.111250
2022-11-18T10:47:36Z INFO Epoch[3] Batch [200-300] Speed: 18091.43 samples/sec accuracy=0.117813
2022-11-18T10:47:36Z INFO Epoch[3] Batch [300-400] Speed: 18035.56 samples/sec accuracy=0.117031
2022-11-18T10:47:37Z INFO Epoch[3] Batch [400-500] Speed: 14823.78 samples/sec accuracy=0.112187
2022-11-18T10:47:37Z INFO Epoch[3] Batch [500-600] Speed: 17392.27 samples/sec accuracy=0.108438
2022-11-18T10:47:38Z INFO Epoch[3] Batch [600-700] Speed: 17805.44 samples/sec accuracy=0.109063
2022-11-18T10:47:38Z INFO Epoch[3] Batch [700-800] Speed: 17073.53 samples/sec accuracy=0.115781
2022-11-18T10:47:38Z INFO Epoch[3] Batch [800-900] Speed: 18030.94 samples/sec accuracy=0.108281
2022-11-18T10:47:38Z INFO Epoch[3] Train-accuracy=0.112373
2022-11-18T10:47:38Z INFO Epoch[3] Time cost=3.476
2022-11-18T10:47:39Z INFO Epoch[3] Validation-accuracy=0.113854
2022-11-18T10:47:39Z INFO Epoch[4] Batch [0-100] Speed: 16212.23 samples/sec accuracy=0.110922
2022-11-18T10:47:40Z INFO Epoch[4] Batch [100-200] Speed: 17849.98 samples/sec accuracy=0.113281
2022-11-18T10:47:40Z INFO Epoch[4] Batch [200-300] Speed: 15840.57 samples/sec accuracy=0.110781
2022-11-18T10:47:40Z INFO Epoch[4] Batch [300-400] Speed: 16185.89 samples/sec accuracy=0.108750
2022-11-18T10:47:41Z INFO Epoch[4] Batch [400-500] Speed: 14281.89 samples/sec accuracy=0.115469
2022-11-18T10:47:41Z INFO Epoch[4] Batch [500-600] Speed: 17330.29 samples/sec accuracy=0.108750
2022-11-18T10:47:42Z INFO Epoch[4] Batch [600-700] Speed: 17372.21 samples/sec accuracy=0.112656
2022-11-18T10:47:42Z INFO Epoch[4] Batch [700-800] Speed: 17350.72 samples/sec accuracy=0.116562
2022-11-18T10:47:42Z INFO Epoch[4] Batch [800-900] Speed: 16899.46 samples/sec accuracy=0.110781
2022-11-18T10:47:42Z INFO Epoch[4] Train-accuracy=0.112373
2022-11-18T10:47:42Z INFO Epoch[4] Time cost=3.625
2022-11-18T10:47:43Z INFO Epoch[4] Validation-accuracy=0.113854
2022-11-18T10:47:43Z INFO Epoch[5] Batch [0-100] Speed: 17812.51 samples/sec accuracy=0.115099
2022-11-18T10:47:44Z INFO Epoch[5] Batch [100-200] Speed: 17452.41 samples/sec accuracy=0.112656
2022-11-18T10:47:44Z INFO Epoch[5] Batch [200-300] Speed: 15469.40 samples/sec accuracy=0.105313
2022-11-18T10:47:44Z INFO Epoch[5] Batch [300-400] Speed: 17133.62 samples/sec accuracy=0.115469
2022-11-18T10:47:45Z INFO Epoch[5] Batch [400-500] Speed: 15616.07 samples/sec accuracy=0.114062
2022-11-18T10:47:45Z INFO Epoch[5] Batch [500-600] Speed: 16958.49 samples/sec accuracy=0.108594
2022-11-18T10:47:45Z INFO Epoch[5] Batch [600-700] Speed: 17679.47 samples/sec accuracy=0.115937
2022-11-18T10:47:46Z INFO Epoch[5] Batch [700-800] Speed: 17608.55 samples/sec accuracy=0.110781
2022-11-18T10:47:46Z INFO Epoch[5] Batch [800-900] Speed: 17637.37 samples/sec accuracy=0.117188
2022-11-18T10:47:46Z INFO Epoch[5] Train-accuracy=0.112390
2022-11-18T10:47:46Z INFO Epoch[5] Time cost=3.529
2022-11-18T10:47:47Z INFO Epoch[5] Validation-accuracy=0.113854
2022-11-18T10:47:47Z INFO Epoch[6] Batch [0-100] Speed: 17695.35 samples/sec accuracy=0.107519
2022-11-18T10:47:47Z INFO Epoch[6] Batch [100-200] Speed: 19204.32 samples/sec accuracy=0.112344
2022-11-18T10:47:48Z INFO Epoch[6] Batch [200-300] Speed: 17837.10 samples/sec accuracy=0.112031
2022-11-18T10:47:48Z INFO Epoch[6] Batch [300-400] Speed: 17382.90 samples/sec accuracy=0.112656
2022-11-18T10:47:49Z INFO Epoch[6] Batch [400-500] Speed: 17051.68 samples/sec accuracy=0.107813
2022-11-18T10:47:49Z INFO Epoch[6] Batch [500-600] Speed: 16500.56 samples/sec accuracy=0.118281
2022-11-18T10:47:49Z INFO Epoch[6] Batch [600-700] Speed: 17856.19 samples/sec accuracy=0.110937
2022-11-18T10:47:50Z INFO Epoch[6] Batch [700-800] Speed: 17896.79 samples/sec accuracy=0.108906
2022-11-18T10:47:50Z INFO Epoch[6] Batch [800-900] Speed: 17635.39 samples/sec accuracy=0.114844
2022-11-18T10:47:50Z INFO Epoch[6] Train-accuracy=0.112373
2022-11-18T10:47:50Z INFO Epoch[6] Time cost=3.481
2022-11-18T10:47:50Z INFO Epoch[6] Validation-accuracy=0.113854
2022-11-18T10:47:51Z INFO Epoch[7] Batch [0-100] Speed: 16647.40 samples/sec accuracy=0.111541
2022-11-18T10:47:51Z INFO Epoch[7] Batch [100-200] Speed: 18504.04 samples/sec accuracy=0.116719
2022-11-18T10:47:52Z INFO Epoch[7] Batch [200-300] Speed: 18081.00 samples/sec accuracy=0.114375
2022-11-18T10:47:52Z INFO Epoch[7] Batch [300-400] Speed: 16781.41 samples/sec accuracy=0.111406
2022-11-18T10:47:52Z INFO Epoch[7] Batch [400-500] Speed: 16740.76 samples/sec accuracy=0.107500
2022-11-18T10:47:53Z INFO Epoch[7] Batch [500-600] Speed: 15801.75 samples/sec accuracy=0.116875
2022-11-18T10:47:53Z INFO Epoch[7] Batch [600-700] Speed: 17920.74 samples/sec accuracy=0.114844
2022-11-18T10:47:54Z INFO Epoch[7] Batch [700-800] Speed: 16894.94 samples/sec accuracy=0.107188
2022-11-18T10:47:54Z INFO Epoch[7] Batch [800-900] Speed: 17229.27 samples/sec accuracy=0.110937
2022-11-18T10:47:54Z INFO Epoch[7] Train-accuracy=0.112407
2022-11-18T10:47:54Z INFO Epoch[7] Time cost=3.521
2022-11-18T10:47:54Z INFO Epoch[7] Validation-accuracy=0.113854
2022-11-18T10:47:55Z INFO Epoch[8] Batch [0-100] Speed: 15458.33 samples/sec accuracy=0.109994
2022-11-18T10:47:55Z INFO Epoch[8] Batch [100-200] Speed: 17319.60 samples/sec accuracy=0.108438
2022-11-18T10:47:56Z INFO Epoch[8] Batch [200-300] Speed: 17710.22 samples/sec accuracy=0.107031
2022-11-18T10:47:56Z INFO Epoch[8] Batch [300-400] Speed: 17166.42 samples/sec accuracy=0.112656
2022-11-18T10:47:56Z INFO Epoch[8] Batch [400-500] Speed: 17966.06 samples/sec accuracy=0.116250
2022-11-18T10:47:57Z INFO Epoch[8] Batch [500-600] Speed: 16286.45 samples/sec accuracy=0.115469
2022-11-18T10:47:57Z INFO Epoch[8] Batch [600-700] Speed: 17355.05 samples/sec accuracy=0.114531
2022-11-18T10:47:57Z INFO Epoch[8] Batch [700-800] Speed: 17255.15 samples/sec accuracy=0.114844
2022-11-18T10:47:58Z INFO Epoch[8] Batch [800-900] Speed: 18007.08 samples/sec accuracy=0.114687
2022-11-18T10:47:58Z INFO Epoch[8] Train-accuracy=0.112373
2022-11-18T10:47:58Z INFO Epoch[8] Time cost=3.504
2022-11-18T10:47:58Z INFO Epoch[8] Validation-accuracy=0.113854
2022-11-18T10:47:59Z INFO Epoch[9] Batch [0-100] Speed: 16269.76 samples/sec accuracy=0.112469
2022-11-18T10:47:59Z INFO Epoch[9] Batch [100-200] Speed: 14299.49 samples/sec accuracy=0.108438
2022-11-18T10:47:59Z INFO Epoch[9] Batch [200-300] Speed: 17222.93 samples/sec accuracy=0.113437
2022-11-18T10:48:00Z INFO Epoch[9] Batch [300-400] Speed: 15778.56 samples/sec accuracy=0.115625
2022-11-18T10:48:00Z INFO Epoch[9] Batch [400-500] Speed: 17833.15 samples/sec accuracy=0.118594
2022-11-18T10:48:01Z INFO Epoch[9] Batch [500-600] Speed: 15407.44 samples/sec accuracy=0.114062
2022-11-18T10:48:01Z INFO Epoch[9] Batch [600-700] Speed: 16887.44 samples/sec accuracy=0.112031
2022-11-18T10:48:01Z INFO Epoch[9] Batch [700-800] Speed: 17728.87 samples/sec accuracy=0.105000
2022-11-18T10:48:02Z INFO Epoch[9] Batch [800-900] Speed: 17824.49 samples/sec accuracy=0.109844
2022-11-18T10:48:02Z INFO Epoch[9] Train-accuracy=0.112357
2022-11-18T10:48:02Z INFO Epoch[9] Time cost=3.635
2022-11-18T10:48:02Z INFO Epoch[9] Validation-accuracy=0.113854
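
The trial output above does contain name=value lines such as Train-accuracy=0.112373 and Validation-accuracy=0.113854. One way to double-check what a log parser could pick up from this output is a small regular-expression pass over it. The following is only a minimal Python sketch: the pattern and the function name extract_metrics are assumptions made for illustration and are not taken from Katib's metrics collector.

```python
import re

# Hypothetical pattern for the two metric names seen in the log above.
# This is an illustration only, not Katib's own parsing rule.
METRIC_RE = re.compile(r"(Train-accuracy|Validation-accuracy)=([0-9]*\.?[0-9]+)")


def extract_metrics(log_text):
    """Return (metric_name, value) pairs in the order they appear in the log."""
    return [(name, float(value)) for name, value in METRIC_RE.findall(log_text)]


# Three lines copied from the trial log in this comment.
sample = (
    "2022-11-18T10:47:26Z INFO Epoch[0] Train-accuracy=0.112373\n"
    "2022-11-18T10:47:26Z INFO Epoch[0] Time cost=3.407\n"
    "2022-11-18T10:47:27Z INFO Epoch[0] Validation-accuracy=0.113854\n"
)

metrics = extract_metrics(sample)
for name, value in metrics:
    print(name, value)
```

If a check like this finds the metric lines but nothing reaches the database, the trial output format is probably not the cause, and the path between the metrics collector and katib-db-manager is the next place to look.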

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Member

Sorry for the late reply @skgreenstar.
Was your issue resolved in a newer Katib version (e.g. 0.15 or 0.16)?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
