This repo contains the libraries for writing custom job operators such as tf-operator and pytorch-operator. To write a custom operator, users need to complete the following steps:
- Generate the operator skeleton using kube-builder or operator-sdk.
- Define the job CRD and reuse the common API. Check test_job for a full example.
import (
    commonv1 "github.com/kubeflow/common/pkg/apis/common/v1"
)

// Reuse the commonv1 API in your type.go.
RunPolicy *commonv1.RunPolicy `json:"runPolicy,omitempty"`
TestReplicaSpecs map[TestReplicaType]*commonv1.ReplicaSpec `json:"testReplicaSpecs"`
- Write a custom controller that implements the controller interface, such as the TestJobController, and instantiate a testJobController object.
testJobController := TestJobController{
    ...
}
- Instantiate a JobController struct object and pass in the custom controller written in the previous step as a parameter.
import "github.com/kubeflow/common/pkg/controller.v1/common"

jobController := common.JobController{
    Controller: testJobController,
    Config: v1.JobControllerConfiguration{EnableGangScheduling: false},
    Recorder: recorder,
}
- Within your main reconcile loop, call the JobController.ReconcileJobs method.
reconcile(...) {
    // Your main reconcile loop.
    ...
    jobController.ReconcileJobs(...)
    ...
}
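
The last three steps follow a delegation pattern: shared logic lives in one struct, and operator-specific behavior is supplied through an interface. The sketch below illustrates that pattern with standard-library Go only. Every type and method in it (ControllerInterface, GetJob, the string-based job store, and the simplified ReconcileJobs signature) is a hypothetical stand-in chosen for illustration, not the actual API of this repo, which has more methods and works with Kubernetes objects.

```go
package main

import (
	"errors"
	"fmt"
)

// ControllerInterface is a hypothetical, heavily reduced stand-in for the
// interface that a custom controller implements.
type ControllerInterface interface {
	ControllerName() string
	GetJob(namespace, name string) (string, error)
}

// TestJobController is the operator-specific implementation (step 3).
// The in-memory map is an assumption that replaces a real API server lookup.
type TestJobController struct {
	jobs map[string]string // key "namespace/name" -> job description
}

func (c TestJobController) ControllerName() string { return "test-operator" }

func (c TestJobController) GetJob(namespace, name string) (string, error) {
	job, ok := c.jobs[namespace+"/"+name]
	if !ok {
		return "", errors.New("job not found: " + namespace + "/" + name)
	}
	return job, nil
}

// JobControllerConfiguration and JobController are simplified stand-ins for
// the shared types (step 4); only the delegation pattern is shown.
type JobControllerConfiguration struct {
	EnableGangScheduling bool
}

type JobController struct {
	Controller ControllerInterface
	Config     JobControllerConfiguration
}

// ReconcileJobs holds the shared logic and calls back into the custom
// controller for anything operator-specific.
func (jc JobController) ReconcileJobs(namespace, name string) (string, error) {
	job, err := jc.Controller.GetJob(namespace, name)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s reconciled %s (gang scheduling: %t)",
		jc.Controller.ControllerName(), job, jc.Config.EnableGangScheduling), nil
}

// reconcile plays the role of the operator's main reconcile loop (step 5).
func reconcile(jc JobController, namespace, name string) string {
	msg, err := jc.ReconcileJobs(namespace, name)
	if err != nil {
		return "error: " + err.Error()
	}
	return msg
}

func main() {
	testJobController := TestJobController{
		jobs: map[string]string{"default/demo": "TestJob demo"},
	}
	jobController := JobController{
		Controller: testJobController,
		Config:     JobControllerConfiguration{EnableGangScheduling: false},
	}
	fmt.Println(reconcile(jobController, "default", "demo"))
	fmt.Println(reconcile(jobController, "default", "missing"))
}
```

Because JobController depends only on the interface, the same shared reconcile logic can serve any operator that supplies its own implementation.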
Note that this repo is still under construction; API compatibility is not guaranteed at this point.
The API files are located under pkg/apis/common/v1:
- constants.go: the constants such as label keys.
- interface.go: the interfaces to be implemented by custom controllers.
- controller.go: the main JobController that contains the ReconcileJobs API method to be invoked by the user. This is the entrypoint of the JobController logic. The rest of the code under the job_controller/ folder contains the core logic for the JobController to work, such as creating and managing worker pods, services, etc.