Package v1 is the v1 version of the API.
Package v1 contains API Schema definitions for the kubeflow.org v1 API group.
Field | Description |
---|---|
minReplicas is the lower limit for the number of replicas to which the training job can scale down. It defaults to null. |
Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Defaults to null. |
RDZVConf contains additional rendezvous configuration (<key1>=<value1>,<key2>=<value2>,…). |
Starts a local standalone rendezvous backend, represented by a C10d TCP store on port 29400. Useful when launching a single-node, multi-worker job. If specified, --rdzv_backend, --rdzv_endpoint, and --rdzv_id are auto-assigned; any explicitly set values are ignored. |
Number of workers per node; supported values: [auto, cpu, gpu, int]. |
Metrics contains the specifications used to calculate the desired replica count (the maximum replica count across all metrics is used). The desired replica count is calculated by multiplying the current number of pods by the ratio of the current metric value to the target value. Ergo, the metrics used must decrease as the pod count increases, and vice versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the HPA will not be created. |
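The rendezvous and worker fields above correspond to options of PyTorch's `torchrun` launcher. As a rough illustration, the sketch below turns a few of them into `torchrun` flags. The helper `torchrun_args` and its parameter names are hypothetical, and the mapping itself is an assumption: it treats each replica as one node for `--nnodes`, and the training operator may pass these settings differently (for example through environment variables) rather than on the command line.

```python
from typing import Dict, List, Optional, Union


def torchrun_args(
    min_replicas: Optional[int] = None,
    max_replicas: Optional[int] = None,
    nproc_per_node: Optional[Union[int, str]] = None,
    rdzv_conf: Optional[Dict[str, str]] = None,
    standalone: bool = False,
    max_restarts: Optional[int] = None,
) -> List[str]:
    """Hypothetical mapping from elastic-policy settings to torchrun flags."""
    if (
        min_replicas is not None
        and max_replicas is not None
        and max_replicas < min_replicas
    ):
        # Mirrors the rule that the upper limit cannot be smaller than the lower one.
        raise ValueError("max_replicas cannot be smaller than min_replicas")

    args: List[str] = []
    if standalone:
        # torchrun assigns the rendezvous backend, endpoint and ID itself.
        args.append("--standalone")
    if min_replicas is not None and max_replicas is not None:
        # Elastic node range, written as MIN:MAX (assumes one replica per node).
        args.append(f"--nnodes={min_replicas}:{max_replicas}")
    if nproc_per_node is not None:
        # Accepts "auto", "cpu", "gpu", or an integer.
        args.append(f"--nproc_per_node={nproc_per_node}")
    if rdzv_conf:
        # Serialized as <key1>=<value1>,<key2>=<value2>,...
        pairs = ",".join(f"{key}={value}" for key, value in rdzv_conf.items())
        args.append(f"--rdzv_conf={pairs}")
    if max_restarts is not None:
        args.append(f"--max_restarts={max_restarts}")
    return args


print(torchrun_args(min_replicas=1, max_replicas=4, nproc_per_node="auto",
                    rdzv_conf={"join_timeout": "600"}, max_restarts=3))
```

Keeping the mapping in one pure function makes it easy to check which fields are dropped in standalone mode before anything is launched.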
Field | Description |
---|---|
Refer to Kubernetes API documentation for fields of |
Field | Description |
---|---|
Refer to Kubernetes API documentation for fields of |
Field | Description |
---|---|
Specifies the number of slots per worker used in the hostfile. Defaults to 1. |
CleanPodPolicy defines whether to kill pods after the job completes. Defaults to None. |
MainContainer specifies the name of the main container that executes the MPI code. |
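To show how the slots-per-worker setting ends up in a hostfile, the sketch below writes Open MPI style entries (`<host> slots=<n>`) in which every worker gets the same slot count, defaulting to 1 as stated above. The worker hostname pattern `<job>-worker-<i>` and the helper `hostfile_lines` are assumptions for illustration; the operator may qualify hostnames differently (for example with a service domain) or use another MPI implementation's syntax.

```python
from typing import List


def hostfile_lines(job_name: str, workers: int, slots_per_worker: int = 1) -> List[str]:
    """Build Open MPI hostfile entries, one line per worker pod (hypothetical naming)."""
    if workers < 0:
        raise ValueError("workers must be non-negative")
    if slots_per_worker < 1:
        raise ValueError("slots_per_worker must be at least 1")
    # Every worker advertises the same number of slots (MPI ranks it can host).
    return [f"{job_name}-worker-{i} slots={slots_per_worker}" for i in range(workers)]


lines = hostfile_lines("mpi-demo", workers=2, slots_per_worker=4)
print("\n".join(lines))
```

With two workers and four slots each, the hostfile can host up to eight ranks, for example with `mpirun -np 8 --hostfile <file>`.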
MXJob is the Schema for the mxjobs API.
Field | Description |
---|---|
Refer to Kubernetes API documentation for fields of |
MXJobList contains a list of MXJob.
Field | Description |
---|---|
Refer to Kubernetes API documentation for fields of |
MXJobSpec defines the desired state of MXJob.
Field | Description |
---|---|
RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active. |
JobMode specifies the kind of MXJob to run. Different modes may require different MXReplicaSpecs. |
MXReplicaSpecs is a map of commonv1.ReplicaType to commonv1.ReplicaSpec that specifies the MX replicas to run. For example, { "Scheduler": commonv1.ReplicaSpec, "Server": commonv1.ReplicaSpec, "Worker": commonv1.ReplicaSpec, } |
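Each entry in a replica-spec map expands into that many pods of the given role. The sketch below models the map as a plain dictionary of role name to replica count and lists the pod names it would produce. The `<job>-<role>-<index>` naming scheme and the helper `pod_names` are assumptions for illustration, not part of the API described here.

```python
from typing import Dict, List


def pod_names(job_name: str, replica_counts: Dict[str, int]) -> List[str]:
    """List hypothetical pod names for a role -> replica-count map."""
    names: List[str] = []
    for role, count in replica_counts.items():
        if count < 0:
            raise ValueError(f"replica count for {role} must be non-negative")
        # One pod per replica index, e.g. "<job>-server-0", "<job>-server-1".
        names.extend(f"{job_name}-{role.lower()}-{i}" for i in range(count))
    return names


# Mirrors the Scheduler/Server/Worker example above.
mx_replicas = {"Scheduler": 1, "Server": 2, "Worker": 2}
print(pod_names("mx-demo", mx_replicas))
```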
Field | Description |
---|---|
minReplicas is the lower limit for the number of replicas to which the training job can scale down. It defaults to null. |
Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Defaults to null. |
MaxRestarts limits how many times pods can be restarted in elastic mode. |
Metrics contains the specifications used to calculate the desired replica count (the maximum replica count across all metrics is used). The desired replica count is calculated by multiplying the current number of pods by the ratio of the current metric value to the target value. Ergo, the metrics used must decrease as the pod count increases, and vice versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the HPA will not be created. |
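The replica calculation described for Metrics follows the Kubernetes HorizontalPodAutoscaler rule: for each metric, the proposal is the current pod count multiplied by (current value / target value), rounded up, and the largest proposal wins. The sketch below applies that rule and then clamps the result to the min/max bounds. It deliberately ignores the HPA's tolerance band and stabilization windows, and the function name `desired_replicas` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def desired_replicas(
    current_replicas: int,
    metrics: Sequence[Tuple[float, float]],
    min_replicas: Optional[int] = None,
    max_replicas: Optional[int] = None,
) -> int:
    """Simplified HPA rule; metrics are (current_value, target_value) pairs."""
    if min_replicas is not None and max_replicas is not None and max_replicas < min_replicas:
        raise ValueError("max_replicas cannot be smaller than min_replicas")
    if not metrics:
        # Without metrics no HPA is created, so there is nothing to compute.
        raise ValueError("at least one metric is required")
    proposals = [
        math.ceil(current_replicas * current / target) for current, target in metrics
    ]
    desired = max(proposals)  # the largest proposal across all metrics wins
    if min_replicas is not None:
        desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired
```

For example, with 4 pods, one metric at 90 against a target of 60 proposes 6 pods and another at 30 against 60 proposes 2, so 6 is chosen; a maximum of 5 would cap it at 5.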
PaddleJob represents a PaddleJob resource.
Field | Description |
---|---|
Standard Kubernetes type metadata. |
Refer to Kubernetes API documentation for fields of |
Specification of the desired state of the PaddleJob. |
Most recently observed status of the PaddleJob. Read-only (modified by the system). |
PaddleJobList is a list of PaddleJobs.
Field | Description |
---|---|
Standard type metadata. |
Refer to Kubernetes API documentation for fields of |
List of PaddleJobs. |
PaddleJobSpec is a desired state description of the PaddleJob.
Field | Description |
---|---|
RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active. |
ElasticPolicy holds the elastic policy for the PaddleJob. |
A map of PaddleReplicaType (type) to ReplicaSpec (value). Specifies the Paddle cluster configuration. For example, { "Master": PaddleReplicaSpec, "Worker": PaddleReplicaSpec, } |
PyTorchJob represents a PyTorchJob resource.
Field | Description |
---|---|
Standard Kubernetes type metadata. |
Refer to Kubernetes API documentation for fields of |
Specification of the desired state of the PyTorchJob. |
Most recently observed status of the PyTorchJob. Read-only (modified by the system). |
PyTorchJobList is a list of PyTorchJobs.
Field | Description |
---|---|
Standard type metadata. |
Refer to Kubernetes API documentation for fields of |
List of PyTorchJobs. |
PyTorchJobSpec is a desired state description of the PyTorchJob.
Field | Description |
---|---|
RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active. |
A map of PyTorchReplicaType (type) to ReplicaSpec (value). Specifies the PyTorch cluster configuration. For example, { "Master": PyTorchReplicaSpec, "Worker": PyTorchReplicaSpec, } |
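As an illustration of the replica-spec map, the sketch below assembles a minimal PyTorchJob manifest as a Python dictionary with one Master and a configurable number of Workers. The JSON field names (`pytorchReplicaSpecs`, `replicas`, `restartPolicy`, `template`) and the container name `pytorch` reflect the kubeflow.org/v1 CRD as commonly used, but treat them as assumptions and check them against the CRD installed in your cluster; the helper `pytorch_job_manifest` and the image name are hypothetical.

```python
import json
from typing import Any, Dict


def pytorch_job_manifest(name: str, image: str, workers: int) -> Dict[str, Any]:
    """Build a minimal PyTorchJob manifest (hypothetical helper)."""

    def replica_spec(count: int) -> Dict[str, Any]:
        # One ReplicaSpec: a replica count plus the pod template to run.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [{"name": "pytorch", "image": image}],
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            # Replica type (map key) -> ReplicaSpec (map value).
            "pytorchReplicaSpecs": {
                "Master": replica_spec(1),
                "Worker": replica_spec(workers),
            }
        },
    }


manifest = pytorch_job_manifest("pt-demo", "example.com/train:latest", workers=3)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, which accepts JSON as well as YAML.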
TFJob represents a TFJob resource.
Field | Description |
---|---|
Standard Kubernetes type metadata. |
Refer to Kubernetes API documentation for fields of |
Specification of the desired state of the TFJob. |
Most recently observed status of the TFJob. Populated by the system. Read-only. |
TFJobList is a list of TFJobs.
Field | Description |
---|---|
Standard type metadata. |
Refer to Kubernetes API documentation for fields of |
List of TFJobs. |
TFJobSpec is a desired state description of the TFJob.
Field | Description |
---|---|
RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active. |
SuccessPolicy defines the policy for marking the TFJob as succeeded. Defaults to "", which applies the default rules. |
A map of TFReplicaType (type) to ReplicaSpec (value). Specifies the TF cluster configuration. For example, { "PS": ReplicaSpec, "Worker": ReplicaSpec, } |
A switch to enable dynamic workers. |
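The default success rules are not spelled out in this reference. The sketch below encodes one common reading of them, as an assumption rather than a statement of the operator's behavior: with the default policy "", the job succeeds once the Chief replica succeeds or, when there is no Chief, once worker 0 succeeds; with an "AllWorkers" policy, every worker must succeed. Check the operator version you run before relying on these rules; the function `tfjob_succeeded` is hypothetical.

```python
from typing import List, Optional


def tfjob_succeeded(
    policy: str,
    worker_succeeded: List[bool],
    chief_succeeded: Optional[bool] = None,
) -> bool:
    """Assumed TFJob success rules.

    chief_succeeded is None when the job has no Chief replica.
    """
    if policy == "":
        if chief_succeeded is not None:
            # Default rule with a Chief: the Chief's outcome decides.
            return chief_succeeded
        # No Chief: worker 0 acts as the chief.
        return bool(worker_succeeded) and worker_succeeded[0]
    if policy == "AllWorkers":
        return bool(worker_succeeded) and all(worker_succeeded)
    raise ValueError(f"unknown success policy: {policy!r}")
```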
XGBoostJob is the Schema for the xgboostjobs API.
Field | Description |
---|---|
Refer to Kubernetes API documentation for fields of |
XGBoostJobList contains a list of XGBoostJob.
Field | Description |
---|---|
Refer to Kubernetes API documentation for fields of |
XGBoostJobSpec defines the desired state of XGBoostJob.
Field | Description |
---|---|