This repository has been archived by the owner on Jun 7, 2022. It is now read-only.

titus-v3-spec.md

Protocol Documentation

src/main/proto/netflix/titus/titus_job_api.proto

BatchJobSpec

Batch job specification.

Field Type Label Description
size uint32 (Required) Number of tasks to run (> 0).
runtimeLimitSec uint64 (Required) Maximum amount of time in seconds that the job's task is allowed to run. The timer is started once the task transitions to the 'RUNNING' state. If a task terminates with an error and is restarted, the timer starts again from 0.
retryPolicy RetryPolicy (Required) Task rescheduling policy in case of failure.
retryOnRuntimeLimit bool true when the task should be restarted after being terminated due to runtime limit.

Capacity

This data structure is associated with a service job and specifies the number of tasks to run (desired). At any point in time, the condition min <= desired <= max must hold true. The desired state may be changed by a user, but also may be changed as a result of an auto-scaling action.

Field Type Label Description
min uint32 (Required) Minimum number of tasks to run (min >= 0)
max uint32 (Required) Maximum number of tasks that can be run (max >= desired)
desired uint32 (Required) Desired number of tasks to run (min <= desired <= max)
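
The invariant `min <= desired <= max` can be checked on the client before a capacity update is submitted. The following is a minimal sketch, assuming the message is represented as a plain Python dict keyed by the field names in the table above; `validate_capacity` is a hypothetical helper and not part of the Titus API.

```python
def validate_capacity(capacity: dict) -> list:
    """Return the violated rules for a Capacity-like dict (empty list if valid)."""
    errors = []
    lo, hi, desired = capacity["min"], capacity["max"], capacity["desired"]
    # The fields are uint32 in the proto, so negative values are never valid.
    for name, value in (("min", lo), ("max", hi), ("desired", desired)):
        if value < 0:
            errors.append(f"{name} must be >= 0")
    # The documented invariant: min <= desired <= max.
    if desired < lo:
        errors.append("desired must be >= min")
    if desired > hi:
        errors.append("desired must be <= max")
    return errors
```

An empty result means the three values are mutually consistent; the server may still apply additional, configurable limits.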

Constraints

Task placement constraints. Currently supported constraint types are:

  • zoneBalance - distributes tasks of a job evenly among the availability zones
  • uniqueHost - runs each task of a job on a different agent
  • exclusiveHost - ensures that an agent is exclusively assigned to a given job
Field Type Label Description
constraints Constraints.ConstraintsEntry repeated (Optional) A map of constraint name/values. If multiple constraints are given, all must be met (logical 'and').
expression string Not supported yet. (Optional) An expression combining multiple constraints. For example 'zoneBalance AND serverGroup == "mySG"'. Available operators: <, <=, ==, >, >=, in, like, AND, OR

Constraints.ConstraintsEntry

Field Type Label Description
key string
value string

Container

Container descriptor.

Field Type Label Description
resources ContainerResources (Required) Resources for the whole task.
securityProfile SecurityProfile (Required) Container security profile: IAM role, security groups, container roles.
image Image (Required) Image reference.
attributes Container.AttributesEntry repeated (Optional) Arbitrary set of key/value pairs. Key names starting with 'titus.' are reserved by Titus.
entryPoint string repeated (Optional) Override the entrypoint of the image. If set, the command baked into the image (if any) is always ignored. Interactions between the entrypoint and command are the same as specified by Docker: https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact To clear (unset) the entrypoint of the image, pass a single empty string value: [""]
command string repeated (Optional) Additional parameters for the entrypoint defined either here or provided in the container image. To clear (unset) the command of the image, pass a single empty string value: [""]
env Container.EnvEntry repeated (Optional) A collection of system environment variables passed to the container.
softConstraints Constraints (Optional) Constraints that Titus will prefer to fulfill but are not required. These constraints apply to the whole task.
hardConstraints Constraints (Optional) Constraints that have to be met for a task to be scheduled on an agent. These constraints apply to the whole task.
experimental google.protobuf.Any (Optional) Experimental features
volumeMounts VolumeMount repeated (Optional) An array of VolumeMounts. These VolumeMounts will be mounted in the container, and must reference one of the volumes declared for the Job. See the k8s docs https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#volumemount-v1-core for more technical details.

Container.AttributesEntry

Field Type Label Description
key string
value string

Container.EnvEntry

Field Type Label Description
key string
value string

ContainerResources

Field Type Label Description
cpu double (Required) Number of CPUs to allocate to a task (must be always > 0, but the actual limit is configurable).
gpu uint32 (Optional) Number of GPUs to allocate to a task.
memoryMB uint32 (Required) Amount of memory to allocate to a task (must be always > 0, but the actual limit is configurable).
diskMB uint32 (Required) Amount of ephemeral disk space to allocate to a task (must be always > 0, but the actual limit is configurable).
networkMbps uint32 (Required) Amount of network bandwidth to allocate to an individual task (must be always > 0, but the actual limit is configurable).
allocateIP bool (Deprecated) IP always allocated.
efsMounts ContainerResources.EfsMount repeated (Optional) EFS mounts.
shmSizeMB uint32 (Optional) Size of shared memory /dev/shm. If not set, a default value will be provided. A provided value must be less than or equal to amount of memory allocated.
signedAddressAllocations SignedAddressAllocation repeated (Optional) IP addresses allocated from Titus VPC IP service to be assigned to tasks.
pool string The name of the pool of static IPs to select from
staticIPAddressIDs StaticIPAddressIDs The list of addresses to use for this job
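
The documented lower bounds and the shared-memory rule can be expressed as a small pre-submission check. This is a sketch only: it assumes a dict representation of the message, treats a missing or zero `shmSizeMB` as "not set", and does not model the configurable upper limits mentioned above. `validate_resources` is a hypothetical name.

```python
def validate_resources(res: dict) -> list:
    """Check the constraints stated in the ContainerResources field descriptions."""
    errors = []
    # cpu, memoryMB, diskMB and networkMbps must always be > 0.
    for field in ("cpu", "memoryMB", "diskMB", "networkMbps"):
        if res.get(field, 0) <= 0:
            errors.append(f"{field} must be > 0")
    # A provided shmSizeMB must not exceed the allocated memory.
    shm = res.get("shmSizeMB", 0)  # 0 is treated as "not set" (assumption)
    if shm > res.get("memoryMB", 0):
        errors.append("shmSizeMB must be <= memoryMB")
    return errors
```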

ContainerResources.EfsMount

Field Type Label Description
efsId string (Required) EFS id
mountPoint string (Required) EFS mount point
mountPerm MountPerm (Required) EFS mount permission mask
efsRelativeMountPoint string (Optional) EFS relative mount point

Image

To reference an image, a user has to provide an image name and a version. A user may specify a version either with a tag value (for example 'latest') or a digest. When submitting a job, a user should provide either a tag or a digest value only (not both of them).

For example, docker images can be referenced by {name=titus-examples, tag=latest}. A user could also choose to provide only the digest without a tag. In this case, the tag value would be empty.

Field Type Label Description
name string (Required) Image name.
tag string (Required if digest not set) Image tag.
digest string (Required if tag not set) Image digest.
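
The "tag or digest, but not both" rule can be sketched as follows. The dict representation and the helper name `image_reference` are assumptions; the `name:tag` and `name@digest` renderings follow the usual Docker reference notation and are used here only for display.

```python
def image_reference(image: dict) -> str:
    """Validate an Image-like dict and render it as a Docker-style reference."""
    name = image.get("name", "")
    tag = image.get("tag", "")
    digest = image.get("digest", "")
    if not name:
        raise ValueError("name is required")
    # Exactly one of tag or digest should be provided.
    if bool(tag) == bool(digest):
        raise ValueError("provide exactly one of tag or digest")
    return f"{name}:{tag}" if tag else f"{name}@{digest}"
```

For the example from the text, `image_reference({"name": "titus-examples", "tag": "latest"})` yields `titus-examples:latest`.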

Job

Job entity is returned by query operations only.

Field Type Label Description
id string (Required) The unique id (UUID).
jobDescriptor JobDescriptor (Required) Job descriptor.
status JobStatus (Required) Last known job state.
statusHistory JobStatus repeated (Required) State transition history.
version Version (Optional) Job version associated with the given entity. Revision numbers for jobs and tasks are created from the same ordered number generator.

JobAttributesDeleteRequest

Field Type Label Description
jobId string
keys string repeated

JobAttributesUpdate

Field Type Label Description
jobId string
attributes JobAttributesUpdate.AttributesEntry repeated

JobAttributesUpdate.AttributesEntry

Field Type Label Description
key string
value string

JobCapacityUpdate

Field Type Label Description
jobId string
Capacity Capacity

JobCapacityUpdateWithOptionalAttributes

Field Type Label Description
jobId string
jobCapacityWithOptionalAttributes JobCapacityWithOptionalAttributes

JobCapacityWithOptionalAttributes

Field Type Label Description
min google.protobuf.UInt32Value (Optional) Minimum number of tasks to run (min >= 0)
max google.protobuf.UInt32Value (Optional) Maximum number of tasks that can be run (max >= desired)
desired google.protobuf.UInt32Value (Optional) Desired number of tasks to run (min <= desired <= max)

JobChangeNotification

Job event stream consists of two phases. In the first phase, a snapshot of the current state (a job and its tasks) is streamed, and it is followed by the SnapshotEnd notification marker. In the second phase, job/task state updates are sent. When a job is terminated, the stream completes.

Field Type Label Description
jobUpdate JobChangeNotification.JobUpdate
taskUpdate JobChangeNotification.TaskUpdate
snapshotEnd JobChangeNotification.SnapshotEnd
keepAliveResponse KeepAliveResponse Supported only by ObserveJobsWithKeepAlive event stream.
timestamp uint64 Event creation timestamp.
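
A client typically needs to tell the snapshot phase apart from the live-update phase. The sketch below assumes each notification has already been decoded into a dict that contains exactly one of the keys `jobUpdate`, `taskUpdate`, `snapshotEnd`, or `keepAliveResponse`; the function name `consume` and the dict shape are illustrative, not the generated gRPC API.

```python
def consume(events):
    """Split a notification stream into the initial snapshot and later updates."""
    snapshot, updates = [], []
    in_snapshot = True
    for event in events:
        if "snapshotEnd" in event:
            # The marker closes phase one; everything after it is a live update.
            in_snapshot = False
        elif "keepAliveResponse" in event:
            # Keep-alive responses carry no job or task state.
            continue
        elif "jobUpdate" in event or "taskUpdate" in event:
            (snapshot if in_snapshot else updates).append(event)
    return snapshot, updates
```

In a real client the loop would process updates as they arrive instead of collecting them, since the stream only completes when the observed job terminates.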

JobChangeNotification.JobUpdate

Emitted when a new job is created or when any of the job's attributes change.

Field Type Label Description
job Job
archived bool For internal usage only. Set to true if a job is finished and is moved to archive storage.

JobChangeNotification.SnapshotEnd

A notification marker that indicates that all known jobs were streamed to the client.

JobChangeNotification.TaskUpdate

Emitted when a task is created or its state has changed.

Field Type Label Description
task Task
movedFromAnotherJob bool movedFromAnotherJob will be true on the first event for the target Job after a task is moved between jobs. task.jobId will be the destination job, and it will include a 'task.movedFromJob' entry in its taskContext map with the source jobId.
archived bool For internal usage only. Set to true if a task is finished and is moved to archive storage.

JobDataRecord

Field Type Label Description
metadata DataRecordMetadata
job Job

JobDescriptor

Job descriptor contains the full job specification (batch or service) that is used to run a job.

Field Type Label Description
owner Owner (Optional) Owner of a job (see Owner entity description for more information).
applicationName string (Required) Free form name.
capacityGroup string (Optional) Capacity group associated with a job. If not set, defaults to 'DEFAULT'.
jobGroupInfo JobGroupInfo (Optional) Mostly relevant for service jobs, but applicable to batch jobs as well. Allows a user to specify their own unique identifier for a job (see JobGroupInfo for more information).
attributes JobDescriptor.AttributesEntry repeated (Optional) Arbitrary set of key/value pairs. Names starting with 'titus' (case does not matter) are reserved for internal use.
container Container (Required) Container to be executed for a job.
batch BatchJobSpec Batch job specific descriptor.
service ServiceJobSpec Service job specific descriptor.
disruptionBudget JobDisruptionBudget (Optional) Job disruption budget. If not defined, a job type specific (batch or service) default is set.
networkConfiguration NetworkConfiguration (Optional) Networking configuration. If not defined, sane defaults are provided by the backend.
extraContainers BasicContainer repeated (Optional) Extra containers can be specified to run alongside the main container in a "pod" (similar to k8s pods). Additional containers can be specified in this field, and they will be launched together with the main container, sharing its resources (network/ram/cpu/gpu/etc). Startup ordering happens in the following way: 1. Titus System Services 2. Platform Sidecars (configured below) 3A. extraContainers (this field) 3B. The main container (container field)
volumes Volume repeated (Optional) An array of Volumes to be used by one or more of the containers. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#volume-v1-core Note that Titus only supports a subset of storage drivers.
platformSidecars PlatformSidecar repeated (Optional) Array of platform sidecars to launch alongside the task. These platform sidecars are always ordered after Titus System Services, and before any user container (main or extraContainers).

JobDescriptor.AttributesEntry

Field Type Label Description
key string
value string

JobDisruptionBudget

Job disruption budget, associated (optionally) with a job.

Field Type Label Description
selfManaged JobDisruptionBudget.SelfManaged
availabilityPercentageLimit JobDisruptionBudget.AvailabilityPercentageLimit
unhealthyTasksLimit JobDisruptionBudget.UnhealthyTasksLimit
relocationLimit JobDisruptionBudget.RelocationLimit
rateUnlimited JobDisruptionBudget.RateUnlimited
ratePercentagePerHour JobDisruptionBudget.RatePercentagePerHour
ratePerInterval JobDisruptionBudget.RatePerInterval
ratePercentagePerInterval JobDisruptionBudget.RatePercentagePerInterval
timeWindows TimeWindow repeated (Optional) Time window to which relocation process is restricted.
containerHealthProviders ContainerHealthProvider repeated (Optional) Container health providers to use when relocating a container.

JobDisruptionBudget.AvailabilityPercentageLimit

The minimum required percentage of tasks in a healthy state. Tasks will not be terminated by the eviction service if this limit would be violated.

Field Type Label Description
percentageOfHealthyContainers double

JobDisruptionBudget.RatePerInterval

Allow up to the given amount of relocations per the time interval.

Field Type Label Description
intervalMs uint64
limitPerInterval uint32

JobDisruptionBudget.RatePercentagePerHour

Allow up to the given percentage of tasks to be relocated within an hour.

Field Type Label Description
maxPercentageOfContainersRelocatedInHour double

JobDisruptionBudget.RatePercentagePerInterval

Percentage of containers that can be relocated within a time interval. The number of containers is determined during each evaluation, based on the current desired job size, so if the job size changes, the number of containers allowed changes accordingly. For example, setting intervalMs to 60000 (1 minute) and percentageLimitPerInterval to 5 (5%) allows up to 5% of all containers to be relocated every minute, provided the other criteria are met. For a job with a desired size of 100, 5 container relocations per minute would be allowed. If the desired job size changes to 200, the relocation rate increases to 10 containers per minute.

Field Type Label Description
intervalMs uint64
percentageLimitPerInterval double
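
The arithmetic in the example above can be written out as a short sketch. The source does not say how fractional results are rounded; rounding down is an assumption made here, and `relocations_per_interval` is a hypothetical helper.

```python
import math


def relocations_per_interval(desired_size: int, percentage_limit: float) -> int:
    """Containers that may be relocated in one interval for the given job size."""
    # Rounding down is an assumption; the spec only gives whole-number examples.
    return math.floor(desired_size * percentage_limit / 100)
```

With a 5% limit, a desired size of 100 gives 5 relocations per interval and a desired size of 200 gives 10, matching the description.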

JobDisruptionBudget.RateUnlimited

No limits on how many containers in a job may be relocated, provided the other disruption budget constraints are not violated.

JobDisruptionBudget.RelocationLimit

Maximum number of times a task can be relocated (only batch tasks, which have a maximum execution time).

Field Type Label Description
limit uint32

JobDisruptionBudget.SelfManaged

Self managed task relocation policy for users that would like to orchestrate custom termination logic. If the containers are not terminated within the configured amount of time, the system default migration policy is assumed instead.

Field Type Label Description
relocationTimeMs uint64 Amount of time a container owner has to migrate their containers. A maximum will be enforced by the system.

JobDisruptionBudget.UnhealthyTasksLimit

The maximum allowed number of tasks in an unhealthy state. Tasks will not be terminated by the eviction service if this limit would be violated.

Field Type Label Description
limitOfUnhealthyContainers uint32

JobDisruptionBudgetUpdate

Field Type Label Description
jobId string
disruptionBudget JobDisruptionBudget

JobGroupInfo

Additional information for building a supplementary job identifier, as the 'applicationName' can be shared by many jobs running at the same time in Titus. By setting 'JobGroupInfo', a user may create a job id that is guaranteed to be unique across all currently running Titus jobs. The uniqueness is checked if any of the attributes in this record is a non-empty string. The full name is built as: '<application_name>-<stack>-<detail>-<sequence>'.

Field Type Label Description
stack string (Optional) Any text. It is recommended (but not required), that the value does not include the '-' character.
detail string (Optional) Any text. It is recommended (but not required), that the value does not include the '-' character.
sequence string (Optional) Any text. It is recommended (but not required), that the value does not include the '-' character.
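
The naming template, and the reason '-' is discouraged inside the individual values, can be illustrated with a short sketch. It follows the template literally; how Titus treats empty components is not stated in this document, so that behavior is not modeled. `full_job_name` is a hypothetical helper.

```python
def full_job_name(application_name: str, stack: str, detail: str, sequence: str) -> str:
    """Join the parts as '<application_name>-<stack>-<detail>-<sequence>'."""
    return "-".join([application_name, stack, detail, sequence])


# A '-' inside a component makes the joined name ambiguous:
# both calls below produce the same string.
a = full_job_name("myapp", "a-b", "c", "1")
b = full_job_name("myapp", "a", "b-c", "1")
```

Here `a` and `b` are both `myapp-a-b-c-1`, so the original stack and detail values cannot be recovered from the full name.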

JobId

Field Type Label Description
id string

JobIds

Field Type Label Description
id string repeated

JobProcessesUpdate

Field Type Label Description
jobId string
serviceJobProcesses ServiceJobSpec.ServiceJobProcesses

JobQuery

Job query request. The query result is limited to the active data set. Finished jobs/tasks are not evaluated when the query is executed.

Field Type Label Description
page Page (Required) Requested page number/size.
filteringCriteria JobQuery.FilteringCriteriaEntry repeated (Optional) Collection of fields and their values for a filter. Available query criteria:
  • jobIds - list of comma separated job ids
  • taskIds - list of comma separated task ids
  • owner - job owner
  • applicationName - job application name
  • imageName - image name
  • imageTag - image tag
  • capacityGroup - job assigned capacity group
  • jobGroupStack - job group stack
  • jobGroupDetail - job group details
  • jobGroupSequence - job group sequence
  • jobType - job type (batch or service)
  • attributes - comma separated job attribute key/value pairs (for example "key1,key2:value2;k3:value3")
  • attributes.op - logical 'and' or 'or' operators, which should be applied to multiple attributes specified in the query
  • jobState - job state (one)
  • taskStates - task states (multiple, comma separated). Empty value is the same as no value set.
  • taskStateReasons - reasons associated with task states (multiple, comma separated)
  • needsMigration - if set to true, return only jobs with tasks that require migration
fields string repeated (Optional) If set, only field values explicitly specified in this parameter will be returned. This does not include certain attributes like 'jobId', 'appName', which are always returned. If a nested field value is provided, only the explicitly listed nested fields will be returned. For example, the rule 'tasks.taskId' will result in including just this value when encoding the Task entity.

JobQuery.FilteringCriteriaEntry

Field Type Label Description
key string
value string

JobQueryResult

Field Type Label Description
items Job repeated
pagination Pagination

JobStatus

Composite data structure holding both job state information and the reason of the transition to this state.

Field Type Label Description
state JobStatus.JobState (Required) Job state
reasonCode string (Optional) An identifier of an event that caused a transition to this state. Each job manager can introduce its own set of reason codes. As of now, there are no common reason codes defined for jobs.
reasonMessage string (Optional) Textual description accompanying the 'reasonCode'.
timestamp uint64 Time when a transition to a state happened.

JobStatusUpdate

Field Type Label Description
id string
enableStatus bool

LogLocation

Task log locations

Field Type Label Description
ui LogLocation.UI (Required) Log access via UI.
liveStream LogLocation.LiveStream (Optional) Live log access. Provided only for running tasks.
s3 LogLocation.S3 (Required) S3 log location.

LogLocation.LiveStream

URL address to a container log service. When a container is running, its stdout/stderr or any other file in the '/logs' folder can be accessed via this endpoint. The endpoint becomes unavailable when the container terminates.

A user should provide the 'f' query parameter to specify a file to download. If the 'f' query parameter is not set, it defaults to 'stdout'. The file path must be relative to the '/logs' folder.

Field Type Label Description
url string (Required) Live log URL.
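
Building a download URL with the 'f' query parameter might look like the sketch below. It uses only the Python standard library; the host name in the usage example is a placeholder, and `live_log_url` is a hypothetical helper, not a Titus client function.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def live_log_url(base_url: str, file_path: str = "stdout") -> str:
    """Return base_url with the 'f' query parameter set to file_path."""
    # The path must be relative to the '/logs' folder.
    if file_path.startswith("/"):
        raise ValueError("file path must be relative to the '/logs' folder")
    parts = urlsplit(base_url)
    # Keep any existing query parameters, replacing a previous 'f' if present.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "f"]
    query.append(("f", file_path))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `live_log_url("https://logs.example.com/task/123")` returns `https://logs.example.com/task/123?f=stdout`, which mirrors the documented default.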

LogLocation.S3

Location of S3 folder containing container's log files.

Field Type Label Description
accountName string (Required) AWS account name.
accountId string (Required) AWS account id.
region string (Required) AWS region.
bucket string (Required) S3 bucket.
key string (Required) The key prefix in the S3 bucket. The assumption is that the consumer finds all objects based on this key prefix.

LogLocation.UI

URL pointing to a UI based log viewer.

Field Type Label Description
url string (Required) UI URL.

MigrationDetails

Migration details

Field Type Label Description
needsMigration bool true when the task needs to be migrated to another agent.
deadline uint64 The deadline time that the owner must migrate their task by or the system will automatically do it. This value is irrelevant if 'needsMigration' is set to false and will default to the value '0'.
started uint64 Time at which the migration decision was made. This value is irrelevant if 'needsMigration' is set to false and will default to the value '0'.

MigrationPolicy

Migration policies.

Field Type Label Description
systemDefault MigrationPolicy.SystemDefault
selfManaged MigrationPolicy.SelfManaged

MigrationPolicy.SelfManaged

The self managed policy where the owner needs to migrate the tasks.

MigrationPolicy.SystemDefault

The system default migration policy.

NetworkConfiguration

Network settings for tasks launched by this job

Field Type Label Description
networkMode NetworkConfiguration.NetworkMode Sets the overall network mode for all containers for a Task launched by this job

ObserveJobsQuery

The filtering criteria are applied to both Job and Task events. If a criterion applies to task fields, the stream will include both task events matching it, and events for jobs with tasks that match it. The opposite is also true, e.g.: a criterion on applicationName (a job field) will include both job events matching it, and events for tasks belonging to a job that matches it.


Filtering criteria

Field Type Label Description
filteringCriteria ObserveJobsQuery.FilteringCriteriaEntry repeated (Optional) Collection of fields and their values for a filter. Available query criteria:
  • jobIds - list of comma separated job ids
  • taskIds - list of comma separated task ids
  • owner - job owner
  • applicationName - job application name
  • imageName - image name
  • imageTag - image tag
  • capacityGroup - job assigned capacity group
  • jobGroupStack - job group stack
  • jobGroupDetail - job group details
  • jobGroupSequence - job group sequence
  • jobType - job type (batch or service)
  • attributes - comma separated job attribute key/value pairs. The same key may occur multiple times, with different values (any value matches the filter). A value may be omitted, in which case if the key occurs only once, only presence of the key is checked, without value comparison (otherwise the value is an empty string). Example filters:
      • 'key1' - matches if the key is present
      • 'key2:value2' - matches if the attributes contain key 'key2' with value 'value2'
      • 'key3,key3:value3a,key3:value3b' - matches if the attributes contain key 'key3' with value '' or 'value3a' or 'value3b'
    All the above can be passed together as 'key1,key2:value2,key3,key3:value3a,key3:value3b'
  • attributes.op - logical 'and' or 'or' operators, which should be applied to multiple attributes specified in the query
  • jobState - job state (one)
  • taskStates - task states (multiple, comma separated). Empty value is the same as no value set.
  • taskStateReasons - reasons associated with task states (multiple, comma separated)
  • needsMigration - if set to true, return only jobs with tasks that require migration
jobFields string repeated (Optional) If set, only job field values explicitly given in this parameter will be returned
taskFields string repeated (Optional) If set, only task field values explicitly given in this parameter will be returned
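
The matching rules described for the 'attributes' criterion can be sketched in a few lines. This is an interpretation of the description rather than the server implementation: the function names are hypothetical, job attributes are modeled as a plain dict, and using 'and' as the default for `attributes.op` is an assumption.

```python
def parse_attribute_filter(expr: str) -> dict:
    """Group the comma separated items by key; None marks an omitted value."""
    criteria = {}
    for item in expr.split(","):
        key, sep, value = item.partition(":")
        criteria.setdefault(key, []).append(value if sep else None)
    return criteria


def matches(attributes: dict, expr: str, op: str = "and") -> bool:
    """Evaluate an 'attributes' filter expression against job attributes."""
    results = []
    for key, values in parse_attribute_filter(expr).items():
        if values == [None]:
            # Key given once without a value: only presence is checked.
            results.append(key in attributes)
        else:
            # Otherwise an omitted value counts as the empty string,
            # and any of the listed values matches.
            allowed = {v or "" for v in values}
            results.append(attributes.get(key) in allowed)
    return all(results) if op == "and" else any(results)
```

For instance, `matches({"key3": ""}, "key3,key3:value3a,key3:value3b")` is true because the omitted value is compared as an empty string once the key appears more than once.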

ObserveJobsQuery.FilteringCriteriaEntry

Field Type Label Description
key string
value string

ObserveJobsWithKeepAliveRequest

Field Type Label Description
query ObserveJobsQuery
keepAliveRequest KeepAliveRequest

Owner

An owner of a job.

Field Type Label Description
teamEmail string (Required) An owner's email address.

SecurityProfile

Container security profile.

Field Type Label Description
securityGroups string repeated (Required) Security groups associated with a container. The expected number of security groups is between 1 and 6.
iamRole string (Required) IAM role.
attributes SecurityProfile.AttributesEntry repeated (Optional) Additional security attributes. Key names starting with 'titus.' are reserved by Titus.

SecurityProfile.AttributesEntry

Field Type Label Description
key string
value string

ServiceJobSpec

Service job specification.

Field Type Label Description
capacity Capacity (Required) Number of tasks to run. If a scaling policy is defined, the number of tasks created will be always within min/max range.
enabled bool (Optional) Job enable/disable status. If a job is disabled, auto-scaling policies are not applied.
retryPolicy RetryPolicy (Required) Task rescheduling policy in case of failure.
migrationPolicy MigrationPolicy (Optional) Migration policy for how the tasks will be migrated during an infrastructure change. If not set, defaults to SystemDefault.
serviceJobProcesses ServiceJobSpec.ServiceJobProcesses (Optional) Job scaling activity configurations.

ServiceJobSpec.ServiceJobProcesses

Configuration of service job processes

Field Type Label Description
disableIncreaseDesired bool Prevents increasing the Job's desired capacity. Existing tasks that exit (for example, because the process exits) will still be replaced.
disableDecreaseDesired bool Prevents decreasing the Job's desired capacity. Existing tasks that exit (for example, because the process exits) will still be replaced.

Task

Task is an entity representing a running container.

Field Type Label Description
id string (Required) The Id of the task.
jobId string (Required) Id of a job that owns this task.
taskContext Task.TaskContextEntry repeated (Required) Includes: * agent execution environment: 'agent.region', 'agent.zone', 'agent.host', 'agent.instanceId' * job type specific information: 'task.index', 'task.resubmitOf' (id of task which this task is replacing), 'task.originalId' (id of task which this task is a replacement)
status TaskStatus (Required) Last known state of this task.
statusHistory TaskStatus repeated (Required) State transition history.
logLocation LogLocation (Required) Container logs.
migrationDetails MigrationDetails (Required) Migration details.
attributes Task.AttributesEntry repeated (Optional) User defined key/value pairs.
version Version (Optional) Task version associated with the given entity. Revision numbers for jobs and tasks are created from the same ordered number generator.

Task.AttributesEntry

Field Type Label Description
key string
value string

Task.TaskContextEntry

Field Type Label Description
key string
value string

TaskAttributesDeleteRequest

Field Type Label Description
taskId string
keys string repeated

TaskAttributesUpdate

Field Type Label Description
taskId string
attributes TaskAttributesUpdate.AttributesEntry repeated

TaskAttributesUpdate.AttributesEntry

Field Type Label Description
key string
value string

TaskDataRecord

Field Type Label Description
metadata DataRecordMetadata
job Job
task Task

TaskId

Field Type Label Description
id string

TaskIds

Field Type Label Description
id string repeated

TaskKillRequest

Field Type Label Description
taskId string (Required) Task to kill.
shrink bool (Optional) Should job size be reduced
preventMinSizeUpdate bool (Optional) If set to true, and this is a terminate and shrink request ('shrink' set to true), reject the kill request if it would cause the job size to go below the current minimum size. Otherwise, the job's minimum size is decremented by 1.
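
The interaction between 'shrink' and 'preventMinSizeUpdate' can be sketched as a pure function over a Capacity-like dict. This is one reading of the field descriptions, not the server logic: it assumes a shrink request lowers 'desired' by 1 and that the minimum is only lowered when the new size would otherwise fall below it. `apply_kill` is a hypothetical name.

```python
def apply_kill(capacity: dict, shrink: bool, prevent_min_size_update: bool) -> dict:
    """Return the capacity after a task kill request, or raise if it is rejected."""
    new = dict(capacity)
    if not shrink:
        # Without shrink the job size is unchanged.
        return new
    if capacity["desired"] - 1 < capacity["min"]:
        if prevent_min_size_update:
            raise ValueError("kill rejected: job size would go below the minimum")
        # Otherwise the minimum size is decremented by 1.
        new["min"] = capacity["min"] - 1
    new["desired"] = capacity["desired"] - 1
    return new
```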

TaskMoveRequest

Field Type Label Description
sourceJobId string (Required) Source job (service) from which the task is moved. Must be distinct from the target job.
targetJobId string (Required) Target job (service) that receives the task. Must be distinct from the source job.
taskId string (Required) Task to move. Task must be in started state.

TaskQuery

Task query request. The query result is limited to the active data set. Finished jobs/tasks are not evaluated when the query is executed.

Field Type Label Description
page Page (Required) Requested page number/size.
filteringCriteria TaskQuery.FilteringCriteriaEntry repeated (Optional) Collection of fields and their values for a filter. Available query criteria:
  • jobIds - list of comma separated job ids
  • taskIds - list of comma separated task ids
  • owner - job owner
  • applicationName - job application name
  • imageName - image name
  • imageTag - image tag
  • capacityGroup - job assigned capacity group
  • jobGroupStack - job group stack
  • jobGroupDetail - job group details
  • jobGroupSequence - job group sequence
  • jobType - job type (batch or service)
  • attributes - comma separated job attribute key/value pairs. The same key may occur multiple times, with different values (any value matches the filter). A value may be omitted, in which case if the key occurs only once, only presence of the key is checked, without value comparison (otherwise the value is an empty string). Example filters:
      • 'key1' - matches if the key is present
      • 'key2:value2' - matches if the attributes contain key 'key2' with value 'value2'
      • 'key3,key3:value3a,key3:value3b' - matches if the attributes contain key 'key3' with value '' or 'value3a' or 'value3b'
    All the above can be passed together as 'key1,key2:value2,key3,key3:value3a,key3:value3b'
  • attributes.op - logical 'and' or 'or' operators, which should be applied to multiple attributes specified in the query
  • jobState - job state (one)
  • taskStates - task states (multiple, comma separated). Empty value is the same as no value set.
  • taskStateReasons - reasons associated with task states (multiple, comma separated)
  • needsMigration - if set to true, return only tasks that require migration
  • skipSystemFailures - a filter for finished tasks only (does not affect non-finished tasks). If set to true, a finished task that failed due to a system error is filtered out. System error codes are specified in the TaskStatus type definition. These are container failures due to Titus internal issues.
fields string repeated (Optional) If set, only field values explicitly given in this parameter will be returned

TaskQuery.FilteringCriteriaEntry

Field Type Label Description
key string
value string

TaskQueryResult

Field Type Label Description
items Task repeated
pagination Pagination

TaskStatus

Field Type Label Description
state TaskStatus.TaskState (Required) Task state
reasonCode string (Optional) An identifier of an event that caused a transition to this state. Each job manager can introduce its own set of reason codes. Below is the predefined (common) set of reason codes associated with task state 'Finished':
  • 'normal' - task completed with the exit code 0
  • 'failed' - task completed with a non zero error code
  • 'killed' - task was explicitly terminated by a user
  • 'scaledDown' - task was terminated as a result of job scaling down
  • 'stuckInState' - task was terminated, as it did not progress to the next state in the expected amount of time
  • 'runtimeLimitExceeded' - task was terminated, as its runtime limit was exceeded
  • 'lost' - task was lost, and its final status is unknown
  • 'invalidRequest' - invalid container definition (security group, image name, etc)
  • 'crashed' - container crashed due to some internal system error
  • 'transientSystemError' - transient error, not agent specific (for example AWS rate limiting)
  • 'localSystemError' - an error scoped to an agent instance on which a container was run. The agent should be quarantined or terminated.
  • 'unknownSystemError' - unknown error which cannot be classified either as local/non-local or transient. If there are multiple occurrences of this error, the agent should be quarantined or terminated.
reasonMessage string (Optional) Textual description accompanying the 'reasonCode'.
timestamp uint64 Time when a transition to a state occurred.
containerState TaskStatus.ContainerState repeated An array of ContainerStates, reporting the health of individual containers

TaskStatus.ContainerState

Field Type Label Description
containerName string Name of the container
containerHealth TaskStatus.ContainerState.ContainerHealth Enum representing if the individual container is healthy
containerImage BasicImage Struct containing image information about the container

JobStatus.JobState

State information associated with a job.

Name Number Description
Accepted 0 A job is persisted in Titus and is ready to be scheduled.
KillInitiated 1 A job still has running tasks that were requested to be terminated. No more tasks for this job are deployed. Job policy update operations are not allowed.
Finished 2 A job has no running tasks, and new tasks cannot be created. Job policy update operations are not allowed.

NetworkConfiguration.NetworkMode

Name Number Description
UnknownNetworkMode 0 Unknown; the backend will have to choose a sane default based on other inputs.
Ipv4Only 1 IPv4 only means the task will not get an IPv6 address, and will only get a unique IPv4 address.
Ipv6AndIpv4 2 IPv6 And IPv4 (True Dual Stack), each task gets a unique v6 and v4 address.
Ipv6AndIpv4Fallback 3 IPv6 and IPv4 Fallback uses the Titus IPv4 "transition mechanism" to give v4 connectivity transparently without providing every container its own IPv4 address. From a Spinnaker/task perspective, only an IPv6 address is allocated to the task.
Ipv6Only 4 IPv6 Only is for true believers, no IPv4 connectivity is provided.
HighScale 5 HighScale is a special mode, which applies opinionated network settings to the workload for maximum network scalability. Enabling this mode removes the option for the user to select which subnets or security groups are used by the workload. Instead, special HighScale subnets and security groups are chosen.
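
The address allocation implied by each mode can be summarized in code. This is an illustrative sketch only: the enum mirrors the names and numbers in the table above, while `addresses_allocated` is a hypothetical helper. That `Ipv6Only` tasks receive a unique IPv6 address is inferred from the description rather than stated in it.

```python
from enum import IntEnum
from typing import Optional, Tuple


class NetworkMode(IntEnum):
    """Mirror of NetworkConfiguration.NetworkMode (names and numbers as documented)."""
    UnknownNetworkMode = 0
    Ipv4Only = 1
    Ipv6AndIpv4 = 2
    Ipv6AndIpv4Fallback = 3
    Ipv6Only = 4
    HighScale = 5


# (task gets a unique IPv4 address, task gets a unique IPv6 address)
_ALLOCATION = {
    NetworkMode.Ipv4Only: (True, False),
    NetworkMode.Ipv6AndIpv4: (True, True),
    # Fallback mode provides IPv4 connectivity, but no per-task IPv4 address.
    NetworkMode.Ipv6AndIpv4Fallback: (False, True),
    NetworkMode.Ipv6Only: (False, True),  # inferred, see lead-in
}


def addresses_allocated(mode: NetworkMode) -> Optional[Tuple[bool, bool]]:
    """Return (unique_ipv4, unique_ipv6), or None when the table does not say.

    UnknownNetworkMode and HighScale are decided by the backend, so no
    answer is given for them here.
    """
    return _ALLOCATION.get(mode)
```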

TaskStatus.ContainerState.ContainerHealth

Name Number Description
Unset 0 Unset means we haven't gotten any signal yet about healthiness
Unhealthy 1 Unhealthy means the container is no longer passing its healthcheck
Healthy 2 Healthy means the container is passing its healthcheck

TaskStatus.TaskState

State information associated with a task.

Name Number Description
Accepted 0 A task was passed to the scheduler but has no resources allocated yet.
Launched 1 A task had resources allocated and was passed to Mesos.
StartInitiated 2 An executor provisioned resources for a task.
Started 3 The container was started.
KillInitiated 4 A user requested the task to be terminated. An executor is stopping the task and releasing its allocated resources.
Disconnected 5 No connectivity between Mesos and an agent running a task. The task's state cannot be determined until the connection is established again.
Finished 6 A task completed or was forced by a user to be terminated. All resources previously assigned to this task are released.
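
As a reading aid, the task states can be mirrored in a small enum with helpers derived from the descriptions above. The helper names are hypothetical. The descriptions do not say how to treat `Disconnected` with respect to resources, so returning `None` for it is an assumption.

```python
from enum import IntEnum
from typing import Optional


class TaskState(IntEnum):
    """Mirror of TaskStatus.TaskState (names and numbers as documented)."""
    Accepted = 0
    Launched = 1
    StartInitiated = 2
    Started = 3
    KillInitiated = 4
    Disconnected = 5
    Finished = 6


def is_terminal(state: TaskState) -> bool:
    """Finished is the only state in which all resources are released."""
    return state is TaskState.Finished


def has_resources(state: TaskState) -> Optional[bool]:
    """Whether resources are allocated to the task, per the table.

    Accepted: nothing allocated yet. Finished: everything released.
    Disconnected: the state cannot be determined, so None is returned.
    KillInitiated: resources are still being released, so True.
    """
    if state in (TaskState.Accepted, TaskState.Finished):
        return False
    if state is TaskState.Disconnected:
        return None
    return True
```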

JobManagementService

Method Name Request Type Response Type Description
CreateJob JobDescriptor JobId Create a new job
UpdateJobCapacity JobCapacityUpdate .google.protobuf.Empty Modify the number of instances for a service job.
UpdateJobCapacityWithOptionalAttributes JobCapacityUpdateWithOptionalAttributes .google.protobuf.Empty Modify job capacity for a service job. It allows you to specify only values (min / max / desired) that need to be updated.
UpdateJobStatus JobStatusUpdate .google.protobuf.Empty Mark a job as enabled or disabled. Disabled jobs are not auto-scaled.
UpdateJobProcesses JobProcessesUpdate .google.protobuf.Empty Update service job processes, for example to disable increasing or decreasing the instance count.
UpdateJobDisruptionBudget JobDisruptionBudgetUpdate .google.protobuf.Empty Update a job disruption budget.
FindJobs JobQuery JobQueryResult Return a collection of jobs matching the given criteria. The query result is limited to the active data set. Finished jobs/tasks are not evaluated when the query is executed.
FindJob JobId Job Return a job with given id.
ObserveJob JobId JobChangeNotification stream On subscription, sends the complete job (definition and active tasks). Next, sends distinct job definition or task state change notifications. The stream is closed by the server only when the job is finished, which happens after the 'JobFinished' notification is delivered.
ObserveJobs ObserveJobsQuery JobChangeNotification stream Equivalent to ObserveJob, applied to all active jobs. This stream never completes.
ObserveJobsWithKeepAlive ObserveJobsWithKeepAliveRequest stream JobChangeNotification stream ObserveJobsWithKeepAlive extends the ObserveJobs endpoint behavior by supporting a keep-alive mechanism in the channel. This stream never completes.
KillJob JobId .google.protobuf.Empty Terminate all running tasks of a job, and then terminate the job.
UpdateJobAttributes JobAttributesUpdate .google.protobuf.Empty Update the attributes of a job. This will either create new attributes or replace existing ones with the same key.
DeleteJobAttributes JobAttributesDeleteRequest .google.protobuf.Empty Delete the attributes of a job.
FindTask TaskId Task Get a task with the specified id.
FindTasks TaskQuery TaskQueryResult Return a collection of tasks specified in the 'TaskQuery' request matching the given criteria. The query result is limited to the active data set. Finished jobs/tasks are not evaluated when the query is executed.
KillTask TaskKillRequest .google.protobuf.Empty Terminate a task with the given id. Depending on job type, the task might be immediately restarted/replaced with a new one.
UpdateTaskAttributes TaskAttributesUpdate .google.protobuf.Empty Update the attributes of a task. This will either create new attributes or replace existing ones with the same key.
DeleteTaskAttributes TaskAttributesDeleteRequest .google.protobuf.Empty Delete the attributes of a task.
MoveTask TaskMoveRequest .google.protobuf.Empty Move a task from one service job to another. Source and destination jobs must be service jobs, and compatible. Jobs are compatible when their JobDescriptors are identical, ignoring the following values: * owner * applicationName * jobGroupInfo (stack, details, sequence) * disruptionBudget * Any attributes not prefixed with titus. or titusParameter. * Any container.attributes not prefixed with titus. or titusParameter. * All information specific to service jobs (JobSpec): Capacity, RetryPolicy, MigrationPolicy, etc.
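
The MoveTask compatibility rule can be read as "normalize both descriptors, then compare". Below is a minimal sketch of that reading, assuming JobDescriptors are represented as plain dictionaries. The dictionary layout, the `service` key for the service job spec, and the function names are assumptions for illustration, not the server's actual implementation.

```python
import copy

# Top-level values ignored by the compatibility check. 'service' stands in
# for all service-job-specific information (Capacity, RetryPolicy, ...);
# the key name is an assumption about the dictionary layout.
IGNORED_KEYS = ("owner", "applicationName", "jobGroupInfo", "disruptionBudget", "service")

# Only attributes with these prefixes take part in the comparison.
KEPT_PREFIXES = ("titus.", "titusParameter.")


def _kept_attributes(attributes):
    """Drop every attribute that is not prefixed with titus. or titusParameter."""
    return {k: v for k, v in (attributes or {}).items() if k.startswith(KEPT_PREFIXES)}


def normalize(descriptor: dict) -> dict:
    """Return a copy of the descriptor with all ignored values removed."""
    result = copy.deepcopy(descriptor)
    for key in IGNORED_KEYS:
        result.pop(key, None)
    result["attributes"] = _kept_attributes(result.get("attributes"))
    container = result.setdefault("container", {})
    container["attributes"] = _kept_attributes(container.get("attributes"))
    return result


def are_compatible(source: dict, target: dict) -> bool:
    """Two jobs are compatible when their normalized descriptors are equal."""
    return normalize(source) == normalize(target)
```

With this sketch, two descriptors that differ only in `owner` or in an attribute such as `team` compare as compatible, while a difference in a `titus.`-prefixed attribute or in the container image does not.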

Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| double | | double | double | float | float64 | double | float | Float |
| float | | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |
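
The notes on variable-length encoding can be checked with a short sketch of the protobuf varint and ZigZag encodings. This is a hand-written illustration with hypothetical function names, not the protobuf library's API. It shows why a negative int32 is expensive, why sint32 is cheaper for negative values, and where the 2^28 threshold for fixed32 comes from.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("varint payload must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag-encode first, then write the result as a varint."""
    return encode_varint(zigzag32(value))
```

For example, `encode_int32(-1)` occupies ten bytes while `encode_sint32(-1)` occupies one, and a varint needs five bytes once the value reaches 2^28, one more than the four bytes of a fixed32.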