Releases: cortexlabs/cortex
v0.42.1
New features
- Add support for a new set of EC2 instances, including the `c6` and `g5` families #2414 (RobertLucian)
Bug fixes
- Cosmetic fix: the VPC CNI logging functionality was triggering warning logs when running the `cortex` CLI #2443 (RobertLucian)
Misc
- Update Cortex dependency versions (eksctl, EKS to 1.22, AWS IAM, Python, etc.) #2414 (RobertLucian, deliahu)
v0.42.0
New features
- Add support for the Classic Load Balancer for APIs; the Network Load Balancer remains the default (docs) #2413 #2414 (RobertLucian)
Bug fixes
- Fix Async API HTTP/TCP probes when probing the empty root path (`/`) #2407 (RobertLucian)
- Fix nil pointer exception in the `cortex cluster export` command #2415 #2414 (RobertLucian)
- Ensure that user-specified environment variables are ordered deterministically in the Kubernetes deployment spec #2411 (deliahu)
Misc
- Ensure that the batch on-job-complete request contains a valid JSON body #2409 (RobertLucian)
v0.41.0
Bug fixes
- Wait for in-flight requests to reach zero before terminating the proxy container #2402 (deliahu)
- Fix `cortex get --env` command #2404 (deliahu)
- Fix cluster price estimate during `cortex cluster up` for spot node groups with on-demand base capacity #2406 (RobertLucian)
Nucleus Model Server
We have released v0.1.0 of the Nucleus model server!
Nucleus is a model server for TensorFlow and generic Python models. It is compatible with Cortex clusters, Kubernetes clusters, and any other container-based deployment platform. Nucleus can also be run locally via Docker Compose.
Some of Nucleus's features include:
- Generic Python models (PyTorch, ONNX, scikit-learn, MLflow, NumPy, Pandas, etc.)
- TensorFlow models
- CPU and GPU support
- Serve models directly from S3 paths
- Configurable multiprocessing and multithreading
- Multi-model endpoints
- Dynamic server-side request batching
- Automatic model reloading when new model versions are uploaded to S3
- Model caching based on an LRU policy (on disk and in memory)
- HTTP and gRPC support
v0.40.0
New features
- Support concurrency for Async APIs (via the `max_concurrency` field) #2376 #2200 (miguelvr)
- Add graphs for cluster-wide and per-API cost breakdowns to the cluster metrics dashboard #2382 #1962 (RobertLucian)
- Allow worker nodes containing Async APIs to scale to zero (now a shared async gateway is used, which runs on the operator node group) #2380 #2279 (vishalbollu)
- Add `cortex describe API_NAME` command for Realtime and Async APIs #2368 #2320 #2359 (RobertLucian)
- Support updating the priority of an existing node group #2369 #2254 (vishalbollu)
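For reference, a minimal Async API spec using the new field might look like the fragment below. The field placement and all values are illustrative (our reading of the v0.40 docs); consult the Async API configuration docs for the authoritative schema:

```yaml
# Hypothetical Async API spec -- name and image are placeholders
- name: example-async
  kind: AsyncAPI
  pod:
    max_concurrency: 4  # requests each replica may handle concurrently
    containers:
      - name: api
        image: <your-image>
```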
Misc
- Improve the reporting of API statuses #2368 #2320 #2359 (RobertLucian)
- Remove the default readiness probe on the target port if a custom readiness probe is specified in the API spec #2379 (RobertLucian)
v0.39.1
v0.39.0
New features
- Add `cortex cluster health` command to show the health of the cluster's components #2313 #2029 (miguelvr)
- Forward request headers to Async APIs #2329 #2296 (miguelvr)
- Add metrics dashboard for Task APIs #2311 #2322 (RobertLucian)
Reliability
- Enable larger cluster sizes (up to 1000 nodes with 10000 pods) by enabling IPVS #2357 #1834 (RobertLucian)
- Automatically limit the rate at which nodes are added to avoid overloading the Kubernetes API server #2331 #2338 #2314 (RobertLucian)
- Ensure cluster autoscaler availability #2347 #2346 (RobertLucian)
- Improve istiod availability at large scale #2342 #2332 (RobertLucian)
- Reduce the metrics shown in `cortex get` to improve the scalability and reliability of the command #2333 #2319 (vishalbollu)
- Show aggregated node statistics in the cluster dashboard #2336 #2318 (RobertLucian)
Bug fixes
- Ensure that the `Content-Type` header is properly set to `application/json` for responses to Async API submissions #2323 (vishalbollu)
- Fix pod autoscaler scale-to-zero edge cases #2350 (miguelvr)
- Allow autoscaling configuration to be updated on a running API #2355 (RobertLucian)
- Fix node group priority calculation for the cluster autoscaler #2358 #2343 (RobertLucian, deliahu)
- Allow the `node_groups` selector to be updated in a running API #2354 (RobertLucian)
- Fix the active replicas graph on the Async API dashboard #2328 (RobertLucian)
Docs
- Add a guide for running in production #2334 #2317 (vishalbollu)
- Add a guide for configuring an HTTP API Gateway #2341 (deliahu)
Misc
- Add a graph of the number of active and queued requests to the Async API dashboard #2326 #1960 (deliahu)
- Add a graph of the number of instances to the cluster dashboard #2336 #2318 (RobertLucian)
- Ensure that `cortex cluster info --print-config` displays YAML that is consumable by `cortex cluster configure` #2324 (vishalbollu)
v0.38.0
New features
- Support autoscaling down to zero replicas for Realtime APIs #2298 #445 (miguelvr)
- Allow `ssl_certificate_arn`, `api_load_balancer_cidr_white_list`, and `operator_load_balancer_cidr_white_list` to be updated on an existing cluster (via the `cortex cluster configure` command) #2305 #2107 (vishalbollu)
- Allow Prometheus's instance type to be configured (docs) #2307 #2285 (RobertLucian)
- Allow multiple Inferentia chips to be assigned to a single container #2304 #1123 (deliahu)
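As a sketch, the newly updatable fields live in the cluster configuration file and can then be applied with `cortex cluster configure` (the ARN and CIDR values below are placeholders):

```yaml
# cluster.yaml fragment -- values are illustrative
ssl_certificate_arn: arn:aws:acm:us-west-2:123456789012:certificate/example
api_load_balancer_cidr_white_list:
  - 10.0.0.0/8
operator_load_balancer_cidr_white_list:
  - 10.0.0.0/8
```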
Bug fixes
- Fix cluster autoscaler's node group priority calculation #2309 (RobertLucian)
v0.37.0
New features
- Support ARM instance types #2268 #1528 (RobertLucian)
- Add `cortex cluster configure` command to add, remove, or scale node groups on a running cluster #2246 #2096 (RobertLucian)
- Add `cortex cluster info --print-config` command to print the current configuration of a running cluster #2246 (RobertLucian)
- Add metrics dashboard for Async APIs #2242 #1958 (miguelvr)
- Support `cortex refresh` command for Async APIs #2265 #2237 (deliahu)
Breaking changes
- The `cortex cluster scale` command has been replaced by the `cortex cluster configure` command.
Bug fixes
- Fix Async API metrics reporting for non-200 response status codes #2266 (miguelvr)
- Make batch job metrics persistence resilient to instance termination #2247 #2041 (vishalbollu)
- Make network validations during `cortex cluster up` more permissive (to avoid unnecessarily failing checks on GovCloud) #2248 (vishalbollu)
- Fix Inferentia resource requests #2250 (RobertLucian)
Docs
- Add instructions for exporting logs and metrics to external tools (vishalbollu)
Misc
- Improve output of `cortex cluster info` for running batch jobs #2270 (deliahu)
- Persist batch job metrics regardless of job status #2244 (miguelvr)
- Support creating clusters with no node groups #2269 (deliahu)
- Improve handling of container startup errors in batch jobs with multiple containers #2260 #2217 (vishalbollu)
- Add CPU and memory resource requests to the proxy and dequeuer containers #2252 (deliahu)
v0.36.0
New features
- Support running arbitrary Docker containers in all workload types (Realtime, Async, Batch, Task) #2173 (RobertLucian, miguelvr, vishalbollu, deliahu, ospillinger)
- Support autoscaling Async APIs to zero replicas #2224 #2199 (RobertLucian)
Breaking changes
- With this release, we have generalized Cortex to exclusively support running arbitrary Docker containers for all workload types (Realtime, Async, Batch, and Task). This enables the use of any model server, programming language, etc. As a result, the API configuration has been updated: the `predictor` section has been removed, the `pod` section has been added, and the `autoscaling` parameters have been modified slightly (depending on the workload type). See the updated docs for Realtime, Async, Batch, and Task. If you'd like to see examples of Dockerizing Python applications, see our test/apis folder.
- The `cortex prepare-debug` command has been removed; Cortex now exclusively runs Docker containers, which can be run locally via `docker run`.
- The `cortex patch` command has been removed; its behavior is now identical to `cortex deploy`.
- The `cortex logs` command now prints a CloudWatch Insights URL with a pre-populated query which can be executed to show logs from your workloads, since this is the recommended approach in production. If you wish to stream logs from a pod at random, you can use `cortex logs --random-pod` (keep in mind that these logs will not include some system logs related to your workload).
- gRPC support has been temporarily removed; we are working on adding it back in v0.37.
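Since workloads are now plain Docker containers, the local debugging loop that `cortex prepare-debug` used to provide can be approximated with standard Docker commands (the image name and port below are placeholders for your own API):

```shell
# Build and run an API image locally
docker build -t my-api .
docker run -p 8080:8080 my-api
```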
Bug fixes
- Handle exception when initializing the Python client when the default environment is not set #2225 #2223 (deliahu)
Docs
- Document how to configure SMTP in Grafana (e.g. to enable email alerts) #2219 (RobertLucian)
Misc
- Show a CloudWatch Insights URL with a pre-populated query in the output of `cortex logs` #2085 (vishalbollu)
- Improve efficiency of batch job submission validations #2179 #2178 (deliahu)
v0.35.0
New features
- Avoid processing HTTP requests that have been cancelled by the client #2135 #1453 (vishalbollu)
- Support GP3 volumes (and make GP3 the default volume type) #2130 #1843 (RobertLucian)
- Allow setting the shared memory (shm) size for Task APIs #2132 #2115 (RobertLucian)
- Implement automatic 7-day expiration for Async API responses #2151 (RobertLucian)
- Add `cortex env rename` command #2165 #1773 (deliahu)
Breaking changes
- The Python client methods which deploy Python classes have been separated from the `deploy()` method. Now, `deploy()` is used only to deploy project folders, and `deploy_realtime_api()`, `deploy_async_api()`, `deploy_batch_api()`, and `deploy_task_api()` are for deploying Python classes. (docs)
- The name of the bucket that Cortex uses for internal purposes is no longer configurable. During cluster creation, Cortex will auto-generate the bucket name (and create the bucket if it doesn't exist). During cluster deletion, the bucket will be emptied (unless the `--keep-aws-resources` flag is provided to `cortex cluster down`). Users' files should not be stored in the Cortex internal bucket.
Bug fixes
- Fix the number of Async API replicas shown in `cortex cluster info` #2140 #2129 (RobertLucian)
Misc
- Delete all Cortex-created AWS resources when deleting a cluster, and support the `--keep-aws-resources` flag with `cortex cluster down` to preserve AWS resources #2161 #1612 (RobertLucian)
- Validate the user's AWS service quota for the number of security groups and in/out rules during cluster creation #2127 #2087 (RobertLucian)
- Allow specifying only one of `--min-instances` or `--max-instances` with `cortex cluster scale` #2149 (RobertLucian)
- Use a 405 status code for unimplemented Realtime API methods #2158 (RobertLucian)
- Decrease file size and project size limits #2152 (deliahu)
- Set the default environment name to the cluster name when creating a cluster #2164 #1546 (deliahu)