Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api,ui,sdk): Make CPU limits configurable #381

Merged

Conversation

deadlycoconuts
Copy link
Contributor

@deadlycoconuts deadlycoconuts commented May 21, 2024

Context

Similar to caraml-dev/merlin#586, this PR aims to make CPU limits configurable for the end user.

As of present, users are not able to configure the CPU limits of the pods in which Turing routers/enrichers/ensemblers (docker and pyfunc) are deployed in - they are instead determined automatically on the platform-level (Turing API server). Depending on how the API server has been configured, one of the following happens:

  • the CPU limit of a component is set as its CPU request value, multiplied by a scaling factor (e.g. 2 CPU * 1.5) or,
    • Note that this is the existing way memory limits are automatically set by the Turing API server
  • the CPU limit is left unset

This PR introduces a new workflow which would allow users to instead override the platform-level CPU limits (described in the paragraph above) set on a component. This workflow is available via the UI, SDK and by extension, directly calling the API endpoint of the API server.

UI:
Screenshot 2024-05-24 at 4 27 46 PM

Screenshot 2024-05-24 at 3 29 57 PM

SDK:

config = router.config
config.resource_request = ResourceRequest(
    min_replica=1,
    max_replica=3,
    cpu_request="100m",
    cpu_limit="2",
    memory_request="512Mi",
)

router.update(config)

In addition, this PR adds a new configuration, DefaultEnvVarsWithoutCPULimits, which is a list of env vars that automatically get added to all Turing routers/enrichers/ensemblers (docker and pyfunc) when CPU limits are not set. This allows the Turing API server's operators to set env vars platform-wide that can potentially improve these deployments' performance, e.g. env vars involving concurrency.

Modifications

  • api/turing/cluster/knative_service.go - Removal of platform-level fields from the KnativeService struct
  • api/turing/cluster/servicebuilder/service_builder.go - Addition of platform-level configs to clusterSvcBuilder and new helper methods to set default env vars when cpu limits are not explicitly set and when the cpu limit scaling factor is set as 0
  • api/turing/config/config.go - Addition of the new field DefaultEnvVarsWithoutCPULimits
  • sdk/turing/router/config/resource_request.py - Addition of a new cpu limit field to the resource request class
  • ui/src/router/components/form/components/CPULimitsFormGroup.js - Addition of a new form group to allow cpu limits to be specified on the UI

@deadlycoconuts deadlycoconuts added the enhancement New feature or request label May 21, 2024
@deadlycoconuts deadlycoconuts self-assigned this May 21, 2024
Copy link

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.06%. Comparing base (0d6f578) to head (1f9187e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #381      +/-   ##
==========================================
+ Coverage   62.19%   68.06%   +5.87%     
==========================================
  Files         124      149      +25     
  Lines        9755    11809    +2054     
==========================================
+ Hits         6067     8038    +1971     
- Misses       2954     3033      +79     
- Partials      734      738       +4     
Flag Coverage Δ
api-test 62.26% <ø> (+0.07%) ⬆️
sdk-test-3.10 95.92% <100.00%> (?)
sdk-test-3.8 95.92% <100.00%> (?)
sdk-test-3.9 95.92% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch from 916ed3f to 0cd4838 Compare May 23, 2024 04:29
@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch from f6cb0c4 to 08e0064 Compare May 23, 2024 07:29
@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch from 08e0064 to 4104721 Compare May 23, 2024 07:32
@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch from 2a60787 to 1ac8cde Compare May 24, 2024 03:18
@deadlycoconuts deadlycoconuts requested a review from leonlnj May 24, 2024 08:33
@deadlycoconuts deadlycoconuts marked this pull request as ready for review May 24, 2024 08:33
@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch 2 times, most recently from 03c8e30 to 78dcb4f Compare May 27, 2024 04:19
@deadlycoconuts deadlycoconuts force-pushed the make_cpu_limits_configurable branch from b11e4eb to 89caaa2 Compare May 27, 2024 05:39
Copy link
Contributor

@leonlnj leonlnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks LGTM. +1 for moving platform default out and putting it as part of the service builder

@deadlycoconuts deadlycoconuts merged commit cbebbd6 into caraml-dev:main Jun 3, 2024
29 checks passed
@deadlycoconuts deadlycoconuts deleted the make_cpu_limits_configurable branch June 6, 2024 03:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants