Added AEP for multi-dimensional pod autoscaler #5342

Merged
merged 16 commits into from
May 25, 2023
Changes from 1 commit
7 changes: 6 additions & 1 deletion multidimensional-pod-autoscaler/AEP.md
@@ -125,7 +125,7 @@ Our proposed MPA framework consists of three controllers (i.e., a recommender, a
**MPA API.** Application owners specify the autoscaling configurations which include:

1. whether they only want to know the recommendations from MPA or they want MPA to directly actuate the autoscaling decisions;
-2. application SLOs (e.g., in terms of latency or throughput);
+2. application SLOs (e.g., in terms of latency or throughput) if there are;
3. any custom metrics if there are; and
4. other autoscaling configurations that exist in HPA and VPA (e.g., desired resource utilizations, container update policies, min and max number of replicas).

@@ -254,6 +254,11 @@ status:
value: metric-value
```

+Note that application SLO field is **optional** and SLO is defined to be the quality of service target that an application must meet (regarding latency, throughput, and so on).
+For example, if the latency SLO is in use, then it could be 99% of the requests finish within 100ms. Accordingly, the replica set can be horizontally scaled when the measured latency is greater than 100ms, i.e., violating the SLO value.
+The default MPA recommender implemented in this AEP will not use the `applicationSLO` field and this field will be used by users who want to implement their own recommender. For example, an RL/ML-based recommender can have `applicationSLO` as part of the reward/loss function and thus they can have extra constraints in addition to min/max replicas.
+The `applicationSLO` field is a floating point number (most application metrics like latency and throughput are floating point numbers).
Collaborator


It seems to me that just the target number is not enough.

Let's use the example from the doc: we're targeting 99% of requests handled in 100 ms or faster.

It seems that we need to provide at least 2 pieces of information:

  1. Metric reporting 99th percentile of request handling time and
  2. 100ms (in the same units as used by the metric)

The proposed API has only 1 field of type double, so I can provide 2) (target SLO value) but I don't see a way to specify 1) (metric we're targeting).

For beta I think we will need the ability to specify something more complicated than just a metric value (for example, a percentile of a histogram metric, or support for incremental metrics). But for alpha I think we need at least the ability to specify which metric we're targeting.

Please amend the proposal to allow that, or explain how it's possible with the current proposal.
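
To make the gap concrete, here is a minimal sketch of a target that carries both pieces of information. It is not part of the proposed MPA API: `SLOTarget`, `metric_name`, `target_value`, and the metric name used below are hypothetical names chosen for illustration, and the check assumes an upper-bound SLO such as latency.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SLOTarget:
    # Hypothetical fields, not taken from the proposed API.
    metric_name: str     # metric being targeted, e.g. one reporting p99 request handling time
    target_value: float  # threshold, in the same units as the metric


def is_violated(target: SLOTarget, observed: Mapping[str, float]) -> bool:
    """Return True when the named metric exceeds the target value.

    Assumes an upper-bound SLO (e.g. latency); a throughput SLO would
    need the opposite comparison.
    """
    if target.metric_name not in observed:
        raise KeyError(f"metric {target.metric_name!r} was not reported")
    return observed[target.metric_name] > target.target_value
```

With only a single floating-point field, the `metric_name` half of this pair has nowhere to go, which is the concern raised above.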

Contributor Author


Hi @jbartosik, thank you for the suggestion! We added a field to specify the percentage. It is a floating-point number between 0 and 1. For example, a latency SLO could state that 99% of the requests finish within 100ms (i.e., applicationSLO = 100 and percentageSLO = 0.99). Accordingly, the replica set can be scaled horizontally when the measured latency is greater than 100ms, i.e., when it violates the SLO value. Similarly, a throughput SLO can be defined as throughput greater than 100/s 90% of the time, i.e., applicationSLO = 100 and percentageSLO = 0.9.

For alpha, the percentage is always the 99th percentile and only the SLO value will be used for horizontal autoscaling.
For beta, we will use both fields and add the ability to specify more complicated types (e.g., incremental metrics).

Our previous design encoded both the percentage and the value in a single field, i.e., the SLO field in the API object simply represented the 99th percentile latency, and the recommender compared the measured 99th percentile latency with the SLO value. We dropped that design because it did not allow the percentage and the SLO value to be adjusted separately.

Please review the amended proposal and let us know if it clarifies your question. Thank you!
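
As a rough illustration of the two-field semantics described above, the following sketch checks a latency SLO of the form "at least `percentageSLO` of requests finish within `applicationSLO`". It is only a sketch under stated assumptions, not the recommender implemented in this PR: the function and parameter names are hypothetical, and it assumes raw per-request latency samples are available.

```python
from typing import Sequence


def latency_slo_met(
    latencies_ms: Sequence[float],
    application_slo: float,  # hypothetical stand-in for applicationSLO (e.g. 100 ms)
    percentage_slo: float,   # hypothetical stand-in for percentageSLO (e.g. 0.99)
) -> bool:
    """Return True if at least `percentage_slo` of requests finished
    within `application_slo` milliseconds."""
    if not latencies_ms:
        raise ValueError("no latency samples to evaluate")
    if not 0.0 <= percentage_slo <= 1.0:
        raise ValueError("percentage_slo must be between 0 and 1")
    within = sum(1 for latency in latencies_ms if latency <= application_slo)
    return within / len(latencies_ms) >= percentage_slo


def should_scale_out(
    latencies_ms: Sequence[float],
    application_slo: float,
    percentage_slo: float,
) -> bool:
    """Horizontal scale-out is suggested when the latency SLO is violated."""
    return not latency_slo_met(latencies_ms, application_slo, percentage_slo)
```

A throughput SLO would use the opposite comparison (samples at or above the target), so a real implementation would also need to know the direction of the bound.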

Collaborator


@wangchen615 @James-QiuHaoran

I still think the proposal doesn't work. Now there are 2 fields:

  • applicationSLO and
  • percentageSLO

This allows us to specify that we want a value of 100 or less 99% of the time.

It still doesn't allow us to specify what metric we're targeting. (Which metric should have value 100 or less 99% of the time?)

Maybe it's better to do as @pbetkier suggested and put those parameters in a recommender-specific place instead of putting them in the general API?

In addition to what Piotr pointed out (the OS recommender is not going to use those fields, so it's a bit weird to have them), it looks like this part of the API is not fully ready yet.


### Test Plan

<!--