
ThroughputControl-Global #19183

Merged (73 commits, Mar 2, 2021)
Conversation

@xinlian12 (Member) commented Feb 11, 2021

This is the second PR for throughput control.
Compared to the local control mode, in global mode the throughput control group is shared among different clients.

TODO:

  1. CTL run

Benchmark test results: [three benchmark result images]

@azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 (Member, Author) commented

/azp run java - cosmos - tests

@azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

private final RxPartitionKeyRangeCache partitionKeyRangeCache;

private final LinkedCancellationTokenSource cancellationTokenSource;
private final ConcurrentHashMap<String, LinkedCancellationToken> cancellationTokenMap;
Contributor comment:

Ditto, on the thread-safety.

Member Author reply:

Will discuss this one offline.

if (request.requestContext != null) {
BridgeInternal.setResourceAddress(requestRateTooLargeException, request.requestContext.resourcePhysicalAddress);
try {
this.throughputReadLock.lock();
@moderakh (Contributor) commented Feb 26, 2021
Mixing the read-lock pattern with a Reactor Mono here is interesting, and I am not sure it does what you intended. Thinking ...

Think about this:

  1. Let's say you acquire the read lock via this.throughputReadLock.lock(), and then we check if (this.availableThroughput.get() > 0).
  2. Then we return a Mono from the processRequest(.) method, which means that as we go out of the scope of this method, the finally block runs and this.throughputReadLock.unlock() is invoked. So now the lock is released.
  3. Now trackRequestCharge won't be invoked until we go out of the scope of processRequest and someone subscribes to the returned Mono.

I checked the trackRequestCharge implementation and noticed that you are re-acquiring the read lock there.

Now please consider this scenario. Does it cause any problem?

  1. processRequest acquires the lock via this.throughputReadLock.lock() and reads the value of availableThroughput.
  2. processRequest returns a Mono and releases the read lock in the finally block.
  3. Now let's say some other thread acquires the write lock and updates the throughputs.
  4. Someone subscribes to the Mono returned in step 2, and now trackRequestCharge is invoked. It will attempt to acquire the read lock, but because of step 3 happening in between, the throughput value it sees is not the same value that the if branch of processRequest observed.
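A minimal stdlib sketch of the race described above, using a CompletableFuture in place of the Mono; all class, field, and method names here are illustrative stand-ins, not the PR's actual code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Analogue of the scenario in the comment: the read lock only protects the
// sampling of the value, not the deferred stage that later consumes it.
public class LockScopeDemo {
    static final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    static final AtomicInteger availableThroughput = new AtomicInteger(10);

    // Samples availableThroughput under the read lock, then releases the lock
    // in the finally block -- before any downstream stage consumes the value.
    static CompletableFuture<Integer> processRequest() {
        rwLock.readLock().lock();
        try {
            return CompletableFuture.completedFuture(availableThroughput.get());
        } finally {
            rwLock.readLock().unlock();
        }
    }

    // Stands in for the other thread that takes the write lock and refreshes
    // the throughput between the check and the deferred charge tracking.
    static void refreshCycle(int newThroughput) {
        rwLock.writeLock().lock();
        try {
            availableThroughput.set(newThroughput);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> pending = processRequest(); // lock already released here
        refreshCycle(0);                                       // write happens in between
        int observedAtCheck = pending.join();                  // value from the old cycle
        int observedAtTracking = availableThroughput.get();    // value from the new cycle
        // Each read was individually lock-protected, but the pair is not atomic.
        System.out.println(observedAtCheck + " vs " + observedAtTracking);
    }
}
```

The point of the sketch: each lock acquisition is correct in isolation, but nothing makes the check and the later tracking see the same cycle's values.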

Member Author reply:

@moderakh @FabianMeiswinkel @kushagraThapar

I think this is one of the consequences of only tracking the RU usage based on the response, compared to pre-calculating it.

The consequences:
a. Some requests that actually happened in the previous cycle will have their responses tracked in the next cycle and eat some throughput from that cycle.
b. The throughput usage recorded for the previous cycle is lower than the real usage.

It can be improved if we do some operation RU prediction later on.
Or we can do something like: each operation is attached to a certain cycle version, and we only track the response if it matches the current cycle version. Maybe that is a better way?

The reason I introduced the read-write lock is mainly to keep ScheduledThroughput and availableThroughput in sync, since if we are going to return 429, we depend on those two values to calculate the backoff time.

Any thoughts on this one?
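The cycle-version idea above could look roughly like this. A hypothetical sketch; none of these names are from the PR, and the real implementation would need to fit into the existing controller classes:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Each request captures the cycle version when it is issued; a response is
// only charged against the budget if the cycle has not rolled over since.
public class CycleVersionSketch {
    private final AtomicLong cycleVersion = new AtomicLong(0);
    private final AtomicInteger availableThroughput = new AtomicInteger(100);

    // Called by the background refresh task at the start of each cycle.
    public void startNewCycle(int scheduledThroughput) {
        cycleVersion.incrementAndGet();
        availableThroughput.set(scheduledThroughput);
    }

    // Captured by the request before it is sent.
    public long currentCycle() {
        return cycleVersion.get();
    }

    // Charge the response only if it belongs to the current cycle; late
    // responses from a previous cycle are dropped instead of eating budget.
    public boolean trackRequestCharge(long requestCycle, int requestCharge) {
        if (requestCycle != cycleVersion.get()) {
            return false; // stale response from an earlier cycle, ignore
        }
        availableThroughput.addAndGet(-requestCharge);
        return true;
    }

    public int availableThroughput() {
        return availableThroughput.get();
    }
}
```

This drops late charges entirely, which trades undercounting in the old cycle for not polluting the new one; whether that trade is acceptable is exactly the open question in the thread.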


@moderakh (Contributor) left a comment:
Great work @xinlian12. Looks great; this is done very well.

I added a few comments around thread-safety and your usage of locks. I also think you need to update the document associated with each client more frequently (as discussed, at least every (ttl - 1) / 2) to ensure your document doesn't reach its TTL due to a slight shift in task scheduling, etc.
Other than these, looks great.
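A hedged sketch of that renewal cadence, assuming the suggested interval is (ttl - 1) / 2 and using a plain ScheduledExecutorService; the class and method names are illustrative and the real SDK code differs:

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Renew the per-client document well before its TTL expires, so a small
// scheduling delay cannot let the document be deleted by the server.
public class ClientItemRenewal {
    // (ttl - 1) / 2, per the review suggestion above.
    static Duration renewInterval(Duration ttl) {
        return ttl.minusSeconds(1).dividedBy(2);
    }

    public static void main(String[] args) {
        Duration ttl = Duration.ofSeconds(11);
        long periodMs = renewInterval(ttl).toMillis(); // 5000 ms for an 11s TTL
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> System.out.println("renewing client document"), // placeholder for the upsert
            0, periodMs, TimeUnit.MILLISECONDS);
        scheduler.shutdown(); // demo only; a real client keeps this running
    }
}
```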


public void cancel() {
if (this.cancellationRequested.compareAndSet(false, true)) {
for (LinkedCancellationTokenSource childTokenSource : this.childTokenSourceList) {
Contributor comment:

Non-thread-safe access to this.childTokenSourceList.
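One possible way to address this, illustrative rather than necessarily the PR's eventual fix: back the list with a CopyOnWriteArrayList, so cancel() iterates a snapshot while other threads register children concurrently. The class here is a simplified stand-in for LinkedCancellationTokenSource:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

public class SafeCancellationTokenSource {
    private final AtomicBoolean cancellationRequested = new AtomicBoolean(false);
    private final List<SafeCancellationTokenSource> childTokenSourceList =
        new CopyOnWriteArrayList<>();

    public void register(SafeCancellationTokenSource child) {
        childTokenSourceList.add(child);
    }

    public void cancel() {
        if (this.cancellationRequested.compareAndSet(false, true)) {
            // Iteration works on a snapshot; concurrent register() calls
            // cannot throw ConcurrentModificationException here.
            for (SafeCancellationTokenSource child : childTokenSourceList) {
                child.cancel();
            }
        }
    }

    public boolean isCancellationRequested() {
        return cancellationRequested.get();
    }
}
```

CopyOnWriteArrayList suits this shape of workload (rare writes at registration time, a single full traversal at cancellation) better than wrapping every access in a lock.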

}
})
.doOnSuccess(requestController -> {
if (requestController != null) {
Contributor comment:
Per the Reactive Streams spec, null values are never allowed; hence requestController will always be non-null, and you don't need this check.

Per reactive-streams/reactive-streams-jvm#209, it is guaranteed that you will never receive a null value here.
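The same rule can be demonstrated with the JDK's own Flow API, which follows the Reactive Streams specification (rule 2.13 forbids onNext(null)); a null element is rejected eagerly rather than delivered downstream. This is a stdlib illustration of the spec behavior, not Reactor itself:

```java
import java.util.concurrent.SubmissionPublisher;

public class NullElementDemo {
    // Returns true if the publisher rejects a null element, as the spec requires.
    static boolean rejectsNull() {
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.submit(null); // throws NullPointerException per the spec
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```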


@xinlian12 merged commit 39a7a66 into Azure:master on Mar 2, 2021
azure-sdk pushed a commit to azure-sdk/azure-sdk-for-java that referenced this pull request Jun 23, 2022

4 participants