Cosmos DB: Bulk operations can go over the 2Mb limit #23923

Closed
1 task done
ealsur opened this issue Nov 18, 2022 · 0 comments
Labels: Client (This issue points to a problem in the data-plane of the library.), Cosmos


ealsur commented Nov 18, 2022

  • Package Name: @azure/cosmos
  • Package Version: All
  • Operating system:
  • [x] nodejs

Describe the bug
Cosmos DB limits the size of a single request to 2 MB. When Bulk is used, the SDK hashes each operation's partition key to assign it to a partition key range (one batch per physical partition) and then sends the batches:

```typescript
public async bulk(
  operations: OperationInput[],
  bulkOptions?: BulkOptions,
  options?: RequestOptions
): Promise<OperationResponse[]> {
  const { resources: partitionKeyRanges } = await this.container
    .readPartitionKeyRanges()
    .fetchAll();
  const { resource: definition } = await this.container.getPartitionKeyDefinition();
  // One batch per physical partition key range.
  const batches: Batch[] = partitionKeyRanges.map((keyRange: PartitionKeyRange) => {
    return {
      min: keyRange.minInclusive,
      max: keyRange.maxExclusive,
      rangeId: keyRange.id,
      indexes: [],
      operations: [],
    };
  });
  // Hash each operation's partition key and append it to the batch whose range contains the hash.
  operations
    .map((operation) => decorateOperation(operation, definition, options))
    .forEach((operation: Operation, index: number) => {
      const partitionProp = definition.paths[0].replace("/", "");
      const isV2 = definition.version && definition.version === 2;
      const toHashKey = getPartitionKeyToHash(operation, partitionProp);
      const hashed = isV2 ? hashV2PartitionKey(toHashKey) : hashV1PartitionKey(toHashKey);
      const batchForKey = batches.find((batch: Batch) => {
        return isKeyInRange(batch.min, batch.max, hashed);
      });
      batchForKey.operations.push(operation);
      batchForKey.indexes.push(index);
    });
  const path = getPathFromLink(this.container.url, ResourceType.item);
  const orderedResponses: OperationResponse[] = [];
  await Promise.all(
    batches
      .filter((batch: Batch) => batch.operations.length)
      .map(async (batch: Batch) => {
        // Only the operation count is validated here; the total payload size of the
        // batch is not, so this single request can exceed the 2 MB limit.
        if (batch.operations.length > 100) {
          throw new Error("Cannot run bulk request with more than 100 operations per partition");
        }
        try {
          const response = await this.clientContext.bulk({
            body: batch.operations,
            partitionKeyRangeId: batch.rangeId,
            path,
            resourceId: this.container.url,
            bulkOptions,
            options,
          });
          response.result.forEach((operationResponse: OperationResponse, index: number) => {
            orderedResponses[batch.indexes[index]] = operationResponse;
          });
        } catch (err) {
          // In the case of 410 errors, we need to recompute the partition key ranges
          // and redo the batch request, however, 410 errors occur for unsupported
          // partition key types as well since we don't support them, so for now we throw
          if (err.code === 410) {
            throw new Error(
              "Partition key error. Either the partitions have split or an operation has an unsupported partitionKey type"
            );
          }
          throw new Error(`Bulk request errored with: ${err.message}`);
        }
      })
  );
  return orderedResponses;
}
```

For each batch (physical partition), it sends a single request containing all of that batch's operations.

The problem is that the combined size of the operations for a single partition can exceed 2 MB, and no size limiting or splitting is applied; the only check is the 100-operation limit.
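
To make the failure mode concrete, here is a minimal, self-contained sketch (not SDK code) that measures each per-range batch the way the loop above builds it: the operations array for one range is JSON-encoded and its UTF-8 length is compared with the cap. `MAX_REQUEST_BYTES`, `RangeBatch`, `utf8ByteLength`, and `findOversizedBatches` are hypothetical names, and the cap is assumed to be 2 MiB (2 * 1024 * 1024 bytes); the exact limit and how the service counts it may differ.

```typescript
// Assumption: the "2 MB" request limit is treated as 2 MiB in this sketch.
const MAX_REQUEST_BYTES = 2 * 1024 * 1024;

// Minimal stand-in for a per-range batch (mirrors rangeId/operations of Batch in bulk()).
interface RangeBatch<T> {
  rangeId: string;
  operations: T[];
}

// UTF-8 byte length of a string, computed without Buffer or TextEncoder.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // A surrogate pair is one code point that takes 4 bytes in UTF-8.
        bytes += 4;
        i++;
      } else {
        // Lone surrogate: encoders substitute U+FFFD, which is 3 bytes.
        bytes += 3;
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Returns the ids of batches that pass the 100-operation check in bulk()
// but whose JSON body would still exceed the byte cap.
function findOversizedBatches<T>(batches: RangeBatch<T>[], maxBytes = MAX_REQUEST_BYTES): string[] {
  return batches
    .filter((batch) => batch.operations.length <= 100)
    .filter((batch) => utf8ByteLength(JSON.stringify(batch.operations)) > maxBytes)
    .map((batch) => batch.rangeId);
}
```

With 30 operations of roughly 100 KB each in one range, that range is reported even though it is far below 100 operations.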

Other Cosmos DB SDKs that support Bulk apply a size limit when building each batch.

To Reproduce

  1. Using Bulk, send fewer than 100 operations that target the same partition and together exceed 2 MB in size.
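
A payload matching this step can be built without contacting the service. The sketch below assumes a container partitioned on `/pk` and uses a local `UpsertOperation` type shaped like an upsert `OperationInput` (`operationType: "Upsert"` plus `resourceBody`); `UpsertOperation` and `buildOversizedOperations` are illustrative names, not SDK exports. Every operation shares one partition key value, so all of them hash to the same range, and the documents are ASCII-only, so the JSON string length equals its UTF-8 byte count.

```typescript
// Local stand-in shaped like an upsert OperationInput (illustrative only).
interface UpsertOperation {
  operationType: "Upsert";
  resourceBody: { id: string; pk: string; payload: string };
}

// Builds `count` upserts that all target the same logical partition ("pk-1"),
// each carrying an ASCII filler string of `payloadChars` characters.
function buildOversizedOperations(count: number, payloadChars: number): UpsertOperation[] {
  const operations: UpsertOperation[] = [];
  for (let i = 0; i < count; i++) {
    operations.push({
      operationType: "Upsert",
      resourceBody: { id: `item-${i}`, pk: "pk-1", payload: "x".repeat(payloadChars) },
    });
  }
  return operations;
}

// 40 operations (under the 100-operation limit), roughly 60 KB each.
const operations = buildOversizedOperations(40, 60000);
// ASCII-only JSON, so string length equals the UTF-8 byte count (about 2.4 MB here).
const totalBytes = JSON.stringify(operations).length;
```

Passing equivalent operations (typed with the SDK's own `OperationInput`) to `container.items.bulk(operations)` would, per the code above, place all 40 upserts in a single request to the range that owns `pk-1`.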

Expected behavior
The SDK should split the operations into multiple requests, each smaller than 2 MB.
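
One way to get this behavior is a greedy split of each per-range batch: keep appending operations until the next one would push the JSON body over the byte cap or the count over 100, then start a new sub-batch. The sketch below is an illustration under stated assumptions, not the SDK's implementation: `splitBatch`, `SubBatch`, and the constants are hypothetical, the cap is assumed to be 2 MiB, and `sizeOf` must return the UTF-8 byte length of one operation's JSON (the usage below relies on ASCII-only data, where string length is enough). Original indexes travel with each operation so responses can still be put back in order.

```typescript
// Assumptions: "2 MB" treated as 2 MiB; 100 operations per request as in bulk().
const MAX_REQUEST_BYTES = 2 * 1024 * 1024;
const MAX_OPERATIONS_PER_REQUEST = 100;

// Mirrors the operations/indexes pair that bulk() keeps per partition key range.
interface SubBatch<T> {
  operations: T[];
  indexes: number[];
}

// Greedily splits one batch so that each sub-batch's JSON array body fits in
// maxBytes and holds at most maxOperations operations. An operation that is
// too large on its own is placed alone in a sub-batch; the service decides its fate.
function splitBatch<T>(
  batch: SubBatch<T>,
  sizeOf: (operation: T) => number,
  maxBytes: number = MAX_REQUEST_BYTES,
  maxOperations: number = MAX_OPERATIONS_PER_REQUEST
): SubBatch<T>[] {
  const result: SubBatch<T>[] = [];
  let current: SubBatch<T> = { operations: [], indexes: [] };
  let currentBytes = 2; // "[" and "]" of the JSON array
  batch.operations.forEach((operation, i) => {
    const opBytes = sizeOf(operation);
    // After the first element, a comma precedes each operation in the array.
    const added = current.operations.length > 0 ? opBytes + 1 : opBytes;
    const wouldOverflow =
      current.operations.length >= maxOperations || currentBytes + added > maxBytes;
    if (current.operations.length > 0 && wouldOverflow) {
      result.push(current);
      current = { operations: [], indexes: [] };
      currentBytes = 2;
    }
    currentBytes += current.operations.length > 0 ? opBytes + 1 : opBytes;
    current.operations.push(operation);
    current.indexes.push(batch.indexes[i]);
  });
  if (current.operations.length > 0) {
    result.push(current);
  }
  return result;
}
```

For example, with a 14-byte cap, `["aaaa", "bbbb", "cccc"]` splits into three single-item sub-batches, because any pair serializes to 15 bytes.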

ghost added the needs-triage label (Workflow: This is a new issue that needs to be triaged to the appropriate team.) on Nov 18, 2022
azure-sdk added the Client, Cosmos, and needs-team-triage labels on Nov 18, 2022
ghost removed the needs-triage label on Nov 18, 2022
xirzec removed the needs-team-triage label on Nov 28, 2022
sajeetharan pushed a commit that referenced this issue Feb 10, 2023
…ize. (#23987)

### Packages impacted by this PR
@azure/cosmos

### Issues associated with this PR
#23923

### Describe the problem that is addressed by this PR
The CosmosDB Items.bulk API doesn't honour the 2 MB cap imposed on a single batch
request. With these changes, if the size of a batch (the cumulative size of its
operations) exceeds 2 MB, it is split into smaller batches before sending.
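
Splitting only works if each response still lands at the position of the operation that produced it. In `bulk()` above this is done with `batch.indexes`; the sketch below shows the same idea after splitting, assuming each sub-batch keeps the original indexes of its operations and its results come back in the same order as its operations (as `response.result` is consumed in `bulk()`). `reassembleResponses` and `IndexedBatch` are hypothetical names; this is not the code from #23987.

```typescript
// Per-sub-batch bookkeeping: original positions of its operations, in send order.
interface IndexedBatch {
  indexes: number[];
}

// Writes each sub-batch's results back to the positions of the operations that
// produced them, so the caller sees responses in the order it passed operations.
function reassembleResponses<R>(
  batches: IndexedBatch[],
  results: R[][],
  totalOperations: number
): R[] {
  if (batches.length !== results.length) {
    throw new Error("Expected one result array per sub-batch");
  }
  const ordered: R[] = new Array<R>(totalOperations);
  batches.forEach((batch, b) => {
    results[b].forEach((result, i) => {
      ordered[batch.indexes[i]] = result;
    });
  });
  return ordered;
}
```

Because sub-batches can be sent concurrently and finish in any order, writing by index (rather than appending) keeps the output order independent of completion order.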

### What are the possible designs available to address the problem? If there is more than one possible design, why was the one in this PR chosen?


### Are there test cases added in this PR? _(If not, why?)_
Yes

### Provide a list of related PRs _(if any)_


### Command used to generate this PR: _(Applicable only to SDK release request PRs)_

### Checklists
- [ ] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [ ] Added a changelog (if necessary)

---------

Co-authored-by: FAREAST\vikassingh <vikassingh@microsoft.com>
v1k1 closed this as completed on Feb 10, 2023
github-actions bot locked and limited conversation to collaborators on Jul 3, 2023