Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cosmos] CosmosDB asynchronous client #21404

Merged
merged 61 commits into from
Dec 7, 2021
Merged
Show file tree
Hide file tree
Changes from 55 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
61ba8d1
initial commit
simorenoh Aug 13, 2021
15dcceb
Client Constructor (#20310)
annatisch Aug 20, 2021
bda95c3
read database
simorenoh Aug 27, 2021
c9648ab
Update simon_testfile.py
simorenoh Aug 27, 2021
80540dc
with coroutine
simorenoh Aug 30, 2021
1285438
Update simon_testfile.py
simorenoh Aug 30, 2021
992b0cd
small changes
simorenoh Aug 31, 2021
47cb688
async with returns no exceptions
simorenoh Aug 31, 2021
f3fa79f
Merge pull request #1 from Azure/simonmoreno/async
simorenoh Aug 31, 2021
0c49739
async read container
simorenoh Sep 1, 2021
47f4af5
async item read
simorenoh Sep 2, 2021
c97c946
cleaning up
simorenoh Sep 3, 2021
fcd95db
create item/ database methods
simorenoh Sep 13, 2021
36c5b90
item delete working
simorenoh Sep 13, 2021
44db2a2
docs replace functionality
simorenoh Sep 16, 2021
ec5b6ed
upsert functionality
simorenoh Sep 17, 2021
d63d052
Merge pull request #2 from simorenoh/item-read
simorenoh Oct 8, 2021
5d74c8f
missing query methods
simorenoh Oct 11, 2021
89fc2f7
CRUD for udf, sproc, triggers
simorenoh Oct 12, 2021
fdaa880
Merge branch 'Azure:main' into async-client
simorenoh Oct 12, 2021
3f9baf2
Merge branch 'Azure:main' into async-client
simorenoh Oct 12, 2021
d6650bc
Merge branch 'Azure:main' into query-functionality
simorenoh Oct 12, 2021
043dfe0
initial query logic + container methods
simorenoh Oct 13, 2021
befdb41
Merge branch 'async-client' into query-functionality
simorenoh Oct 13, 2021
8cffbe2
Merge pull request #3 from simorenoh/query-functionality
simorenoh Oct 13, 2021
72de7c8
missing some execution logic and tests
simorenoh Oct 21, 2021
5b805b8
oops
simorenoh Oct 21, 2021
8d8d0c4
fully working queries
simorenoh Oct 22, 2021
b597ca8
small fix to query_items()
simorenoh Oct 22, 2021
18319df
Update _cosmos_client_connection_async.py
simorenoh Oct 22, 2021
162c44d
Update _cosmos_client_connection.py
simorenoh Oct 22, 2021
ebbac51
documentation update
simorenoh Oct 22, 2021
43f78e6
Merge branch 'Azure:main' into main
simorenoh Oct 22, 2021
470aa5b
updated MIT dates and get_user_client() description
simorenoh Oct 22, 2021
74da690
Update CHANGELOG.md
simorenoh Oct 22, 2021
7104d63
Merge branch 'Azure:main' into main
simorenoh Oct 25, 2021
20718c7
Delete simon_testfile.py
simorenoh Oct 25, 2021
d825eaa
Merge pull request #4 from simorenoh/async-client
simorenoh Oct 25, 2021
e3c27a5
leftover retry utility
simorenoh Oct 25, 2021
3b778ad
Update README.md
simorenoh Oct 25, 2021
c6e352e
docs and removed six package
simorenoh Oct 28, 2021
8971a25
Merge remote-tracking branch 'upstream/main'
simorenoh Oct 28, 2021
52736ac
changes based on comments
simorenoh Nov 4, 2021
ad98039
small change in type hints
simorenoh Nov 4, 2021
f76c595
updated readme
simorenoh Nov 9, 2021
3f02a65
fixes based on conversations
simorenoh Nov 10, 2021
e719869
added missing type comments
simorenoh Nov 11, 2021
d03ee05
Merge branch 'Azure:main' into main
simorenoh Nov 11, 2021
02c52ee
update changelog for ci pipeline
simorenoh Nov 23, 2021
2cb4551
added typehints, moved params into keywords, added decorators, made _…
simorenoh Nov 29, 2021
cf20d35
changes based on sync with central sdk
simorenoh Dec 2, 2021
f456817
remove is_system_key from scripts (only used in execute_sproc)
simorenoh Dec 3, 2021
ea9bd16
Revert "remove is_system_key from scripts (only used in execute_sproc)"
simorenoh Dec 3, 2021
709d2eb
async script proxy using composition
simorenoh Dec 3, 2021
3277dd8
pylint
simorenoh Dec 3, 2021
a57cb4d
capitalized constants
simorenoh Dec 6, 2021
014578b
Apply suggestions from code review
simorenoh Dec 6, 2021
0d79695
closing python code snippet
simorenoh Dec 6, 2021
fdabea1
last doc updates
simorenoh Dec 7, 2021
016d0dd
Update sdk/cosmos/azure-cosmos/CHANGELOG.md
tjprescott Dec 7, 2021
8228aa9
version update
simorenoh Dec 7, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion sdk/cosmos/azure-cosmos/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
## 4.2.1 (Unreleased)
## 4.3.0 (Unreleased)
**New features**
- Added language native async i/o client

## 4.2.1 (Unreleased)
tjprescott marked this conversation as resolved.
Show resolved Hide resolved
tjprescott marked this conversation as resolved.
Show resolved Hide resolved

## 4.2.0 (2020-10-08)

Expand Down
83 changes: 82 additions & 1 deletion sdk/cosmos/azure-cosmos/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,14 +90,17 @@ For more information about these resources, see [Working with Azure Cosmos datab

The keyword-argument `enable_cross_partition_query` accepts 2 options: `None` (default) or `True`.

## Note on using queries by id

When using queries that try to find items based on an **id** value, always make sure you are passing in a string type variable. Azure Cosmos DB only allows string id values and if you use any other datatype, this SDK will return no results and no error messages.

## Limitations

Currently the features below are **not supported**. For alternatives options, check the **Workarounds** section below.

### Data Plane Limitations:

* Group By queries
* Language Native async i/o
* Queries with COUNT from a DISTINCT subquery: SELECT COUNT (1) FROM (SELECT DISTINCT C.ID FROM C)
* Bulk/Transactional batch processing
* Direct TCP Mode access
Expand Down Expand Up @@ -177,6 +180,7 @@ The following sections provide several code snippets covering some of the most c
* [Get database properties](#get-database-properties "Get database properties")
* [Get database and container throughputs](#get-database-and-container-throughputs "Get database and container throughputs")
* [Modify container properties](#modify-container-properties "Modify container properties")
* [Using the asynchronous client](#using-the-asynchronous-client "Using the asynchronous client")

### Create a database

Expand Down Expand Up @@ -428,6 +432,83 @@ print(json.dumps(container_props['defaultTtl']))

For more information on TTL, see [Time to Live for Azure Cosmos DB data][cosmos_ttl].

### Using the asynchronous client

The asynchronous cosmos client looks and works in a very similar fashion to the already existing client, with the exception of its package within the sdk and the need of using async/await keywords in order to interact with it.
simorenoh marked this conversation as resolved.
Show resolved Hide resolved

```Python
from azure.cosmos.aio import CosmosClient
import os

url = os.environ['ACCOUNT_URI']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constants such as these should be in all caps: https://www.python.org/dev/peps/pep-0008/#constants

key = os.environ['ACCOUNT_KEY']
client = CosmosClient(url, credential=key)
database_name = 'testDatabase'
database = client.get_database_client(database_name)
container_name = 'products'
container = database.get_container_client(container_name)

async def create_items():
for i in range(1, 10):
simorenoh marked this conversation as resolved.
Show resolved Hide resolved
await container.upsert_item({
'id': 'item{0}'.format(i),
'productName': 'Widget',
'productModel': 'Model {0}'.format(i)
}
)
await client.close()
simorenoh marked this conversation as resolved.
Show resolved Hide resolved
```

It is also worth pointing out that the asynchronous client has to be closed manually after its use, either by initializing it using async with or calling the close() method directly like shown above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simorenoh

closed manually after its use
Does it need to be closed after every function it's used in (like above), or can the client be passed around safely when doing things async?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be passed around safely, but needs to be manually disposed of once you're done using it - the alternative to this being to use the client using a with statement:
async with CosmosClient(url, key) as client: and then starting your cosmos db logic within that context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simorenoh What happens if these async clients are not closed properly? Memory leak or something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to test, but I assume that is the case - you also get an error in your code stating the client was not properly disposed of


```Python
from azure.cosmos.aio import CosmosClient
import os

url = os.environ['ACCOUNT_URI']
key = os.environ['ACCOUNT_KEY']
database_name = 'testDatabase'
container_name = 'products'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. I was a bit confused when I saw these being referred in the function below without realizing they are constants.


async with CosmosClient(url, credential=key) as client:
database = client.get_database_client(database_name)
container = database.get_container_client(container_name)
for i in range(1, 10):
simorenoh marked this conversation as resolved.
Show resolved Hide resolved
await container.upsert_item({
'id': 'item{0}'.format(i),
'productName': 'Widget',
'productModel': 'Model {0}'.format(i)
}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to close the client here too?

await client.close()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is that second way to initialize I had mentioned above - the way it is done here, it gets disposed once the with statement gets exited

```

### Queries with the asynchronous client

Queries work the same way for the most part, with one exception being the absence of the `enable_cross_partition` flag in the request; queries without a specified partition key value will now by default atempt to do a cross partition query. Results can be directly iterated on, but because queries made by the asynchronous client return AsyncIterable objects, results can't be cast into lists directly; instead, if you need to create lists from your results, use Python's list comprehension to populate a list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Queries work the same way for the most part, with one exception being the absence of the `enable_cross_partition` flag in the request; queries without a specified partition key value will now by default atempt to do a cross partition query. Results can be directly iterated on, but because queries made by the asynchronous client return AsyncIterable objects, results can't be cast into lists directly; instead, if you need to create lists from your results, use Python's list comprehension to populate a list:
Queries work the same way for the most part, with one exception being the absence of the `enable_cross_partition` flag in the request; queries without a specified partition key value will now by default attempt to do a cross partition query. Results can be directly iterated on, but because queries made by the asynchronous client return AsyncIterable objects, results can't be cast into lists directly; instead, if you need to create lists from your results, use Python's list comprehension to populate a list:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is this the reason that we don't have an await when querying? Should we make that more explicit?
  2. Why do we need the word "async" again in the function
  3. IIRC point queries operate differently. Should we add an example and explain the differentiation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yeah this is that point I had brought up with you before, I feel like I tried to explain it but I'm not sure it's completely understandable... let's talk on this to reach a conclusion.
  2. We need the word async in the function definition not because of the query, but because we do an async for loop right after - any function that has either await or async within has to be defined as async by python
  3. This is the case even for the normal sync client, and I believe should be documented in the actual Cosmos docs (not just SDK), found this for instance: https://devblogs.microsoft.com/cosmosdb/point-reads-versus-queries/


```Python
from azure.cosmos.aio import CosmosClient
import os

url = os.environ['ACCOUNT_URI']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on the constants

key = os.environ['ACCOUNT_KEY']
client = CosmosClient(url, credential=key)
database_name = 'testDatabase'
database = client.get_database_client(database_name)
container_name = 'products'
container = database.get_container_client(container_name)

async def create_lists():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to close the client in this example too? I'm not seeing "with" being used either.

results = container.query_items(
query='SELECT * FROM products p WHERE p.productModel = "Model 2"')

# Iterating directly on results
simorenoh marked this conversation as resolved.
Show resolved Hide resolved
async for item in results:
print(item)
simorenoh marked this conversation as resolved.
Show resolved Hide resolved

# Making a list from the results
simorenoh marked this conversation as resolved.
Show resolved Hide resolved
item_list = [item async for item in results]
```
simorenoh marked this conversation as resolved.
Show resolved Hide resolved

## Troubleshooting

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# The MIT License (MIT)
# Copyright (c) 2021 Microsoft Corporation

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
# The MIT License (MIT)
# Copyright (c) 2021 Microsoft Corporation

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

"""Internal class for query execution context implementation in the Azure Cosmos
database service.
"""

from collections import deque
import copy

from ...aio import _retry_utility_async
from ... import http_constants

# pylint: disable=protected-access


class _QueryExecutionContextBase(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we are forking the implementation code and copying from the non async io code path, we need to have a plan on maintenance.

now if there is any bug let's say in the query stack we need to fix this into different codebase.

multiple options to consider for future:

  1. make the sync api a shim layer on top of async API, rather than having two separate implementations
  2. deprecated the sync API

@kushagraThapar

"""
This is the abstract base execution context class.
"""

def __init__(self, client, options):
"""
:param CosmosClient client:
:param dict options: The request options for the request.
"""
self._client = client
self._options = options
self._is_change_feed = "changeFeed" in options and options["changeFeed"] is True
self._continuation = self._get_initial_continuation()
self._has_started = False
self._has_finished = False
self._buffer = deque()

def _get_initial_continuation(self):
if "continuation" in self._options:
if "enableCrossPartitionQuery" in self._options:
raise ValueError("continuation tokens are not supported for cross-partition queries.")
return self._options["continuation"]
return None

def _has_more_pages(self):
return not self._has_started or self._continuation

async def fetch_next_block(self):
"""Returns a block of results with respecting retry policy.

This method only exists for backward compatibility reasons. (Because
QueryIterable has exposed fetch_next_block api).

:return: List of results.
:rtype: list
"""
if not self._has_more_pages():
return []

if self._buffer:
# if there is anything in the buffer returns that
res = list(self._buffer)
self._buffer.clear()
return res

# fetches the next block
return await self._fetch_next_block()

async def _fetch_next_block(self):
raise NotImplementedError

async def __aiter__(self):
"""Returns itself as an iterator"""
return self

async def __anext__(self):
"""Return the next query result.

:return: The next query result.
:rtype: dict
:raises StopAsyncIteration: If no more result is left.
"""
if self._has_finished:
raise StopAsyncIteration

if not self._buffer:

results = await self.fetch_next_block()
self._buffer.extend(results)

if not self._buffer:
raise StopAsyncIteration

return self._buffer.popleft()

async def _fetch_items_helper_no_retries(self, fetch_function):
"""Fetches more items and doesn't retry on failure

:return: List of fetched items.
:rtype: list
"""
fetched_items = []
# Continues pages till finds a non empty page or all results are exhausted
while self._continuation or not self._has_started:
if not self._has_started:
self._has_started = True
new_options = copy.deepcopy(self._options)
new_options["continuation"] = self._continuation
(fetched_items, response_headers) = await fetch_function(new_options)
continuation_key = http_constants.HttpHeaders.Continuation
# Use Etag as continuation token for change feed queries.
if self._is_change_feed:
continuation_key = http_constants.HttpHeaders.ETag
# In change feed queries, the continuation token is always populated. The hasNext() test is whether
# there is any items in the response or not.
if not self._is_change_feed or fetched_items:
self._continuation = response_headers.get(continuation_key)
else:
self._continuation = None
if fetched_items:
break
return fetched_items

async def _fetch_items_helper_with_retries(self, fetch_function):
async def callback():
return await self._fetch_items_helper_no_retries(fetch_function)

return await _retry_utility_async.ExecuteAsync(self._client, self._client._global_endpoint_manager, callback)


class _DefaultQueryExecutionContext(_QueryExecutionContextBase):
"""
This is the default execution context.
"""

def __init__(self, client, options, fetch_function):
"""
:param CosmosClient client:
:param dict options: The request options for the request.
:param method fetch_function:
Will be invoked for retrieving each page

Example of `fetch_function`:

>>> def result_fn(result):
>>> return result['Databases']

"""
super(_DefaultQueryExecutionContext, self).__init__(client, options)
self._fetch_function = fetch_function

async def _fetch_next_block(self):
while super(_DefaultQueryExecutionContext, self)._has_more_pages() and not self._buffer:
return await self._fetch_items_helper_with_retries(self._fetch_function)
Loading