Initial Creation of azure-health-deidentification Dataplane SDK #36041

Merged
29 commits
3c99c2e
Initial commit of Health.Deidentification dataplane
Jun 11, 2024
11a7faf
Use MI instead of SAS
Jul 2, 2024
a4aa310
Regenerates with Plaintext
Jul 3, 2024
c0e83ad
Adds rest of tests
Jul 3, 2024
0247b2f
First attempt patch
Jul 3, 2024
c2ee2cb
Patch Attempt #2
Jul 3, 2024
38279e9
Patch Attempt #3
Jul 3, 2024
89a9779
Creates base recordings
Jul 5, 2024
93d7193
Fixes sanitizers; Test replay functioning
Jul 5, 2024
6c63f22
Creates all sync samples
Jul 8, 2024
42c7dda
Creates all async samples
Jul 8, 2024
efc6d9f
Adds description in readme
Jul 8, 2024
5595bc0
Adds tsplocation
Jul 17, 2024
9668d4c
Merge branch 'main' into release/healthdataaiservices/health-deidenti…
Jul 17, 2024
76e4b6e
Checkpoint
Jul 19, 2024
fbc861e
Merge branch 'main' into release/healthdataaiservices/health-deidenti…
Jul 19, 2024
0a8104e
Executes test recording migration
Jul 22, 2024
8268af5
Adds pipeline yamls
Jul 22, 2024
1f7be14
Updates ci.yml triggers
Jul 22, 2024
be2cba7
Removes ArtifactName from ci.yaml
Jul 22, 2024
37b1826
Fixes analysis failures
Jul 22, 2024
18750d4
Fixes analysis failures 2
Jul 22, 2024
77a1b08
Merge branch 'main' into release/healthdataaiservices/health-deidenti…
Jul 22, 2024
8b8a13a
Update sdk/healthdataaiservices/ci.yml
GrahamMThomas Jul 22, 2024
651d8f6
Updates test.yml
Jul 22, 2024
d68afd9
Merge branch 'release/healthdataaiservices/health-deidentification' o…
Jul 22, 2024
0878258
Uniquifier default to false for pipelines
Jul 23, 2024
52fa4e0
Updates from feedback
Jul 23, 2024
331f964
Updates from feedback 2
Jul 23, 2024
4 changes: 4 additions & 0 deletions .github/CODEOWNERS
@@ -412,6 +412,10 @@
# PRLabel: %Cognitive - Text Analytics
/sdk/textanalytics/ @quentinRobinson @wangyuantao

# ServiceLabel: %Health Deidentification
# PRLabel: %Health Deidentification
/sdk/healthdataaiservices/ @GrahamMThomas @danielszaniszlo

# AzureSdkOwners: @YalinLi0312
# ServiceLabel: %Cognitive - Form Recognizer
# ServiceOwners: @bojunehsu @vkurpad
14 changes: 14 additions & 0 deletions .vscode/cspell.json
@@ -244,6 +244,7 @@
"guids",
"hanaonazure",
"hdinsight",
"healthdataaiservices",
"heapq",
"hexlify",
"himds",
@@ -402,6 +403,7 @@
"unpad",
"unpadder",
"unpartial",
"uniquifier",
"unredacted",
"unseekable",
"unsubscriptable",
@@ -440,6 +442,7 @@
"BUILDID",
"documentdb",
"chdir",
"radiculopathy",
"reqs",
"rgpy",
"swaggertosdk",
@@ -1840,6 +1843,17 @@
"words": [
"dcid"
]
},
{
"filename": "sdk/healthdataaiservices/azure-health-deidentification/**",
"words": [
"deid",
"deidservices",
"deidentification",
"healthdataaiservices",
"deidentify",
"deidentified"
]
}
],
"allowCompoundWords": true
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Release History

## 1.0.0b1 (1970-01-01)

- Initial version

### Features Added

- Initial Code
21 changes: 21 additions & 0 deletions sdk/healthdataaiservices/azure-health-deidentification/LICENSE
@@ -0,0 +1,21 @@
Copyright (c) Microsoft Corporation.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
include *.md
include LICENSE
include azure/health/deidentification/py.typed
recursive-include tests *.py
recursive-include samples *.py *.md
include azure/__init__.py
include azure/health/__init__.py
108 changes: 108 additions & 0 deletions sdk/healthdataaiservices/azure-health-deidentification/README.md
@@ -0,0 +1,108 @@


# Azure Health Deidentification client library for Python
Azure.Health.Deidentification is a managed service that enables users to tag, redact, or surrogate health data.

## Getting started

### Install the package

```bash
python -m pip install azure-health-deidentification
```

#### Prerequisites

- Python 3.8 or later is required to use this package.
- You need an [Azure subscription][azure_sub] to use this package.
- An existing Azure Health Deidentification instance.

#### Create with an Azure Active Directory Credential
To use an [Azure Active Directory (AAD) token credential][authenticate_with_token],
provide an instance of the desired credential type obtained from the
[azure-identity][azure_identity_credentials] library.

To authenticate with AAD, you must first [pip][pip] install [`azure-identity`][azure_identity_pip].

After setup, you can choose which type of [credential][azure_identity_credentials] from azure.identity to use.
As an example, [DefaultAzureCredential][default_azure_credential] can be used to authenticate the client:

Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables:
`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`

Use the returned token credential to authenticate the client:

```python
>>> from azure.health.deidentification import DeidentificationClient
>>> from azure.identity import DefaultAzureCredential
>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
```
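
Before constructing the credential, it can help to confirm that the three environment variables named above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_credential_vars` is hypothetical and is not part of `azure-identity`.

```python
import os
from typing import List, Mapping, Optional

# Environment variables named in this README for the AAD application.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credential_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All three AAD application variables are set.")
```

`DefaultAzureCredential` also tries other sources such as managed identity, so a missing variable is a hint to check rather than a guaranteed failure.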

## Key concepts

**Operation Modes**
- Tag: Returns the offset, length, and PHI category of each detected text span.
- Redact: Returns the output text with each PHI span replaced by a placeholder, for example `[name]`.
- Surrogate: Returns the output text with each PHI span replaced by a synthetic value.
  - Input: `My name is John Smith`
  - Output: `My name is Tom Jones`
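
To make the three modes concrete, here is a purely local illustration that does not call the service. The `Span` tuple, the category name `"name"`, the use of Python string offsets, and the caller-supplied surrogate value are all assumptions for this sketch; the service's actual response models are not shown in this README.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical tagged span: where the PHI sits and what kind it is."""

    offset: int
    length: int
    category: str


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a `[category]` placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.offset, reverse=True):
        out = out[: span.offset] + f"[{span.category}]" + out[span.offset + span.length :]
    return out


def surrogate(text: str, spans: List[Span], replacements: List[str]) -> str:
    """Replace each span with a caller-supplied synthetic value."""
    out = text
    pairs = sorted(zip(spans, replacements), key=lambda p: p[0].offset, reverse=True)
    for span, value in pairs:
        out = out[: span.offset] + value + out[span.offset + span.length :]
    return out


text = "My name is John Smith"
# Tag mode: the kind of structure described above (offset, length, category).
tags = [Span(offset=11, length=10, category="name")]

print(redact(text, tags))
print(surrogate(text, tags, ["Tom Jones"]))
```

The real service chooses the surrogate values itself; here they are passed in only so the example stays deterministic.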

**Job Integration with Azure Storage**

Instead of sending text, you can send an Azure Storage location to the service. The service asynchronously
processes the list of files and writes the deidentified files to a location of your choice.

Limitations:
- Maximum file count per job: 1000 documents
- Maximum file size per file: 2 MB
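
A client-side pre-flight check against these limits could look like the sketch below. The function name `check_job_limits` is hypothetical, and reading "2 MB" as 2 * 1024 * 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
from typing import Dict, List

MAX_FILES_PER_JOB = 1000
# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 2 * 1024 * 1024


def check_job_limits(file_sizes: Dict[str, int]) -> List[str]:
    """Return human-readable problems for a mapping of file name -> size in bytes."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_JOB:
        problems.append(f"{len(file_sizes)} files exceeds the limit of {MAX_FILES_PER_JOB}")
    # Sort by name so the report order is stable.
    for name, size in sorted(file_sizes.items()):
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name} is {size} bytes, over the {MAX_FILE_SIZE_BYTES}-byte limit")
    return problems


print(check_job_limits({"note1.txt": 512, "scan.txt": 3 * 1024 * 1024}))
```

An empty list means the batch fits within the documented limits; the service remains the authority on what it accepts.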

## Examples

```python
>>> from azure.health.deidentification import DeidentificationClient
>>> from azure.identity import DefaultAzureCredential
>>> from azure.core.exceptions import HttpResponseError

>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
>>> try:
...     pass  # call a client operation here
... except HttpResponseError as e:
...     print('service responds error: {}'.format(e.response.json()))

```

## Next steps

- Found a bug or have feedback? Raise an issue with the "Health Deidentification" label.


## Troubleshooting

- **Unable to Access Source or Target Storage**
  - Ensure you create your deid service with a system-assigned managed identity.
  - Ensure your storage account has granted permissions to that managed identity.

## Contributing

This project welcomes contributions and suggestions. Most contributions require
you to agree to a Contributor License Agreement (CLA) declaring that you have
the right to, and actually do, grant us the rights to use your contribution.
For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether
you need to provide a CLA and decorate the PR appropriately (e.g., label,
comment). Simply follow the instructions provided by the bot. You will only
need to do this once across all repos using our CLA.

This project has adopted the
[Microsoft Open Source Code of Conduct][code_of_conduct]. For more information,
see the Code of Conduct FAQ or contact opencode@microsoft.com with any
additional questions or comments.

<!-- LINKS -->
[code_of_conduct]: https://opensource.microsoft.com/codeofconduct/
[authenticate_with_token]: https://docs.microsoft.com/azure/cognitive-services/authentication?tabs=powershell#authenticate-with-an-authentication-token
[azure_identity_credentials]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#credentials
[azure_identity_pip]: https://pypi.org/project/azure-identity/
[default_azure_credential]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity#defaultazurecredential
[pip]: https://pypi.org/project/pip/
[azure_sub]: https://azure.microsoft.com/free/

Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/healthdataaiservices/azure-health-deidentification",
"Tag": "python/healthdataaiservices/azure-health-deidentification_a8eed6d322"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# coding=utf-8
# --------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# Code generated by Microsoft (R) Python Code Generator.
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from ._client import DeidentificationClient
from ._version import VERSION

__version__ = VERSION

try:
    from ._patch import __all__ as _patch_all
    from ._patch import *  # pylint: disable=unused-wildcard-import
except ImportError:
    _patch_all = []
from ._patch import patch_sdk as _patch_sdk

__all__ = [
"DeidentificationClient",
]
__all__.extend([p for p in _patch_all if p not in __all__])

_patch_sdk()
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# coding=utf-8
# --------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# Code generated by Microsoft (R) Python Code Generator.
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

from copy import deepcopy
from typing import Any, TYPE_CHECKING
from typing_extensions import Self

from azure.core import PipelineClient
from azure.core.pipeline import policies
from azure.core.rest import HttpRequest, HttpResponse

from ._configuration import DeidentificationClientConfiguration
from ._operations import DeidentificationClientOperationsMixin
from ._serialization import Deserializer, Serializer

if TYPE_CHECKING:
    # pylint: disable=unused-import,ungrouped-imports
    from azure.core.credentials import TokenCredential


class DeidentificationClient(
    DeidentificationClientOperationsMixin
):  # pylint: disable=client-accepts-api-version-keyword
    """DeidentificationClient.

    :param endpoint: Url of your De-identification Service. Required.
    :type endpoint: str
    :param credential: Credential used to authenticate requests to the service. Required.
    :type credential: ~azure.core.credentials.TokenCredential
    :keyword api_version: The API version to use for this operation. Default value is
     "2024-07-12-preview". Note that overriding this default value may result in unsupported
     behavior.
    :paramtype api_version: str
    :keyword int polling_interval: Default waiting time between two polls for LRO operations if no
     Retry-After header is present.
    """

    def __init__(self, endpoint: str, credential: "TokenCredential", **kwargs: Any) -> None:
        _endpoint = "https://{endpoint}"
        self._config = DeidentificationClientConfiguration(endpoint=endpoint, credential=credential, **kwargs)
        _policies = kwargs.pop("policies", None)
        if _policies is None:
            _policies = [
                policies.RequestIdPolicy(**kwargs),
                self._config.headers_policy,
                self._config.user_agent_policy,
                self._config.proxy_policy,
                policies.ContentDecodePolicy(**kwargs),
                self._config.redirect_policy,
                self._config.retry_policy,
                self._config.authentication_policy,
                self._config.custom_hook_policy,
                self._config.logging_policy,
                policies.DistributedTracingPolicy(**kwargs),
                policies.SensitiveHeaderCleanupPolicy(**kwargs) if self._config.redirect_policy else None,
                self._config.http_logging_policy,
            ]
        self._client: PipelineClient = PipelineClient(base_url=_endpoint, policies=_policies, **kwargs)

        self._serialize = Serializer()
        self._deserialize = Deserializer()
        self._serialize.client_side_validation = False

    def send_request(self, request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse:
        """Runs the network request through the client's chained policies.

        >>> from azure.core.rest import HttpRequest
        >>> request = HttpRequest("GET", "https://www.example.org/")
        <HttpRequest [GET], url: 'https://www.example.org/'>
        >>> response = client.send_request(request)
        <HttpResponse: 200 OK>

        For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

        :param request: The network request you want to make. Required.
        :type request: ~azure.core.rest.HttpRequest
        :keyword bool stream: Whether the response payload will be streamed. Defaults to False.
        :return: The response of your network call. Does not do error handling on your response.
        :rtype: ~azure.core.rest.HttpResponse
        """

        request_copy = deepcopy(request)
        path_format_arguments = {
            "endpoint": self._serialize.url("self._config.endpoint", self._config.endpoint, "str"),
        }

        request_copy.url = self._client.format_url(request_copy.url, **path_format_arguments)
        return self._client.send_request(request_copy, stream=stream, **kwargs)  # type: ignore

    def close(self) -> None:
        self._client.close()

    def __enter__(self) -> Self:
        self._client.__enter__()
        return self

    def __exit__(self, *exc_details: Any) -> None:
        self._client.__exit__(*exc_details)
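
Note on the `_endpoint = "https://{endpoint}"` template in the client above: if the value passed as `endpoint` already begins with a scheme, the formatted base URL would presumably contain the scheme twice. The sketch below shows only that string-formatting step, not the rest of `PipelineClient.format_url` or the serializer's quoting. The helper `strip_scheme` and the host name `my-deid.example.com` are hypothetical and not part of the SDK.

```python
from urllib.parse import urlsplit

BASE_URL_TEMPLATE = "https://{endpoint}"  # copied from the generated client above


def strip_scheme(endpoint: str) -> str:
    """Hypothetical helper: return host (and path) without a leading http(s) scheme."""
    parts = urlsplit(endpoint)
    if parts.scheme in ("http", "https") and parts.netloc:
        return (parts.netloc + parts.path).rstrip("/")
    return endpoint.rstrip("/")


raw = "https://my-deid.example.com"  # hypothetical endpoint value

# Formatting the raw value repeats the scheme.
print(BASE_URL_TEMPLATE.format(endpoint=raw))
# Stripping the scheme first yields a single, well-formed prefix.
print(BASE_URL_TEMPLATE.format(endpoint=strip_scheme(raw)))
```

Whether callers should pass a bare host name or a full URL is for the SDK documentation to settle; this only illustrates why the distinction matters with the template as written.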