[text analytics] add back PII endpoint #12673

Merged
merged 14 commits into from
Jul 30, 2020
2 changes: 2 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,10 @@

## 5.0.1 (Unreleased)

**New features**
- We are now targeting the service's v3.1-preview.1 API as the default. If you would still like to use v3.0 of the service,
pass in `v3.0` to the kwarg `api_version` when creating your TextAnalyticsClient.
- We have added an API `recognize_pii_entities` which returns entities containing personal information for a batch of documents. Only available for API version v3.1-preview.1 and up.

## 5.0.0 (2020-07-27)

Expand Down
35 changes: 35 additions & 0 deletions sdk/textanalytics/azure-ai-textanalytics/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ Text Analytics is a cloud-based service that provides advanced natural language
* Sentiment Analysis
* Named Entity Recognition
* Linked Entity Recognition
* Personally Identifiable Information (PII) Entity Recognition
* Language Detection
* Key Phrase Extraction

Expand Down Expand Up @@ -184,6 +185,7 @@ The following section provides several code snippets covering some of the most c
* [Analyze Sentiment](#analyze-sentiment "Analyze sentiment")
* [Recognize Entities](#recognize-entities "Recognize entities")
* [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities")
* [Recognize PII Entities](#recognize-pii-entities "Recognize PII entities")
* [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
* [Detect Language](#detect-language "Detect language")

Expand Down Expand Up @@ -290,6 +292,35 @@ The returned response is a heterogeneous list of result and error objects: list[
Please refer to the service documentation for a conceptual discussion of [entity linking][linked_entity_recognition]
and [supported types][linked_entities_categories].

### Recognize PII entities
[recognize_pii_entities][recognize_pii_entities] recognizes and categorizes Personally Identifiable Information (PII) entities in its input text, such as
Social Security Numbers, bank account information, credit card numbers, and more. This endpoint is only available for v3.1-preview.1 and up.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

credential = AzureKeyCredential("<api_key>")
endpoint = "https://<region>.api.cognitive.microsoft.com/"

text_analytics_client = TextAnalyticsClient(endpoint, credential)

documents = [
    "The employee's SSN is 859-98-0987.",
    "The employee's phone number is 555-555-5555."
]
response = text_analytics_client.recognize_pii_entities(documents, language="en")
# Keep only the documents that were processed successfully.
result = [doc for doc in response if not doc.is_error]
for doc in result:
    for entity in doc.entities:
        print("Entity: \t", entity.text, "\tCategory: \t", entity.category,
              "\tConfidence Score: \t", entity.confidence_score)
```

The returned response is a heterogeneous list of result and error objects: list[[RecognizePiiEntitiesResult][recognize_pii_entities_result], [DocumentError][document_error]]

Please refer to the service documentation for [supported PII entity types][pii_entity_categories].
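
Because the returned list mixes results and errors, a caller may want to keep track of the failed documents instead of silently dropping them. The following is a minimal sketch of that pattern, not part of the library: `split_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the real result and error objects so that the snippet runs without a service call. Only the `id`, `is_error`, and `entities` attributes mirror what is described above.

```python
from types import SimpleNamespace


def split_results(response):
    """Partition a heterogeneous response list into (results, errors)."""
    results = [doc for doc in response if not doc.is_error]
    errors = [doc for doc in response if doc.is_error]
    return results, errors


# Hypothetical stand-ins for the objects the client returns.
response = [
    SimpleNamespace(id="0", is_error=False, entities=["859-98-0987"]),
    SimpleNamespace(id="1", is_error=True, entities=None),
    SimpleNamespace(id="2", is_error=False, entities=["555-555-5555"]),
]

results, errors = split_results(response)
processed_ids = [doc.id for doc in results]
failed_ids = [doc.id for doc in errors]
print("processed:", processed_ids)
print("failed:", failed_ids)
```

Since results come back in the order the documents were passed in, the `id` values are enough to map a failure back to its input document.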

### Extract key phrases
[extract_key_phrases][extract_key_phrases] determines the main talking points in its input text. For example, for the input text "The food was delicious and there were wonderful staff", the API returns: "food" and "wonderful staff".

Expand Down Expand Up @@ -412,6 +443,7 @@ Authenticate the client with a Cognitive Services/Text Analytics API key or a to
In a batch of documents:
* Analyze sentiment: [sample_analyze_sentiment.py][analyze_sentiment_sample] ([async version][analyze_sentiment_sample_async])
* Recognize entities: [sample_recognize_entities.py][recognize_entities_sample] ([async version][recognize_entities_sample_async])
* Recognize personally identifiable information: [sample_recognize_pii_entities.py](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/textanalytics/azure-ai-textanalytics/samples/sample_recognize_pii_entities.py) ([async version](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/textanalytics/azure-ai-textanalytics/samples/async_samples/sample_recognize_pii_entities_async.py))
* Recognize linked entities: [sample_recognize_linked_entities.py][recognize_linked_entities_sample] ([async version][recognize_linked_entities_sample_async])
* Extract key phrases: [sample_extract_key_phrases.py][extract_key_phrases_sample] ([async version][extract_key_phrases_sample_async])
* Detect language: [sample_detect_language.py][detect_language_sample] ([async version][detect_language_sample_async])
Expand Down Expand Up @@ -458,6 +490,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
[document_error]: https://aka.ms/azsdk-python-textanalytics-documenterror
[detect_language_result]: https://aka.ms/azsdk-python-textanalytics-detectlanguageresult
[recognize_entities_result]: https://aka.ms/azsdk-python-textanalytics-recognizeentitiesresult
[recognize_pii_entities_result]: https://aka.ms/azsdk-python-textanalytics-recognizepiientitiesresult
[recognize_linked_entities_result]: https://aka.ms/azsdk-python-textanalytics-recognizelinkedentitiesresult
[analyze_sentiment_result]: https://aka.ms/azsdk-python-textanalytics-analyzesentimentresult
[extract_key_phrases_result]: https://aka.ms/azsdk-python-textanalytics-extractkeyphrasesresult
Expand All @@ -467,6 +500,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con

[analyze_sentiment]: https://aka.ms/azsdk-python-textanalytics-analyzesentiment
[recognize_entities]: https://aka.ms/azsdk-python-textanalytics-recognizeentities
[recognize_pii_entities]: https://aka.ms/azsdk-python-textanalytics-recognizepiientities
[recognize_linked_entities]: https://aka.ms/azsdk-python-textanalytics-recognizelinkedentities
[extract_key_phrases]: https://aka.ms/azsdk-python-textanalytics-extractkeyphrases
[detect_language]: https://aka.ms/azsdk-python-textanalytics-detectlanguage
Expand All @@ -477,6 +511,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
[key_phrase_extraction]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-keyword-extraction
[linked_entities_categories]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=general
[linked_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
[pii_entity_categories]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal
[named_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
[named_entity_categories]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=general

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,9 @@
LinkedEntityMatch,
TextDocumentBatchStatistics,
SentenceSentiment,
SentimentConfidenceScores
SentimentConfidenceScores,
RecognizePiiEntitiesResult,
PiiEntity
)

__all__ = [
Expand All @@ -48,7 +50,9 @@
'LinkedEntityMatch',
'TextDocumentBatchStatistics',
'SentenceSentiment',
'SentimentConfidenceScores'
'SentimentConfidenceScores',
'RecognizePiiEntitiesResult',
'PiiEntity',
]

__version__ = VERSION
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ class RecognizeEntitiesResult(DictMixin):
:vartype entities:
list[~azure.ai.textanalytics.CategorizedEntity]
:ivar warnings: Warnings encountered while processing document. Results will still be returned
if there are warnings, but they may not be fully accurate.
if there are warnings, but they may not be fully accurate.
:vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
:ivar statistics: If show_stats=true was specified in the request this
field will contain information about the document payload.
Expand All @@ -124,6 +124,40 @@ def __repr__(self):
.format(self.id, repr(self.entities), repr(self.warnings), repr(self.statistics), self.is_error)[:1024]


class RecognizePiiEntitiesResult(DictMixin):
    """RecognizePiiEntitiesResult is a result object which contains
    the recognized Personally Identifiable Information (PII) entities
    from a particular document.

    :ivar str id: Unique, non-empty document identifier that matches the
        document id that was passed in with the request. If not specified
        in the request, an id is assigned for the document.
    :ivar entities: Recognized PII entities in the document.
    :vartype entities:
        list[~azure.ai.textanalytics.PiiEntity]
    :ivar warnings: Warnings encountered while processing document. Results will still be returned
        if there are warnings, but they may not be fully accurate.
    :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
    :ivar statistics: If show_stats=true was specified in the request this
        field will contain information about the document payload.
    :vartype statistics:
        ~azure.ai.textanalytics.TextDocumentStatistics
    :ivar bool is_error: Boolean check for error item when iterating over list of
        results. Always False for an instance of a RecognizePiiEntitiesResult.
    """

    def __init__(self, **kwargs):
        self.id = kwargs.get("id", None)
        self.entities = kwargs.get("entities", None)
        self.warnings = kwargs.get("warnings", [])
        self.statistics = kwargs.get("statistics", None)
        self.is_error = False

    def __repr__(self):
        return "RecognizePiiEntitiesResult(id={}, entities={}, warnings={}, statistics={}, is_error={})" \
            .format(self.id, repr(self.entities), repr(self.warnings), repr(self.statistics), self.is_error)[:1024]
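
The result class above follows the same shape as the other result models: keyword-only construction with defaults, a fixed `is_error = False`, and a `__repr__` sliced to 1024 characters. A minimal sketch of that shape is shown below, assuming a simplified stand-in named `ResultSketch` (hypothetical, without `DictMixin` or the `statistics` field) so it runs on its own.

```python
class ResultSketch:
    """Hypothetical stand-in for a result model; not the SDK class."""

    def __init__(self, **kwargs):
        self.id = kwargs.get("id", None)
        self.entities = kwargs.get("entities", None)
        self.warnings = kwargs.get("warnings", [])
        self.is_error = False  # result objects are never error items

    def __repr__(self):
        # Slice the formatted string so a very large payload cannot flood logs.
        return "ResultSketch(id={}, entities={}, warnings={}, is_error={})".format(
            self.id, repr(self.entities), repr(self.warnings), self.is_error
        )[:1024]


empty = ResultSketch()
big = ResultSketch(id="1", entities=["x" * 5000])
print(repr(empty))
print(len(repr(big)))
```

Because `kwargs.get("warnings", [])` builds a fresh list on every call, instances do not share a mutable default.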


class DetectLanguageResult(DictMixin):
"""DetectLanguageResult is a result object which contains
the detected language of a particular document.
Expand All @@ -135,7 +169,7 @@ class DetectLanguageResult(DictMixin):
:ivar primary_language: The primary language detected in the document.
:vartype primary_language: ~azure.ai.textanalytics.DetectedLanguage
:ivar warnings: Warnings encountered while processing document. Results will still be returned
if there are warnings, but they may not be fully accurate.
if there are warnings, but they may not be fully accurate.
:vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
:ivar statistics: If show_stats=true was specified in the request this
field will contain information about the document payload.
Expand Down Expand Up @@ -193,6 +227,39 @@ def __repr__(self):
self.text, self.category, self.subcategory, self.confidence_score
)[:1024]

class PiiEntity(DictMixin):
    """PiiEntity contains information about a Personally Identifiable
    Information (PII) entity found in text.

    :ivar str text: Entity text as appears in the request.
    :ivar str category: Entity category, such as Financial Account
        Identification/Social Security Number/Phone Number, etc.
    :ivar str subcategory: Entity subcategory, such as Credit Card/EU
        Phone number/ABA Routing Numbers, etc.
    :ivar float confidence_score: Confidence score between 0 and 1 of the extracted
        entity.
    """

    def __init__(self, **kwargs):
        self.text = kwargs.get('text', None)
        self.category = kwargs.get('category', None)
        self.subcategory = kwargs.get('subcategory', None)
        self.confidence_score = kwargs.get('confidence_score', None)

    @classmethod
    def _from_generated(cls, entity):
        return cls(
            text=entity.text,
            category=entity.category,
            subcategory=entity.subcategory,
            confidence_score=entity.confidence_score,
        )

    def __repr__(self):
        return "PiiEntity(text={}, category={}, subcategory={}, confidence_score={})".format(
            self.text, self.category, self.subcategory, self.confidence_score
        )[:1024]
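
The `_from_generated` classmethod copies a fixed set of fields from a generated (wire-level) entity onto the public model. The sketch below illustrates that conversion in isolation. `PiiEntitySketch` is a hypothetical simplified copy of the class above, the `SimpleNamespace` object stands in for a generated entity, and its `offset` and `length` fields and all values are assumptions chosen only to show that unlisted fields are not carried over.

```python
from types import SimpleNamespace


class PiiEntitySketch:
    """Hypothetical simplified copy of the public model, for illustration."""

    def __init__(self, **kwargs):
        self.text = kwargs.get("text", None)
        self.category = kwargs.get("category", None)
        self.subcategory = kwargs.get("subcategory", None)
        self.confidence_score = kwargs.get("confidence_score", None)

    @classmethod
    def _from_generated(cls, entity):
        # Copy only the fields the public model exposes.
        return cls(
            text=entity.text,
            category=entity.category,
            subcategory=entity.subcategory,
            confidence_score=entity.confidence_score,
        )


# Stand-in for a generated entity; the values are illustrative only.
generated = SimpleNamespace(
    text="859-98-0987",
    category="Example Category",
    subcategory=None,
    confidence_score=0.65,
    offset=22,   # assumed extra field, not copied
    length=11,   # assumed extra field, not copied
)

entity = PiiEntitySketch._from_generated(generated)
print(entity.text, entity.category, entity.confidence_score)
```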


class TextAnalyticsError(DictMixin):
"""TextAnalyticsError contains the error code, message, and
Expand Down Expand Up @@ -274,7 +341,7 @@ class ExtractKeyPhrasesResult(DictMixin):
in the input document.
:vartype key_phrases: list[str]
:ivar warnings: Warnings encountered while processing document. Results will still be returned
if there are warnings, but they may not be fully accurate.
if there are warnings, but they may not be fully accurate.
:vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
:ivar statistics: If show_stats=true was specified in the request this
field will contain information about the document payload.
Expand Down Expand Up @@ -308,7 +375,7 @@ class RecognizeLinkedEntitiesResult(DictMixin):
:vartype entities:
list[~azure.ai.textanalytics.LinkedEntity]
:ivar warnings: Warnings encountered while processing document. Results will still be returned
if there are warnings, but they may not be fully accurate.
if there are warnings, but they may not be fully accurate.
:vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
:ivar statistics: If show_stats=true was specified in the request this
field will contain information about the document payload.
Expand Down Expand Up @@ -344,7 +411,7 @@ class AnalyzeSentimentResult(DictMixin):
'neutral', 'negative', 'mixed'
:vartype sentiment: str
:ivar warnings: Warnings encountered while processing document. Results will still be returned
if there are warnings, but they may not be fully accurate.
if there are warnings, but they may not be fully accurate.
:vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
:ivar statistics: If show_stats=true was specified in the request this
field will contain information about the document payload.
Expand Down Expand Up @@ -429,7 +496,7 @@ def __init__(self, **kwargs):
def __getattr__(self, attr):
result_set = set()
result_set.update(
RecognizeEntitiesResult().keys()
RecognizeEntitiesResult().keys() + RecognizePiiEntitiesResult().keys()
+ DetectLanguageResult().keys() + RecognizeLinkedEntitiesResult().keys()
+ AnalyzeSentimentResult().keys() + ExtractKeyPhrasesResult().keys()
)
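
The hunk above only shows the set of result keys being collected; what the method does with that set is not visible here. One plausible use, sketched below purely as an assumption, is to give a more helpful message when code reads a result-only attribute (such as `entities`) from an error item. `ResultSketch`, `ErrorSketch`, and the `message` field are hypothetical names, not the SDK's classes.

```python
class ResultSketch:
    """Hypothetical result model exposing a keys() list."""

    def __init__(self, **kwargs):
        self.id = kwargs.get("id", None)
        self.entities = kwargs.get("entities", None)
        self.is_error = False

    def keys(self):
        return list(self.__dict__.keys())


class ErrorSketch:
    """Hypothetical error item that explains why result fields are missing."""

    def __init__(self, **kwargs):
        self.id = kwargs.get("id", None)
        self.message = kwargs.get("message", None)
        self.is_error = True

    def keys(self):
        return list(self.__dict__.keys())

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup has already failed.
        result_attrs = set(ResultSketch().keys()) - set(ErrorSketch().keys())
        if attr in result_attrs:
            raise AttributeError(
                "'ErrorSketch' has no attribute '{}': document {} was not processed ({})".format(
                    attr, self.id, self.message
                )
            )
        raise AttributeError("'ErrorSketch' has no attribute '{}'".format(attr))


err = ErrorSketch(id="1", message="Document text is empty.")
try:
    err.entities
except AttributeError as exc:
    hint = str(exc)
    print(hint)
```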
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,9 @@
DocumentError,
SentimentConfidenceScores,
TextAnalyticsError,
TextAnalyticsWarning
TextAnalyticsWarning,
RecognizePiiEntitiesResult,
PiiEntity,
)

def _get_too_many_documents_error(obj):
Expand Down Expand Up @@ -162,3 +164,12 @@ def sentiment_result(sentiment):
confidence_scores=SentimentConfidenceScores._from_generated(sentiment.confidence_scores), # pylint: disable=protected-access
sentences=[SentenceSentiment._from_generated(s) for s in sentiment.sentences], # pylint: disable=protected-access
)

@prepare_result
def pii_entities_result(entity):
    return RecognizePiiEntitiesResult(
        id=entity.id,
        entities=[PiiEntity._from_generated(e) for e in entity.entities],  # pylint: disable=protected-access
        warnings=[TextAnalyticsWarning._from_generated(w) for w in entity.warnings],  # pylint: disable=protected-access
        statistics=TextDocumentStatistics._from_generated(entity.statistics),  # pylint: disable=protected-access
    )
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,8 @@
linked_entities_result,
key_phrases_result,
sentiment_result,
language_result
language_result,
pii_entities_result
)

if TYPE_CHECKING:
Expand All @@ -36,6 +37,7 @@
ExtractKeyPhrasesResult,
AnalyzeSentimentResult,
DocumentError,
RecognizePiiEntitiesResult,
)


Expand Down Expand Up @@ -222,6 +224,77 @@ def recognize_entities( # type: ignore
except HttpResponseError as error:
process_batch_error(error)

    @distributed_trace
    def recognize_pii_entities(  # type: ignore
        self,
        documents,  # type: Union[List[str], List[TextDocumentInput], List[Dict[str, str]]]
        **kwargs  # type: Any
    ):
        # type: (...) -> List[Union[RecognizePiiEntitiesResult, DocumentError]]
        """Recognize entities containing personal information for a batch of documents.

        Returns a list of personal information entities ("SSN",
        "Bank Account", etc) in the document. For the list of supported entity types,
        check https://aka.ms/tanerpii

        See https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview#data-limits
        for document length limits, maximum batch size, and supported text encoding.

        :param documents: The set of documents to process as part of this batch.
            If you wish to specify the ID and language on a per-item basis you must
            use as input a list[:class:`~azure.ai.textanalytics.TextDocumentInput`] or a list of
            dict representations of :class:`~azure.ai.textanalytics.TextDocumentInput`, like
            `{"id": "1", "language": "en", "text": "hello world"}`.
        :type documents:
            list[str] or list[~azure.ai.textanalytics.TextDocumentInput] or
            list[dict[str, str]]
        :keyword str language: The 2 letter ISO 639-1 representation of language for the
            entire batch. For example, use "en" for English; "es" for Spanish etc.
            If not set, uses "en" for English as default. Per-document language will
            take precedence over whole batch language. See https://aka.ms/talangs for
            supported languages in Text Analytics API.
        :keyword str model_version: This value indicates which model will
            be used for scoring, e.g. "latest", "2019-10-01". If a model-version
            is not specified, the API will default to the latest, non-preview version.
        :keyword bool show_stats: If set to true, response will contain document level statistics.
        :return: The combined list of :class:`~azure.ai.textanalytics.RecognizePiiEntitiesResult`
            and :class:`~azure.ai.textanalytics.DocumentError` in the order the original documents
            were passed in.
        :rtype: list[~azure.ai.textanalytics.RecognizePiiEntitiesResult,
            ~azure.ai.textanalytics.DocumentError]
        :raises ~azure.core.exceptions.HttpResponseError or TypeError or ValueError or NotImplementedError:

        .. admonition:: Example:

            .. literalinclude:: ../samples/sample_recognize_pii_entities.py
                :start-after: [START batch_recognize_pii_entities]
                :end-before: [END batch_recognize_pii_entities]
                :language: python
                :dedent: 8
                :caption: Recognize personally identifiable information entities in a batch of documents.
        """
        language_arg = kwargs.pop("language", None)
        language = language_arg if language_arg is not None else self._default_language
        docs = _validate_batch_input(documents, "language", language)
        model_version = kwargs.pop("model_version", None)
        show_stats = kwargs.pop("show_stats", False)
        try:
            return self._client.entities_recognition_pii(
                documents=docs,
                model_version=model_version,
                show_stats=show_stats,
                cls=kwargs.pop("cls", pii_entities_result),
                **kwargs
            )
        except AttributeError as error:
            if "'TextAnalyticsClient' object has no attribute 'entities_recognition_pii'" in str(error):
                raise NotImplementedError(
                    "'recognize_pii_entities' endpoint is only available for API version v3.1-preview.1 and up"
                )
            raise error
        except HttpResponseError as error:
            process_batch_error(error)
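
The method above detects an unsupported API version by matching the text of the `AttributeError` raised when the generated client lacks `entities_recognition_pii`. As a point of comparison only, the sketch below shows a variant of the same idea that looks the operation up with `getattr` before calling it. This is not the PR's implementation: `GeneratedV30`, `GeneratedV31`, and `ClientSketch` are hypothetical stand-ins, and the simplified operation signature is an assumption.

```python
class GeneratedV30:
    """Hypothetical generated client for an API version without the PII operation."""


class GeneratedV31:
    """Hypothetical generated client that does expose the PII operation."""

    def entities_recognition_pii(self, documents):
        return ["pii result for {}".format(doc) for doc in documents]


class ClientSketch:
    """Hypothetical wrapper that delegates to a version-specific generated client."""

    def __init__(self, generated):
        self._client = generated

    def recognize_pii_entities(self, documents):
        # Look the operation up first, so an AttributeError raised inside the
        # operation itself is not mistaken for a missing endpoint.
        operation = getattr(self._client, "entities_recognition_pii", None)
        if operation is None:
            raise NotImplementedError(
                "'recognize_pii_entities' endpoint is only available for "
                "API version v3.1-preview.1 and up"
            )
        return operation(documents)


new_client = ClientSketch(GeneratedV31())
print(new_client.recognize_pii_entities(["doc-a"]))

old_client = ClientSketch(GeneratedV30())
try:
    old_client.recognize_pii_entities(["doc-a"])
except NotImplementedError as exc:
    print("unsupported:", exc)
```

The trade-off is that the lookup does not depend on the exact wording of the interpreter's error message, at the cost of one extra attribute access per call.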

@distributed_trace
def recognize_linked_entities( # type: ignore
self,
Expand Down