Azure · iscai-msft · Jul 30, 2020 · Jul 21, 2020 · Jul 22, 2020 · Jul 22, 2020
diff --git a/sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md b/sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md
@@ -2,8 +2,10 @@
 
 ## 5.0.1 (Unreleased)
 
+**New features**
 - We are now targeting the service's v3.1-preview.1 API as the default. If you would like to still use version v3.0 of the service,
 pass in `v3.0` to the kwarg `api_version` when creating your TextAnalyticsClient
+- We have added an API `recognize_pii_entities` which returns entities containing personal information for a batch of documents. This API is only available for API version v3.1-preview.1 and up.
 
 ## 5.0.0 (2020-07-27)
 

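Review note on the version gating described in the changelog entry: the sketch below shows one way a client could surface "only available for v3.1-preview.1 and up" as a `NotImplementedError`. It is an illustration, not the SDK's implementation. `_OperationsV30`, `_OperationsV31Preview1`, and `SketchClient` are hypothetical stand-ins, and the lookup uses `getattr` instead of the string match on the `AttributeError` message that this PR uses in `_text_analytics_client.py`.

```python
class _OperationsV30(object):
    """Hypothetical stand-in for a generated v3.0 client (no PII operation)."""

    def entities_recognition_general(self, documents):
        return ["general:" + doc for doc in documents]


class _OperationsV31Preview1(_OperationsV30):
    """Hypothetical stand-in for a generated v3.1-preview.1 client."""

    def entities_recognition_pii(self, documents):
        return ["pii:" + doc for doc in documents]


class SketchClient(object):
    """Illustrative wrapper only; this is not the real TextAnalyticsClient."""

    _VERSIONS = {
        "v3.0": _OperationsV30,
        "v3.1-preview.1": _OperationsV31Preview1,
    }

    def __init__(self, api_version="v3.1-preview.1"):
        self._client = self._VERSIONS[api_version]()

    def recognize_pii_entities(self, documents):
        # Look the operation up explicitly, so an AttributeError raised
        # inside the operation is never mistaken for a missing endpoint.
        operation = getattr(self._client, "entities_recognition_pii", None)
        if operation is None:
            raise NotImplementedError(
                "'recognize_pii_entities' endpoint is only available for "
                "API version v3.1-preview.1 and up"
            )
        return operation(documents)
```

With these stand-ins, `SketchClient(api_version="v3.0").recognize_pii_entities([...])` raises `NotImplementedError`, while the default version dispatches to the PII operation.
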
diff --git a/sdk/textanalytics/azure-ai-textanalytics/README.md b/sdk/textanalytics/azure-ai-textanalytics/README.md
@@ -4,6 +4,7 @@ Text Analytics is a cloud-based service that provides advanced natural language
 * Sentiment Analysis
 * Named Entity Recognition
 * Linked Entity Recognition
+* Personally Identifiable Information (PII) Entity Recognition
 * Language Detection
 * Key Phrase Extraction
 
@@ -184,6 +185,7 @@ The following section provides several code snippets covering some of the most c
 * [Analyze Sentiment](#analyze-sentiment "Analyze sentiment")
 * [Recognize Entities](#recognize-entities "Recognize entities")
 * [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities")
+* [Recognize PII Entities](#recognize-pii-entities "Recognize PII entities")
 * [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
 * [Detect Language](#detect-language "Detect language")
 
@@ -290,6 +292,35 @@ The returned response is a heterogeneous list of result and error objects: list[
 Please refer to the service documentation for a conceptual discussion of [entity linking][linked_entity_recognition]
 and [supported types][linked_entities_categories].
 
+### Recognize PII entities
+[recognize_pii_entities][recognize_pii_entities] recognizes and categorizes Personally Identifiable Information (PII) entities in its input text, such as
+Social Security Numbers, bank account information, credit card numbers, and more. This endpoint is only available for v3.1-preview.1 and up.
+
+```python
+from azure.core.credentials import AzureKeyCredential
+from azure.ai.textanalytics import TextAnalyticsClient
+
+credential = AzureKeyCredential("<api_key>")
+endpoint = "https://<region>.api.cognitive.microsoft.com/"
+
+text_analytics_client = TextAnalyticsClient(endpoint, credential)
+
+documents = [
+    "The employee's SSN is 859-98-0987.",
+    "The employee's phone number is 555-555-5555."
+]
+response = text_analytics_client.recognize_pii_entities(documents, language="en")
+result = [doc for doc in response if not doc.is_error]
+for doc in result:
+    for entity in doc.entities:
+        print("Entity: \t", entity.text, "\tCategory: \t", entity.category,
+              "\tConfidence Score: \t", entity.confidence_score)
+```
+
+The returned response is a heterogeneous list of result and error objects: list[[RecognizePiiEntitiesResult][recognize_pii_entities_result], [DocumentError][document_error]]
+
+Please refer to the service documentation for [supported PII entity types][pii_entity_categories].
+
 ### Extract key phrases
 [extract_key_phrases][extract_key_phrases] determines the main talking points in its input text. For example, for the input text "The food was delicious and there were wonderful staff", the API returns: "food" and "wonderful staff".
 
@@ -412,6 +443,7 @@ Authenticate the client with a Cognitive Services/Text Analytics API key or a to
 In a batch of documents:
 * Analyze sentiment: [sample_analyze_sentiment.py][analyze_sentiment_sample] ([async version][analyze_sentiment_sample_async])
 * Recognize entities: [sample_recognize_entities.py][recognize_entities_sample] ([async version][recognize_entities_sample_async])
+* Recognize personally identifiable information: [sample_recognize_pii_entities.py](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/textanalytics/azure-ai-textanalytics/samples/sample_recognize_pii_entities.py) ([async version](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/textanalytics/azure-ai-textanalytics/samples/async_samples/sample_recognize_pii_entities_async.py))
 * Recognize linked entities: [sample_recognize_linked_entities.py][recognize_linked_entities_sample] ([async version][recognize_linked_entities_sample_async])
 * Extract key phrases: [sample_extract_key_phrases.py][extract_key_phrases_sample] ([async version][extract_key_phrases_sample_async])
 * Detect language: [sample_detect_language.py][detect_language_sample] ([async version][detect_language_sample_async])
@@ -458,6 +490,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
 [document_error]: https://aka.ms/azsdk-python-textanalytics-documenterror
 [detect_language_result]: https://aka.ms/azsdk-python-textanalytics-detectlanguageresult
 [recognize_entities_result]: https://aka.ms/azsdk-python-textanalytics-recognizeentitiesresult
+[recognize_pii_entities_result]: https://aka.ms/azsdk-python-textanalytics-recognizepiientitiesresult
 [recognize_linked_entities_result]: https://aka.ms/azsdk-python-textanalytics-recognizelinkedentitiesresult
 [analyze_sentiment_result]: https://aka.ms/azsdk-python-textanalytics-analyzesentimentresult
 [extract_key_phrases_result]: https://aka.ms/azsdk-python-textanalytics-extractkeyphrasesresult
@@ -467,6 +500,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
 
 [analyze_sentiment]: https://aka.ms/azsdk-python-textanalytics-analyzesentiment
 [recognize_entities]: https://aka.ms/azsdk-python-textanalytics-recognizeentities
+[recognize_pii_entities]: https://aka.ms/azsdk-python-textanalytics-recognizepiientities
 [recognize_linked_entities]: https://aka.ms/azsdk-python-textanalytics-recognizelinkedentities
 [extract_key_phrases]: https://aka.ms/azsdk-python-textanalytics-extractkeyphrases
 [detect_language]: https://aka.ms/azsdk-python-textanalytics-detectlanguage
@@ -477,6 +511,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
 [key_phrase_extraction]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-keyword-extraction
 [linked_entities_categories]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=general
 [linked_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
+[pii_entity_categories]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal
 [named_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
 [named_entity_categories]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=general
 

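Review note on the README snippet: it filters out error items and silently drops them. A minimal sketch of handling both halves of the heterogeneous list follows. The objects are `SimpleNamespace` stand-ins shaped like `RecognizePiiEntitiesResult` and `DocumentError`; the `error.code` and `error.message` attributes and all field values are assumptions made for illustration, and `split_results` is a hypothetical helper, not part of the SDK.

```python
from types import SimpleNamespace


def split_results(response):
    """Separate successful results from per-document errors, keeping order."""
    succeeded = [doc for doc in response if not doc.is_error]
    failed = [doc for doc in response if doc.is_error]
    return succeeded, failed


# Hypothetical stand-ins; the values below are invented for illustration.
response = [
    SimpleNamespace(
        id="0",
        is_error=False,
        entities=[
            SimpleNamespace(
                text="859-98-0987",
                category="ExampleCategory",
                confidence_score=0.5,
            )
        ],
    ),
    SimpleNamespace(
        id="1",
        is_error=True,
        error=SimpleNamespace(code="ExampleCode", message="Example message."),
    ),
]

succeeded, failed = split_results(response)
for doc in succeeded:
    for entity in doc.entities:
        print("Document", doc.id, "entity:", entity.text, "category:", entity.category)
for doc in failed:
    print("Document", doc.id, "failed:", doc.error.code, "-", doc.error.message)
```

Because every item carries `is_error`, the same loop works without `isinstance` checks on the two result types.
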
diff --git a/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/__init__.py b/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/__init__.py
@@ -25,7 +25,9 @@
     LinkedEntityMatch,
     TextDocumentBatchStatistics,
     SentenceSentiment,
-    SentimentConfidenceScores
+    SentimentConfidenceScores,
+    RecognizePiiEntitiesResult,
+    PiiEntity
 )
 
 __all__ = [
@@ -48,7 +50,9 @@
     'LinkedEntityMatch',
     'TextDocumentBatchStatistics',
     'SentenceSentiment',
-    'SentimentConfidenceScores'
+    'SentimentConfidenceScores',
+    'RecognizePiiEntitiesResult',
+    'PiiEntity',
 ]
 
 __version__ = VERSION
diff --git a/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py b/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_models.py
@@ -102,7 +102,7 @@ class RecognizeEntitiesResult(DictMixin):
     :vartype entities:
         list[~azure.ai.textanalytics.CategorizedEntity]
     :ivar warnings: Warnings encountered while processing document. Results will still be returned
-     if there are warnings, but they may not be fully accurate.
+        if there are warnings, but they may not be fully accurate.
     :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
     :ivar statistics: If show_stats=true was specified in the request this
         field will contain information about the document payload.
@@ -124,6 +124,40 @@ def __repr__(self):
             .format(self.id, repr(self.entities), repr(self.warnings), repr(self.statistics), self.is_error)[:1024]
 
 
+class RecognizePiiEntitiesResult(DictMixin):
+    """RecognizePiiEntitiesResult is a result object which contains
+    the recognized Personally Identifiable Information (PII) entities
+    from a particular document.
+
+    :ivar str id: Unique, non-empty document identifier that matches the
+        document id that was passed in with the request. If not specified
+        in the request, an id is assigned for the document.
+    :ivar entities: Recognized PII entities in the document.
+    :vartype entities:
+        list[~azure.ai.textanalytics.PiiEntity]
+    :ivar warnings: Warnings encountered while processing document. Results will still be returned
+        if there are warnings, but they may not be fully accurate.
+    :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
+    :ivar statistics: If show_stats=true was specified in the request this
+        field will contain information about the document payload.
+    :vartype statistics:
+        ~azure.ai.textanalytics.TextDocumentStatistics
+    :ivar bool is_error: Boolean check for error item when iterating over list of
+        results. Always False for an instance of a RecognizePiiEntitiesResult.
+    """
+
+    def __init__(self, **kwargs):
+        self.id = kwargs.get("id", None)
+        self.entities = kwargs.get("entities", None)
+        self.warnings = kwargs.get("warnings", [])
+        self.statistics = kwargs.get("statistics", None)
+        self.is_error = False
+
+    def __repr__(self):
+        return "RecognizePiiEntitiesResult(id={}, entities={}, warnings={}, statistics={}, is_error={})" \
+            .format(self.id, repr(self.entities), repr(self.warnings), repr(self.statistics), self.is_error)[:1024]
+
+
 class DetectLanguageResult(DictMixin):
     """DetectLanguageResult is a result object which contains
     the detected language of a particular document.
@@ -135,7 +169,7 @@ class DetectLanguageResult(DictMixin):
     :ivar primary_language: The primary language detected in the document.
     :vartype primary_language: ~azure.ai.textanalytics.DetectedLanguage
     :ivar warnings: Warnings encountered while processing document. Results will still be returned
-     if there are warnings, but they may not be fully accurate.
+        if there are warnings, but they may not be fully accurate.
     :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
     :ivar statistics: If show_stats=true was specified in the request this
         field will contain information about the document payload.
@@ -193,6 +227,39 @@ def __repr__(self):
             self.text, self.category, self.subcategory, self.confidence_score
         )[:1024]
 
+class PiiEntity(DictMixin):
+    """PiiEntity contains information about a Personally Identifiable
+    Information (PII) entity found in text.
+
+    :ivar str text: Entity text as it appears in the request.
+    :ivar str category: Entity category, such as Financial Account
+        Identification/Social Security Number/Phone Number, etc.
+    :ivar str subcategory: Entity subcategory, such as Credit Card/EU
+        Phone number/ABA Routing Numbers, etc.
+    :ivar float confidence_score: Confidence score between 0 and 1 of the extracted
+        entity.
+    """
+
+    def __init__(self, **kwargs):
+        self.text = kwargs.get('text', None)
+        self.category = kwargs.get('category', None)
+        self.subcategory = kwargs.get('subcategory', None)
+        self.confidence_score = kwargs.get('confidence_score', None)
+
+    @classmethod
+    def _from_generated(cls, entity):
+        return cls(
+            text=entity.text,
+            category=entity.category,
+            subcategory=entity.subcategory,
+            confidence_score=entity.confidence_score,
+        )
+
+    def __repr__(self):
+        return "PiiEntity(text={}, category={}, subcategory={}, confidence_score={})".format(
+                   self.text, self.category, self.subcategory, self.confidence_score
+                )[:1024]
+
 
 class TextAnalyticsError(DictMixin):
     """TextAnalyticsError contains the error code, message, and
@@ -274,7 +341,7 @@ class ExtractKeyPhrasesResult(DictMixin):
         in the input document.
     :vartype key_phrases: list[str]
     :ivar warnings: Warnings encountered while processing document. Results will still be returned
-     if there are warnings, but they may not be fully accurate.
+        if there are warnings, but they may not be fully accurate.
     :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
     :ivar statistics: If show_stats=true was specified in the request this
         field will contain information about the document payload.
@@ -308,7 +375,7 @@ class RecognizeLinkedEntitiesResult(DictMixin):
     :vartype entities:
         list[~azure.ai.textanalytics.LinkedEntity]
     :ivar warnings: Warnings encountered while processing document. Results will still be returned
-     if there are warnings, but they may not be fully accurate.
+        if there are warnings, but they may not be fully accurate.
     :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
     :ivar statistics: If show_stats=true was specified in the request this
         field will contain information about the document payload.
@@ -344,7 +411,7 @@ class AnalyzeSentimentResult(DictMixin):
         'neutral', 'negative', 'mixed'
     :vartype sentiment: str
     :ivar warnings: Warnings encountered while processing document. Results will still be returned
-     if there are warnings, but they may not be fully accurate.
+        if there are warnings, but they may not be fully accurate.
     :vartype warnings: list[~azure.ai.textanalytics.TextAnalyticsWarning]
     :ivar statistics: If show_stats=true was specified in the request this
         field will contain information about the document payload.
@@ -429,7 +496,7 @@ def __init__(self, **kwargs):
     def __getattr__(self, attr):
         result_set = set()
         result_set.update(
-            RecognizeEntitiesResult().keys()
+            RecognizeEntitiesResult().keys() + RecognizePiiEntitiesResult().keys()
             + DetectLanguageResult().keys() + RecognizeLinkedEntitiesResult().keys()
             + AnalyzeSentimentResult().keys() + ExtractKeyPhrasesResult().keys()
         )

diff --git a/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_response_handlers.py b/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_response_handlers.py
@@ -24,7 +24,9 @@
     DocumentError,
     SentimentConfidenceScores,
     TextAnalyticsError,
-    TextAnalyticsWarning
+    TextAnalyticsWarning,
+    RecognizePiiEntitiesResult,
+    PiiEntity,
 )
 
 def _get_too_many_documents_error(obj):
@@ -162,3 +164,12 @@ def sentiment_result(sentiment):
         confidence_scores=SentimentConfidenceScores._from_generated(sentiment.confidence_scores),  # pylint: disable=protected-access
         sentences=[SentenceSentiment._from_generated(s) for s in sentiment.sentences],  # pylint: disable=protected-access
     )
+
+@prepare_result
+def pii_entities_result(entity):
+    return RecognizePiiEntitiesResult(
+        id=entity.id,
+        entities=[PiiEntity._from_generated(e) for e in entity.entities],  # pylint: disable=protected-access
+        warnings=[TextAnalyticsWarning._from_generated(w) for w in entity.warnings],  # pylint: disable=protected-access
+        statistics=TextDocumentStatistics._from_generated(entity.statistics),  # pylint: disable=protected-access
+    )
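Review note on the conversion path above: `pii_entities_result` maps a generated document result to the public model, and `PiiEntity._from_generated` copies only the public fields. The sketch below mirrors that shape with stand-ins so it can run without the SDK. `SketchPiiEntity`, `convert_document`, the `offset` field on the fake payload, and all values are hypothetical; the `prepare_result` decorator and the warnings and statistics handling are deliberately left out.

```python
from types import SimpleNamespace


class SketchPiiEntity(object):
    """Illustrative stand-in for PiiEntity; this is not the SDK class."""

    def __init__(self, **kwargs):
        self.text = kwargs.get("text", None)
        self.category = kwargs.get("category", None)
        self.subcategory = kwargs.get("subcategory", None)
        self.confidence_score = kwargs.get("confidence_score", None)

    @classmethod
    def from_generated(cls, entity):
        # Copy only the public fields from the generated (wire-level) object.
        return cls(
            text=entity.text,
            category=entity.category,
            subcategory=entity.subcategory,
            confidence_score=entity.confidence_score,
        )

    def __repr__(self):
        # Same truncation idea as the models in this PR: cap the repr length.
        return "SketchPiiEntity(text={}, category={}, subcategory={}, confidence_score={})".format(
            self.text, self.category, self.subcategory, self.confidence_score
        )[:1024]


def convert_document(generated_doc):
    """Map one generated document result to a plain dict of public values."""
    return {
        "id": generated_doc.id,
        "entities": [SketchPiiEntity.from_generated(e) for e in generated_doc.entities],
        "is_error": False,
    }


# Hypothetical generated payload; field values are invented for illustration.
generated = SimpleNamespace(
    id="0",
    entities=[
        SimpleNamespace(
            text="555-555-5555",
            category="ExampleCategory",
            subcategory=None,
            confidence_score=0.8,
            offset=31,  # extra wire-level field that the public model omits
        )
    ],
)
converted = convert_document(generated)
print(converted["entities"][0])
```

Keeping the copy explicit means extra fields on the generated object do not leak into the public model.
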
diff --git a/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_text_analytics_client.py b/sdk/textanalytics/azure-ai-textanalytics/azure/ai/textanalytics/_text_analytics_client.py
@@ -22,7 +22,8 @@
     linked_entities_result,
     key_phrases_result,
     sentiment_result,
-    language_result
+    language_result,
+    pii_entities_result
 )
 
 if TYPE_CHECKING:
@@ -36,6 +37,7 @@
         ExtractKeyPhrasesResult,
         AnalyzeSentimentResult,
         DocumentError,
+        RecognizePiiEntitiesResult,
     )
 
 
@@ -222,6 +224,77 @@ def recognize_entities(  # type: ignore
         except HttpResponseError as error:
             process_batch_error(error)
 
+    @distributed_trace
+    def recognize_pii_entities(  # type: ignore
+        self,
+        documents,  # type: Union[List[str], List[TextDocumentInput], List[Dict[str, str]]]
+        **kwargs  # type: Any
+    ):
+        # type: (...) -> List[Union[RecognizePiiEntitiesResult, DocumentError]]
+        """Recognize entities containing personal information for a batch of documents.
+
+        Returns a list of personal information entities ("SSN",
+        "Bank Account", etc.) in the document. For the list of supported entity types,
+        see https://aka.ms/tanerpii
+
+        See https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview#data-limits
+        for document length limits, maximum batch size, and supported text encoding.
+
+        :param documents: The set of documents to process as part of this batch.
+            If you wish to specify the ID and language on a per-item basis you must
+            use as input a list[:class:`~azure.ai.textanalytics.TextDocumentInput`] or a list of
+            dict representations of :class:`~azure.ai.textanalytics.TextDocumentInput`, like
+            `{"id": "1", "language": "en", "text": "hello world"}`.
+        :type documents:
+            list[str] or list[~azure.ai.textanalytics.TextDocumentInput] or
+            list[dict[str, str]]
+        :keyword str language: The 2 letter ISO 639-1 representation of language for the
+            entire batch. For example, use "en" for English; "es" for Spanish etc.
+            If not set, uses "en" for English as default. Per-document language will
+            take precedence over whole batch language. See https://aka.ms/talangs for
+            supported languages in Text Analytics API.
+        :keyword str model_version: This value indicates which model will
+            be used for scoring, e.g. "latest", "2019-10-01". If a model-version
+            is not specified, the API will default to the latest, non-preview version.
+        :keyword bool show_stats: If set to true, response will contain document level statistics.
+        :return: The combined list of :class:`~azure.ai.textanalytics.RecognizePiiEntitiesResult`
+            and :class:`~azure.ai.textanalytics.DocumentError` in the order the original documents
+            were passed in.
+        :rtype: list[~azure.ai.textanalytics.RecognizePiiEntitiesResult,
+            ~azure.ai.textanalytics.DocumentError]
+        :raises ~azure.core.exceptions.HttpResponseError or TypeError or ValueError or NotImplementedError:
+
+        .. admonition:: Example:
+
+            .. literalinclude:: ../samples/sample_recognize_pii_entities.py
+                :start-after: [START batch_recognize_pii_entities]
+                :end-before: [END batch_recognize_pii_entities]
+                :language: python
+                :dedent: 8
+                :caption: Recognize personally identifiable information entities in a batch of documents.
+        """
+        language_arg = kwargs.pop("language", None)
+        language = language_arg if language_arg is not None else self._default_language
+        docs = _validate_batch_input(documents, "language", language)
+        model_version = kwargs.pop("model_version", None)
+        show_stats = kwargs.pop("show_stats", False)
+        try:
+            return self._client.entities_recognition_pii(
+                documents=docs,
+                model_version=model_version,
+                show_stats=show_stats,
+                cls=kwargs.pop("cls", pii_entities_result),
+                **kwargs
+            )
+        except AttributeError as error:
+            if "'TextAnalyticsClient' object has no attribute 'entities_recognition_pii'" in str(error):
+                raise NotImplementedError(
+                    "'recognize_pii_entities' endpoint is only available for API version v3.1-preview.1 and up"
+                )
+            raise error
+        except HttpResponseError as error:
+            process_batch_error(error)
+
     @distributed_trace
     def recognize_linked_entities(  # type: ignore
         self,