Add multilingual data anon chain (#10346)
baskaryan authored Sep 7, 2023
2 parents 3005596 + 41a2548 commit 8826293
Showing 6 changed files with 595 additions and 17 deletions.
@@ -6,7 +6,7 @@
"source": [
"# Data anonymization with Microsoft Presidio\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/guides/privacy/presidio_data_anonymization.ipynb)\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/guides/privacy/presidio_data_anonymization/index.ipynb)\n",
"\n",
"## Use case\n",
"\n",
@@ -439,8 +439,6 @@
"metadata": {},
"source": [
"## Future works\n",
"\n",
"- **deanonymization** - add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. By doing this, we will retain anonymity in requests to, for example, OpenAI, and then be able to restore the original data.\n",
"- **instance anonymization** - at this point, each occurrence of PII is treated as a separate entity and separately anonymized. Therefore, two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object."
]
}
@@ -461,7 +459,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.9.1"
}
},
"nbformat": 4,
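The two "Future works" items in the notebook diff (deanonymization and instance anonymization) can be illustrated together with a small sketch. It is not Presidio's or LangChain's API: the `ReversibleAnonymizer` class, the regex-based name detector, and the `<PERSON_n>` placeholder format are all hypothetical, chosen only to keep the example self-contained. The regex stands in for a real PII recognizer, the placeholders stand in for the fake names Presidio would generate, and the LLM call is replaced by a fixed string.

```python
import re
from typing import Dict

# Hypothetical stand-in for a PII recognizer: treats any pair of capitalized
# words ("Firstname Lastname") as a person's name. Illustration only.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


class ReversibleAnonymizer:
    """Hypothetical sketch of instance-aware, reversible anonymization."""

    def __init__(self) -> None:
        # Both directions of the mapping are kept so anonymization can be reversed.
        self.real_to_fake: Dict[str, str] = {}
        self.fake_to_real: Dict[str, str] = {}

    def _placeholder_for(self, real: str) -> str:
        # Instance anonymization: a value seen before reuses its earlier placeholder.
        if real not in self.real_to_fake:
            fake = f"<PERSON_{len(self.real_to_fake) + 1}>"
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
        return self.real_to_fake[real]

    def anonymize(self, text: str) -> str:
        # Replace every detected name with its (possibly reused) placeholder.
        return NAME_PATTERN.sub(lambda m: self._placeholder_for(m.group(0)), text)

    def deanonymize(self, text: str) -> str:
        # Swap each placeholder back to the original value it stands for.
        for fake, real in self.fake_to_real.items():
            text = text.replace(fake, real)
        return text


anonymizer = ReversibleAnonymizer()
prompt = anonymizer.anonymize("John Doe met Jane Roe. Later, John Doe called her.")
print(prompt)  # <PERSON_1> met <PERSON_2>. Later, <PERSON_1> called her.

# Stand-in for the LLM step: a fixed response that still uses the placeholders.
llm_response = "<PERSON_2> received two calls from <PERSON_1>."
restored = anonymizer.deanonymize(llm_response)
print(restored)  # Jane Roe received two calls from John Doe.
```

Keeping a single `real_to_fake` dictionary is what makes both mentions of John Doe map to the same placeholder, and the inverse `fake_to_real` dictionary is what lets the `anonymize -> LLMChain -> deanonymize` round trip restore the original names. A real implementation would also have to cope with an LLM that rewrites or drops placeholders.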
