community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag #26812

Merged
merged 2 commits into from
Sep 25, 2024


Contributor

@Raj725 commented Sep 24, 2024

  • Description: The flag is named `anonymize_snippets`. When set to `True`, the Pebblo server anonymizes snippets by redacting all personally identifiable information (PII) from the snippets going into the VectorDB and the generated reports.
  • Issue: NA
  • Dependencies: NA
  • docs: Updated
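
A minimal sketch of how the flag described above might be passed, assuming the `PebbloSafeLoader` import path and the `name`, `owner`, and `description` arguments used in the existing langchain-community Pebblo docs. The helper names and the app metadata values are hypothetical placeholders, and the import is deferred so the snippet can be loaded without `langchain-community` installed.

```python
from typing import Any, Dict


def pebblo_loader_kwargs(anonymize: bool) -> Dict[str, Any]:
    """Build keyword arguments for PebbloSafeLoader.

    The name, owner, and description values are placeholders for illustration.
    """
    return {
        "name": "example-rag-app",  # placeholder app name
        "owner": "Example Owner",  # placeholder owner
        "description": "Example app using PebbloSafeLoader",  # placeholder
        "anonymize_snippets": anonymize,  # the flag added in this PR
    }


def make_safe_loader(inner_loader: Any, anonymize: bool = True) -> Any:
    """Wrap an existing LangChain document loader with PebbloSafeLoader."""
    # Deferred import: only needed when a loader is actually constructed.
    from langchain_community.document_loaders import PebbloSafeLoader

    return PebbloSafeLoader(inner_loader, **pebblo_loader_kwargs(anonymize))
```

The flag only requests anonymization from the Pebblo server; whether every PII item is actually redacted depends on the server's classifier.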

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Sep 24, 2024
@dosubot dosubot bot added community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder labels Sep 24, 2024
"source": [
"### Anonymize the snippets to redact all PII details\n",
"\n",
"Set `anonymize_snippets` to `True` to anonymize all personally identifiable information (PII) from the snippets going into VectorDB and the generated reports."
Collaborator

Is the recall on identifying PII 100%? If not, it might be worth adding a clarification or a link to other documentation to set expectations.
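
To illustrate why the recall question above matters, here is a toy pattern-based redactor. It is not Pebblo's classifier and says nothing about Pebblo's actual recall; the patterns, placeholder tokens, and sample text are all made up for illustration.

```python
import re

# Two deliberately simple patterns; real PII detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

SAMPLE = "Contact jane@example.com, SSN 123-45-6789, or jane at example dot com."


def redact(text: str) -> str:
    """Replace matches of the two toy patterns with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return US_SSN.sub("<SSN>", text)


if __name__ == "__main__":
    # The obfuscated address "jane at example dot com" is not matched,
    # so it survives redaction: recall here is below 100%.
    print(redact(SAMPLE))
```

Any detector can miss cases like the obfuscated address, which is why documenting expected recall helps set expectations.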

Contributor Author

Thank you for your feedback! I’ll add a link to the Pebblo classifier documentation to clarify the recall on identifying PII.

Contributor Author

@ccurme I’ve added a note and a link to our docs. We’re planning to update it with more details soon. Please review.

@eyurtsev eyurtsev self-assigned this Sep 25, 2024
@eyurtsev eyurtsev merged commit 7e5a9c3 into langchain-ai:master Sep 25, 2024
33 checks passed
Sheepsta300 pushed a commit to Sheepsta300/langchain that referenced this pull request Oct 1, 2024
… flag (langchain-ai#26812)
