privacy-engineering-tools/deidentification at main · UtrechtUniversity/privacy-engineering-tools

De-identification tools

The General Data Protection Regulation (GDPR) distinguishes between pseudonymization and anonymization. Both processes aim to make data less identifiable. The difference is that anonymization results in data that are no longer subject to the GDPR, whereas pseudonymization results in data that are less identifiable but still personal. To avoid confusion, we use the term de-identification for the tools listed here: their goal is to make data less identifiable, and the result can be either pseudonymized or anonymized data.

De-identification can consist of completely removing identifiable information, generalizing data points (e.g., replacing a birth date with an age), replacing identifiable values with non-sensitive ones, and so on. For a comprehensive overview of what de-identification is and how it can be applied to research data, please refer to the Data Privacy Handbook.
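The three techniques mentioned above (removal, generalization, replacement) can be illustrated with a minimal sketch that uses only the Python standard library. It is not taken from any of the tools listed in this folder. The record fields, the `SECRET_KEY` value, and the helper names `pseudonym`, `age_on`, and `deidentify` are hypothetical and chosen for illustration only.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be kept separate from the de-identified data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash, shortened to 12 hex characters."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def age_on(birth_date: date, reference: date) -> int:
    """Generalize a birth date to an age in whole years on a reference date."""
    had_birthday = (reference.month, reference.day) >= (birth_date.month, birth_date.day)
    return reference.year - birth_date.year - (0 if had_birthday else 1)


def deidentify(record: dict, reference: date) -> dict:
    """Return a copy of the record with direct identifiers handled."""
    return {
        # Replacement: the participant ID becomes a pseudonym.
        "participant": pseudonym(record["participant_id"]),
        # Generalization: the birth date becomes an age.
        "age": age_on(record["birth_date"], reference),
        # Non-identifying values are kept as they are.
        "score": record["score"],
        # Removal: the "name" field is not copied at all.
    }


record = {
    "participant_id": "P-0042",
    "name": "Jane Doe",
    "birth_date": date(1990, 6, 15),
    "score": 7,
}
result = deidentify(record, reference=date(2024, 1, 1))
print(result)
```

Because the pseudonym can be recomputed by anyone who holds the key, the output of this sketch would most likely count as pseudonymized rather than anonymized data. Whether a dataset is sufficiently de-identified also depends on the remaining variables and their combinations, which this sketch does not assess.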

We have divided the de-identification tools into three major groups:

- De-identification tools for which no programming is required (e.g., complete software packages or graphical user interfaces);
- Python packages to de-identify data;
- R packages to de-identify data.

Currently included tools

The following tools are currently included in this folder:

Tabular data:
- User interfaces: Amnesia, ARX, Datacheck, mu-Argus, OpenPseudonymiser, sdcMicro, tau-Argus
- Python packages: anonymoUUs, mysto, pynonymizer
- R packages: Datacheck, sdcMicro, sdcTable
Textual data:
- User interfaces: Stanford Named Entity Recognizer, Text Anonymization Helper
- Python packages: anonymoUUs, deduce, presidio, Textwash
- R packages: -
Images/video:
- User interfaces: -
- Python packages: DeepPrivacy2, Masked-Piper, presidio
- R packages: -

Other types of data

Tools for de-identifying the following data types are currently not included:

- Audio data: You can de-identify audio data by distorting or otherwise editing the audio track. There are many audio editing tools available, such as the privacy-friendly tools listed here.
- File metadata: Many files contain metadata, such as the date of creation, the instrument settings used, or the location. These metadata can also be removed if they might reveal personal information. Here is a list of such metadata removal tools.
- Neuroimaging data: Neuroimaging data can be de-identified through skull stripping, defacing, face blurring, and face substitution (Eke et al., 2021). There are many tools available for this, many of them Python-based; see this list compiled by the Open Brain Consent working group.
- Databases (SQL data): There are many tools to de-identify SQL data. However, this type of data is not often collected as part of the research data (it is mostly used to store contact details of participants), so these tools are not included in these overviews.

Other resources

There is a GitHub organization dedicated to tools to perform Statistical Disclosure Control. They offer support and documentation for sdcMicro, sdcTable, mu-Argus and tau-Argus, among others.
If you want to know more about Statistical Disclosure Control, the book "Statistical Disclosure Control" (Hundepool et al., 2012) gives an extensive overview.
