
Hashing & Obfuscation

Leandro Matayoshi edited this page May 6, 2020 · 1 revision

Overview

Maap Store offers hashing and obfuscation capabilities to avoid exposing sensitive data.

Patient Identification (PID)

Lab Record Imports may include a special Patient ID column that indicates which patient is associated with each lab record. These associations between patient IDs and lab records are extremely sensitive, so they must not remain accessible server-side.

For this reason, after a Lab Record Import is uploaded, its PIDs go through a hashing process that comprises the following steps:

  1. Hash each plain PID
  2. Add an entry to the PatientIdHashes table, recording which patient is associated with each hash
  3. Modify the uploaded file, replacing each value in the PID column with its hashed version, and save the file

This hashing process runs approximately every 10 minutes as a Sidekiq cron job.
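The three steps above can be sketched in Ruby as follows. This is a minimal illustration, not Maap Store's actual implementation. It makes several assumptions: the uploaded file is CSV, the example values in this page have the shape of UUIDs so `SecureRandom.uuid` is used to generate each hashed value, a PID that was already seen reuses its existing hashed value, and a plain Ruby Hash stands in for the PatientIdHashes table. The names `PATIENT_ID_HASHES`, `hash_for`, and `hash_pid_column` are hypothetical.

```ruby
require "csv"
require "securerandom"

# Hypothetical in-memory stand-in for the PatientIdHashes table:
# maps each plain PID to its hashed value.
PATIENT_ID_HASHES = {}

# Steps 1 and 2: return the hashed value for a PID, recording a new entry
# the first time the PID is seen.
# Assumption: a random UUID is used as the hashed value.
def hash_for(pid, table = PATIENT_ID_HASHES)
  table[pid] ||= SecureRandom.uuid
end

# Step 3: replace every value in the PID column of a CSV string with its
# hashed value and return the modified CSV text.
def hash_pid_column(csv_text, pid_column: "Patient Id", table: PATIENT_ID_HASHES)
  rows = CSV.parse(csv_text, headers: true)
  rows.each { |row| row[pid_column] = hash_for(row[pid_column], table) }
  rows.to_csv
end
```

Because the hashed values in this sketch are random rather than derived from the PID, the lookup table is the only way to map a hashed value back to a patient.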

For example, suppose the following Lab Record Import file is uploaded:


| Patient Id | Lab Id | Test Id | Result | Date |
| --- | --- | --- | --- | --- |
| 1235 | Main Lab | Flu | Positive | Dec 15, 2019 |
| 1237 | Main Lab | Flu | Positive | Dec 17, 2019 |

Then a hash will be generated for each PID, and two entries will be added to the PatientIdHashes table:


| Patient Id | Hashed value |
| --- | --- |
| 1235 | b37eebc6-8c7e-4b64-90d1-2a15950a7500 |
| 1237 | 6691083c-8e4a-4040-848a-573a113d1504 |

Finally, each value in the Patient Id column of the uploaded file will be replaced with its associated hash:


| Patient Id | Lab Id | Test Id | Result | Date |
| --- | --- | --- | --- | --- |
| b37eebc6-8c7e-4b64-90d1-2a15950a7500 | Main Lab | Flu | Positive | Dec 15, 2019 |
| 6691083c-8e4a-4040-848a-573a113d1504 | Main Lab | Flu | Positive | Dec 17, 2019 |

⚠️ The PatientIdHashes table is the only record of the association between PIDs and hashes. It should be kept for the whole lifespan of the project in case the association is needed. At the end of the project, it is the admins' responsibility to delete it so that the sensitive data is no longer available.

Personal Health Information (PHI)

Some columns in both Lab Record Imports and Electronic Pharmacy Stock Record Imports may be marked as phi during an upload, which indicates that those columns contain sensitive information and have to be obfuscated.

For this reason, after a file with phi columns is uploaded, an obfuscation step is performed: the values of the phi columns are replaced with the string 'Not available' and the file is saved.

Recall the previous example, in which the following Lab Record Import is uploaded:


| Patient Id | Lab Id | Test Id | Result | Date |
| --- | --- | --- | --- | --- |
| 1235 | Main Lab | Flu | Positive | Dec 15, 2019 |
| 1237 | Main Lab | Flu | Positive | Dec 17, 2019 |

Suppose that the Result column has been marked as phi. After hashing and obfuscation, the resulting file would be:


| Patient Id | Lab Id | Test Id | Result | Date |
| --- | --- | --- | --- | --- |
| b37eebc6-8c7e-4b64-90d1-2a15950a7500 | Main Lab | Flu | Not available | Dec 15, 2019 |
| 6691083c-8e4a-4040-848a-573a113d1504 | Main Lab | Flu | Not available | Dec 17, 2019 |
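The obfuscation step described in this section can be sketched in Ruby as follows. This is an illustration only, assuming the uploaded file is CSV and that the list of phi columns is available as an array of header names. The names `PHI_PLACEHOLDER` and `obfuscate_phi` are hypothetical, not Maap Store's actual code.

```ruby
require "csv"

# Placeholder that replaces every value in a phi column.
PHI_PLACEHOLDER = "Not available"

# Replace every value in the given phi columns of a CSV string with the
# placeholder and return the modified CSV text.
# Column names that do not appear in the file's header are ignored.
def obfuscate_phi(csv_text, phi_columns)
  rows = CSV.parse(csv_text, headers: true)
  rows.each do |row|
    phi_columns.each do |column|
      row[column] = PHI_PLACEHOLDER if row.header?(column)
    end
  end
  rows.to_csv
end
```

Calling `obfuscate_phi(csv_text, ["Result"])` on the example file would leave every column except Result unchanged.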
