aws-samples/data-masking-framework

Data-Masking-Framework

Data Masking Framework (DMF) is a configuration-driven approach to masking sensitive data in an AWS data lake using hashing or encryption. It uses PySpark in an EMR- or Glue-based environment. The configuration lists the Glue catalog tables and columns together with the data masking approach to apply to each. The following approaches are used for data masking:

  1. Reversible data encryption using a key

  2. Non-reversible data masking using a hashing algorithm (sha256, sha512)

This framework also supports key-based lookup using the original data.
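
To illustrate the configuration-driven, non-reversible masking described above, here is a minimal sketch in plain Python. It is not the framework's actual code: the `MASKING_CONFIG` layout, the table and column names, and the helpers `hash_value` and `mask_record` are all hypothetical, and the real framework runs on PySpark (where `pyspark.sql.functions.sha2` offers comparable hashing on DataFrame columns). Reversible encryption is left out because it needs a cryptography library beyond the standard library.

```python
import hashlib

# Hypothetical configuration: table -> column -> hashing algorithm.
# The layout and names are illustrative, not the framework's real schema.
MASKING_CONFIG = {
    "sales_db.customers": {
        "email": "sha256",
        "ssn": "sha512",
    }
}

SUPPORTED_ALGORITHMS = ("sha256", "sha512")


def hash_value(value, algorithm):
    """Return the hex digest of a value; null values stay null."""
    if value is None:
        return None
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported hashing algorithm: {algorithm}")
    return hashlib.new(algorithm, str(value).encode("utf-8")).hexdigest()


def mask_record(table, record, config=MASKING_CONFIG):
    """Hash only the columns configured for this table; copy the rest as-is."""
    columns = config.get(table, {})
    return {
        name: hash_value(value, columns[name]) if name in columns else value
        for name, value in record.items()
    }


row = {"id": 1, "email": "jane@example.com", "ssn": "123-45-6789"}
masked = mask_record("sales_db.customers", row)
```

In this sketch the `id` column passes through unchanged because it is not listed in the configuration, while `email` and `ssn` are replaced by hex digests. Identical inputs always produce identical digests, so masked columns can still be used for joins and grouping.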

This project has two main components.

A Python utility for the basic data masking process

See datamask-pyutil for more information.

An EMR-launch stack to automate the process

See datamask-emr-launch for more information.