This repo provides a reference architecture for deploying Apache Ranger on Amazon EMR. Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Features include centralized security administration, fine-grained authorization across many Hadoop components (e.g., Hadoop, Hive, HBase, Storm, Knox, Solr, Kafka, and YARN), and central auditing. Ranger uses agents to sync policies and users, and plugins that run within the same process as the Hadoop component they protect, such as the NameNode or HiveServer2.
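To make "fine-grained authorization" concrete, here is a minimal sketch of what a column-level Hive access policy might look like as a JSON payload. The field names follow the Ranger policy model as exposed by the Ranger Admin public REST API (`/service/public/v2/api/policy`). The service name `hivedev`, the database, table, column, and user names, and the helper `build_hive_access_policy` are all hypothetical and are not taken from this repo.

```python
import json


def build_hive_access_policy(service, database, table, columns, users,
                             accesses=("select",)):
    """Return a dict shaped like a Ranger access policy for a Hive service.

    All argument values are illustrative; nothing here is read from a
    running Ranger Admin server.
    """
    return {
        "service": service,
        "name": f"{database}.{table}-read",
        "policyType": 0,  # 0 = access policy in the Ranger policy model
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    policy = build_hive_access_policy(
        "hivedev", "sales", "orders", ["order_id", "amount"], ["analyst1"]
    )
    print(json.dumps(policy, indent=2))
```

A payload like this would typically be sent to Ranger Admin with an authenticated HTTP POST. The plugin running inside HiveServer2 then picks up the policy on its next sync and enforces it locally.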
The repo contains the code accompanying the AWS Big Data Blog.
NOTE: The code has gone through unit and functional testing against a few recent versions of Amazon EMR, so it may not work with every EMR version. Code or plugins marked as beta are not suitable for production use.
Please submit a pull request or create an issue.
Module | Description | Architecture | Details |
---|---|---|---|
V1 | Open Source Ranger Plugins with LDAP | Basic deployment using AWS Simple AD, Hive and HDFS plugins and optional Presto Plugin | |
V2 | Open Source Ranger Plugins with Kerberos enabled cluster and AD | Deploy a Kerberos enabled EMR cluster using Windows AD, Hive and HDFS plugins and optional Presto and HBase Plugin | |
V3 | EMR Native plugins for Spark/S3 with Kerberos enabled cluster and AD | Deploy a Kerberos enabled EMR cluster with the Amazon EMR native integration of Apache Ranger. Supports Hive, Spark, and Amazon S3 | |
Module | Tag | Region | Region Code | CloudFormation stack | Apache Ranger Version | EMR Version | Supported Plugins |
---|---|---|---|---|---|---|---|
V1 | 1.0 | All | All | | Apache Ranger 1.0, 2.1 | emr-5.28.1, emr-5.29.0, emr-5.30.1 | Hive 2.x, Hadoop 2.x, PrestoDB 0.227/0.232 (Presto plugin needs Ranger 2.0) |
V1 | 1.1 | All | All | | Apache Ranger 2.2 | emr-5.29.0, emr-5.30.1, emr-6.1.0 | Hive 3.x, Hadoop 3.x, PrestoSQL 338 OR PrestoDB 0.232 |
V2 | 2.0 | US East (Virginia) | us-east-1 | Step 1 - Set up VPC/AD server; Step 2 - Set up the Ranger Server/RDS Instance/EMR Cluster | Apache Ranger 2.1 | emr-5.30.1, emr-6.1.0, emr-6.2.0 | Hive 2.x, Hadoop 2.x, PrestoSQL 338/343, PrestoDB 0.227/0.232 (Presto plugin needs Ranger 2.0) |
V3 EMR Ranger GA Launch | 3.0 | US East (Virginia) | us-east-1 | Step 1 - Use this script to upload the SSL key and certs to AWS Secrets Manager; Step 2 - Set up VPC/AD server; Step 3 - Set up the Ranger Server/RDS Instance/EMR Cluster | Apache Ranger 2.1 | emr-5.32.0 | Hive 2.x, Hadoop 2.x, Spark 2.x |
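Because the code is only tested against the EMR releases listed above, it can help to check a target release against the table before launching a stack. The following is a small sketch of such a check. The mapping is transcribed by hand from the table, and the names `TESTED_RELEASES` and `tags_for_release` are hypothetical helpers that do not exist in this repo.

```python
# Tested EMR releases per module tag, copied from the table above.
TESTED_RELEASES = {
    "1.0": {"emr-5.28.1", "emr-5.29.0", "emr-5.30.1"},
    "1.1": {"emr-5.29.0", "emr-5.30.1", "emr-6.1.0"},
    "2.0": {"emr-5.30.1", "emr-6.1.0", "emr-6.2.0"},
    "3.0": {"emr-5.32.0"},
}


def tags_for_release(emr_release):
    """Return the module tags (sorted) that list the given EMR release as tested."""
    return sorted(
        tag for tag, releases in TESTED_RELEASES.items() if emr_release in releases
    )


if __name__ == "__main__":
    for release in ("emr-5.30.1", "emr-5.32.0", "emr-6.3.0"):
        tags = tags_for_release(release)
        status = ", ".join(tags) if tags else "not listed as tested"
        print(f"{release}: {status}")
```

An empty result does not prove that a release is incompatible. It only means the release is not among those listed as tested.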
WARNING: The current V1 setup does not enable strong cluster-level authentication (Kerberos) for EMR; it only provides an LDAP-enabled Hue UI. Kerberos support is covered by V2. Refer to the roadmap for details.
The demo shows how the plugin can be used to enable column-level access controls, column masking, and row filtering. It uses the Presto Redshift connector, but the same functionality should work with other Presto connectors.
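As a rough illustration of the column masking and row filtering mentioned above, the sketch below builds two policy payloads using the Ranger policy model fields `dataMaskPolicyItems` and `rowFilterPolicyItems`. The resource keys (`catalog`, `schema`, `table`, `column`) are assumed to match the Presto service definition. The service name, object names, user names, and both helper functions are hypothetical. This is not the demo's actual configuration.

```python
import json


def build_mask_policy(service, catalog, schema, table, column, users,
                      mask_type="MASK"):
    """Return a dict shaped like a Ranger data-masking policy for one column."""
    return {
        "service": service,
        "name": f"mask-{schema}.{table}.{column}",
        "policyType": 1,  # 1 = data-masking policy
        "resources": {
            "catalog": {"values": [catalog]},
            "schema": {"values": [schema]},
            "table": {"values": [table]},
            "column": {"values": [column]},
        },
        "dataMaskPolicyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "select", "isAllowed": True}],
                "dataMaskInfo": {"dataMaskType": mask_type},
            }
        ],
    }


def build_row_filter_policy(service, catalog, schema, table, users, filter_expr):
    """Return a dict shaped like a Ranger row-filter policy for one table."""
    return {
        "service": service,
        "name": f"filter-{schema}.{table}",
        "policyType": 2,  # 2 = row-filter policy
        "resources": {
            "catalog": {"values": [catalog]},
            "schema": {"values": [schema]},
            "table": {"values": [table]},
        },
        "rowFilterPolicyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "select", "isAllowed": True}],
                "rowFilterInfo": {"filterExpr": filter_expr},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical service, catalog, and user names.
    mask = build_mask_policy("prestodev", "redshift", "public", "customers",
                             "email", ["analyst1"])
    rows = build_row_filter_policy("prestodev", "redshift", "public", "customers",
                                   ["analyst1"], "region = 'US'")
    print(json.dumps([mask, rows], indent=2))
```

With policies of this shape, the listed users would see masked values in the chosen column and only the rows matching the filter expression. The mask types and expression syntax that are actually supported depend on the plugin and Ranger version in use.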
Please open an issue if you would like to see updates or other plugin integrations.
- Amazon EMR: https://aws.amazon.com/emr/
- EMRFS: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html
- Amazon EMR + Kerberos: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos.html
- Apache Ranger: https://ranger.apache.org/
- Apache Ranger + Amazon EMR Blog: https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
- Apache Ranger Presto Plugin:
  - Code for PrestoDB plugin - Link
  - https://cwiki.apache.org/confluence/display/RANGER/Presto+Plugin
If you encounter a bug, please create a new issue with as much detail as possible and steps for reproducing the bug. See the Contributing Guidelines for more details.
This sample code is made available under a modified MIT license. See the LICENSE file.