This repo describes how to use Azure Data Lake Storage (ADLS) Gen2 as external storage with Azure Databricks, and contains automation scripts.
There are currently four options for connecting from Databricks to ADLS Gen2:
- Using the ADLS Gen2 storage account access key directly
- Using a service principal directly (OAuth 2.0)
- Mounting an ADLS Gen2 filesystem to DBFS using a service principal (OAuth 2.0)
- Azure Active Directory (AAD) credential passthrough
We will focus on authenticating to ADLS Gen2 storage from Azure Databricks clusters.
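As an illustration of the first option, direct access with the storage account access key needs only a single Spark configuration setting. This is a minimal sketch: `<storage-account-name>`, `<file-system-name>`, `<scope-name>`, and `<storage-account-access-key-name>` are placeholders, and it assumes the access key is kept in a Databricks secret scope rather than in plain text.

```python
# Placeholders: <storage-account-name>, <scope-name>, <storage-account-access-key-name>.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<storage-account-access-key-name>"),
)

# List the root of the file system to confirm access.
dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/")
```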
Create and initialize an ADLS Gen2 file system with the hierarchical namespace enabled.
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://@.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
Important
- When the hierarchical namespace is enabled for an Azure Data Lake Storage Gen2 account, you do not need to create any Blob containers through the Azure portal.
- When the hierarchical namespace is enabled, Azure Blob storage APIs are not available (see the ADLS Gen2 known issues documentation). For example, you cannot use the wasb or wasbs scheme to access the blob.core.windows.net endpoint.
- If you enable the hierarchical namespace, there is no interoperability of data or operations between the Azure Blob storage and Azure Data Lake Storage Gen2 REST APIs.
Enable Azure Data Lake Storage credential passthrough for a high-concurrency cluster
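Credential passthrough is enabled on the cluster itself rather than in notebook code: in the UI, create a High Concurrency cluster and select "Enable credential passthrough for user-level data access" under Advanced Options. The sketch below does the same through the Databricks Clusters REST API; the workspace URL, personal access token, runtime version, and node type are placeholder assumptions, and the exact Spark conf keys should be verified against the Databricks documentation for your runtime.

```python
import requests

# Placeholders: replace with your workspace URL and a personal access token.
WORKSPACE_URL = "https://<databricks-instance>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "passthrough-high-concurrency",
    "spark_version": "<runtime-version>",   # placeholder Databricks runtime
    "node_type_id": "<node-type-id>",       # placeholder VM size, e.g. Standard_DS3_v2
    "num_workers": 2,
    "spark_conf": {
        # High Concurrency profile restricted to Python and SQL, with passthrough on.
        "spark.databricks.cluster.profile": "serverless",
        "spark.databricks.repl.allowedLanguages": "python,sql",
        "spark.databricks.passthrough.enabled": "true",
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # response contains the new cluster_id
```

Once such a cluster is running, each user's reads are executed with that user's own AAD identity, for example: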
```python
spark.read.csv("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/MyData.csv").collect()
```
Enable Azure Data Lake Storage credential passthrough for a standard cluster
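A standard cluster supports credential passthrough only for a single designated user. Below is a minimal sketch under the same assumptions as the previous example (Clusters REST API, placeholder workspace and node values), where `single_user_name` is assumed to name the AAD user allowed to run commands on the cluster; check the Databricks documentation for the exact fields supported by your workspace.

```python
import requests

WORKSPACE_URL = "https://<databricks-instance>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                     # placeholder

cluster_spec = {
    "cluster_name": "passthrough-standard",
    "spark_version": "<runtime-version>",  # placeholder Databricks runtime
    "node_type_id": "<node-type-id>",      # placeholder VM size
    "num_workers": 1,
    # Only this AAD user may attach to and run commands on the cluster.
    "single_user_name": "<user>@<your-domain>",
    "spark_conf": {
        "spark.databricks.passthrough.enabled": "true",
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
```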