
ADF ETL

Identify the public IP address from which you will administer the resources:

curl ifconfig.me

Create the .auto.tfvars file:

cp config/template.tfvars .auto.tfvars

Set the required variables:

subscription_id    = "<subscriptionId>"
allowed_public_ips = ["<your ip>"]
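
If you prefer to generate these values instead of typing them, here is a small sketch (it assumes the Azure CLI is already logged in to the target subscription):

cat > .auto.tfvars <<EOF
subscription_id    = "$(az account show --query id -o tsv)"
allowed_public_ips = ["$(curl -s ifconfig.me)"]
EOF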

Create the resources:

terraform init
terraform apply -auto-approve
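
To confirm what was created, you can inspect the Terraform state, or list the deployed resources (the resource group name below is a placeholder):

terraform state list
az resource list --resource-group <resource-group> -o table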

Private Endpoint

Approve the managed private endpoints generated for the following resources (a CLI sketch follows the list):

  • Data lake
  • Synapse
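
Approval is usually done in the Azure portal, but it can be scripted as well. A minimal sketch using the Azure CLI (resource group, resource name, and connection ID are placeholders; the --type value shown applies to the storage account):

# List pending private endpoint connections on the data lake storage account
az network private-endpoint-connection list -g <resource-group> -n <storage-name> --type Microsoft.Storage/storageAccounts

# Approve a connection by its resource ID
az network private-endpoint-connection approve --id <connection-id> --description "Approved"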

Data set

This project uses the NYC TLC trip record dataset, specifically the high-volume for-hire vehicle (FHVHV) data for January 2023.

Create the data directory and download the Parquet file:

mkdir nyctls
curl -L https://d37ci6vzurychx.cloudfront.net/trip-data/fhvhv_tripdata_2023-01.parquet -o nyctls/nyc-trip-records.parquet
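
As a quick sanity check, Parquet files start with the magic bytes PAR1:

ls -lh nyctls/nyc-trip-records.parquet
head -c 4 nyctls/nyc-trip-records.parquet   # prints "PAR1" for a valid Parquet file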

Create the file system and upload the file, replacing the --account-name option value with your storage account name:

az storage fs create --auth-mode login -n synapse --account-name <storage-name>
az storage blob upload --auth-mode login -f ./nyctls/nyc-trip-records.parquet -c synapse -n database/nyc-trip-records.parquet --account-name <storage-name>
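
Optionally, verify that the file landed in the expected path:

az storage blob list --auth-mode login -c synapse --prefix database/ --account-name <storage-name> -o table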

Synapse

Lake database

Create a new Lake database:

  • Name: Database1
  • Linked service: The data lake storage
  • Input folder: synapse/database
  • Data format: Parquet

Create a new Table from the data lake:

  • External table name: nyc_taxi
  • Linked service: The data lake storage
  • Input file: synapse/database/nyc-trip-records.parquet
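
Once created, the table can be queried through the workspace's serverless SQL endpoint, for example with sqlcmd and Azure AD authentication (the workspace name is a placeholder):

sqlcmd -S <workspace-name>-ondemand.sql.azuresynapse.net -d Database1 -G -Q "SELECT TOP 10 * FROM dbo.nyc_taxi;"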

Spark

Upload the spark/synapse-transform.ipynb notebook to Synapse.
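
If you prefer the CLI over the Studio UI, a sketch of the same import (workspace and Spark pool names are placeholders):

az synapse notebook import --workspace-name <workspace-name> --name synapse-transform --file @"spark/synapse-transform.ipynb" --spark-pool-name <spark-pool>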

Connect to the Spark pool and run the notebook.
