A Delta Table pipeline in Rust, triggered by Azure Functions responding to blob storage events in a specific container subfolder. The pipeline processes CSV files, updating or creating Delta Tables as needed, using merges for row changes.

dsaad68/azurefunction-deltatable-pipeline-with-rust

Rust Azure Functions Delta Table Pipeline Triggered by Blob Storage Events

This example demonstrates how to build a Delta Table pipeline in Rust using an Azure Function that is triggered by blob events delivered through Event Grid.

Diagram

Whenever a CSV file is uploaded to a specific subfolder in the container, the Azure Function is invoked to process the file. Its content is then added to the Delta Table. If the table does not exist, it is created. If the file contains updates to existing rows, identified by a specific key column, the function executes a merge that integrates the revised data into the existing Delta Table.

This approach is efficient for small files and removes the need for Spark.
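The create-or-merge behaviour described above can be illustrated with a small, dependency-free sketch. It is not the repository's code and does not use the deltalake crate: the table is modelled as an in-memory map, and the names `Row`, `upsert`, and `key_index` are hypothetical, introduced only to show the upsert semantics keyed on one column.

```rust
use std::collections::BTreeMap;

/// A row is modelled as a list of string cells (assumption for illustration).
type Row = Vec<String>;

/// Upsert `incoming` rows into `table`, keyed on the column at `key_index`.
/// Rows whose key already exists replace the stored row (the "merge" case);
/// rows with a new key are inserted. Rows too short to contain the key
/// column are skipped. Returns `(inserted, updated)` counts.
fn upsert(
    table: &mut BTreeMap<String, Row>,
    incoming: Vec<Row>,
    key_index: usize,
) -> (usize, usize) {
    let mut inserted = 0;
    let mut updated = 0;
    for row in incoming {
        let key = match row.get(key_index) {
            Some(k) => k.clone(),
            None => continue, // no key column in this row
        };
        if table.insert(key, row).is_some() {
            updated += 1;
        } else {
            inserted += 1;
        }
    }
    (inserted, updated)
}
```

Starting from an empty map corresponds to the "table does not exist yet" case: the first file only inserts, and later files update the rows whose key is already present.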

Resources

  • Azure Functions
  • Azure Event Grid
  • Azure Blob Storage

How to set up the local dev environment

  1. Use the VS Code Azure Functions extension to install Azure Functions.
  2. Create a Function App with a Custom Handler.
  3. Create a Blob Storage container and add the environment variables to local.settings.json.
  4. Run ngrok to expose the local function, using the URL schema below. More details here.
    <ngrok_url>/runtime/webhooks/eventgrid?functionName=<FunctionName>
  5. Create an Event Grid topic that delivers to the webhook, and filter events using the schema below. More details here.
    /blobServices/default/containers/<containername>/blobs/<subfolder>/
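The two schemas in steps 4 and 5 can be sketched as small string helpers. This is only an illustration under stated assumptions: the function names `webhook_url`, `subject_prefix`, and `subject_matches` are hypothetical and not part of the repository, and the filter is assumed to be a plain prefix match on the event subject.

```rust
/// Build the local Event Grid webhook URL for a function exposed via ngrok.
/// A trailing slash on `ngrok_url` is tolerated.
fn webhook_url(ngrok_url: &str, function_name: &str) -> String {
    format!(
        "{}/runtime/webhooks/eventgrid?functionName={}",
        ngrok_url.trim_end_matches('/'),
        function_name
    )
}

/// Build the subject prefix used to filter events to one container subfolder.
/// Leading and trailing slashes on `subfolder` are normalised away.
fn subject_prefix(container: &str, subfolder: &str) -> String {
    format!(
        "/blobServices/default/containers/{}/blobs/{}/",
        container,
        subfolder.trim_matches('/')
    )
}

/// Check whether an event subject falls under the filtered subfolder
/// (assumes prefix matching on the subject).
fn subject_matches(subject: &str, container: &str, subfolder: &str) -> bool {
    subject.starts_with(&subject_prefix(container, subfolder))
}
```

A check like `subject_matches` can also serve as a defensive guard inside the handler, in case the filter on the subscription is missing or misconfigured.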

Note:

start.bat can be used to start the local dev environment.

Note about the URI scheme of files in Azure Blob Storage:

The Hadoop Filesystem driver for Azure Data Lake Storage Gen2 is identified by its scheme identifier "abfs", which stands for Azure Blob File System. Like other Hadoop Filesystem drivers, it uses a URI format to locate files and directories in a Data Lake Storage Gen2 account. More details here.

From deltalake version 0.17.0, the Azure storage crate can be registered by calling deltalake::azure::register_handlers(None); at the entry point of your code. More information here.

abfss://<container_name>@<account_name>.dfs.core.windows.net/<path>/<file_name>
https://<account_name>.blob.core.windows.net/<container_name>/<path>/<file_name>
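Because blob events typically carry the HTTPS form of the URL while Delta Lake tooling addresses the same file through the abfss form, a conversion between the two shapes above is often handy. The following is a minimal sketch assuming exactly the two URL patterns shown; the function name `blob_url_to_abfss` is hypothetical and not taken from the repository.

```rust
/// Convert an HTTPS Blob Storage URL of the form
/// `https://<account>.blob.core.windows.net/<container>/<path>`
/// into `abfss://<container>@<account>.dfs.core.windows.net/<path>`.
/// Returns `None` when the input does not match that shape.
fn blob_url_to_abfss(url: &str) -> Option<String> {
    let rest = url.strip_prefix("https://")?;
    // Split host from the remainder of the URL.
    let (host, path) = rest.split_once('/')?;
    let account = host.strip_suffix(".blob.core.windows.net")?;
    // The first path segment is the container; the rest is the blob path.
    let (container, blob_path) = path.split_once('/')?;
    if account.is_empty() || container.is_empty() || blob_path.is_empty() {
        return None;
    }
    Some(format!(
        "abfss://{}@{}.dfs.core.windows.net/{}",
        container, account, blob_path
    ))
}
```

Returning `Option` keeps malformed or unexpected URLs from silently producing an invalid table location; the caller decides whether to skip or fail the event.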
