
Azure Data Factory self-hosted integration runtime in the Azure Container Instances (ACI) service

This sample illustrates how to host an Azure Data Factory self-hosted integration runtime in Azure Container Instances service.

By using this approach, you can gain the benefits of

  • using a self-hosted integration runtime without having to manage virtual machines or other infrastructure
  • starting and stopping ACI on demand, so the runtime only runs when it's needed

Related work

Approach and architecture

This sample runs the self-hosted integration runtime in a Windows container on ACI. Azure Data Factory supports running a self-hosted integration runtime in Windows containers, and Microsoft provides a GitHub repository with a Dockerfile and associated scripts. Azure Container Registry builds the container image from that Dockerfile by using ACR Tasks.
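
For reference, a one-off build of the same image with ACR Tasks might look roughly like the sketch below; the registry name and source location are placeholders rather than values from this repository, and the Bicep deployment in this sample triggers the equivalent build automatically.

# Hypothetical manual ACR Tasks build of the SHIR Windows container image.
# <REGISTRY-NAME> and <SHIR-DOCKERFILE-REPO-URL> are placeholders.
az acr build \
  --registry <REGISTRY-NAME> \
  --platform windows \
  --image adf-shir:latest \
  <SHIR-DOCKERFILE-REPO-URL>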

The ACI uses VNet integration to connect to a virtual network. This means the self-hosted integration runtime can connect to Data Factory by using a private endpoint, and it can also access servers and other resources that are reachable through the virtual network.
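
The sample wires the networking up with Bicep, but as a rough CLI sketch (all names and sizes below are placeholder assumptions, and the real deployment also passes the integration runtime's authentication settings), a VNet-integrated Windows container group can be created along these lines:

# Hypothetical sketch only; the sample's Bicep modules create the real container group.
az container create \
  --resource-group SHIR \
  --name <ACI-NAME> \
  --image <REGISTRY-NAME>.azurecr.io/adf-shir:latest \
  --os-type Windows \
  --cpu 4 \
  --memory 8 \
  --vnet <VNET-NAME> \
  --subnet <ACI-SUBNET-NAME> \
  --registry-username <ACR-USERNAME> \
  --registry-password <ACR-PASSWORD>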

To illustrate the end-to-end flow, the sample deploys example Data Factory pipelines that

  • start ACI and poll the integration runtime (IR) status until the IR is available (a manual CLI equivalent is sketched below)
  • connect to a web server on a virtual machine by using a private IP address.
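
For illustration, roughly the same start-and-poll flow can be performed manually with the Azure CLI. This is a sketch with placeholder resource names, and the az datafactory commands come from the optional datafactory extension:

# Hypothetical manual equivalent of the Start ACI pipeline.
# Requires the datafactory CLI extension: az extension add --name datafactory
az container start --resource-group SHIR --name <ACI-NAME>

# Poll the self-hosted IR status until it reports that it is running.
az datafactory integration-runtime get-status \
  --resource-group SHIR \
  --factory-name <DATA-FACTORY-NAME> \
  --name <INTEGRATION-RUNTIME-NAME>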

Architecture diagram

[Architecture diagram]

These are the data flows used by the solution:

  1. When the ACI starts, it pulls the container image from the container registry.
    • The ACI uses ACR's admin credentials to pull the image; this was chosen to simplify the demo (see the CLI sketch after this list).
  2. After the container is started, the self-hosted integration runtime loads. It connects to the data factory by using a private endpoint.
  3. When the data factory's pipeline runs, the self-hosted integration runtime accesses the web server on the virtual machine.
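
As a hedged illustration of the credential setup in step 1 (the registry name is a placeholder, and the sample's Bicep templates handle this wiring), enabling the ACR admin account and reading its credentials with the CLI looks like this:

# Enable the ACR admin account and read the username/password that ACI uses to pull the image.
az acr update --name <REGISTRY-NAME> --admin-enabled true
az acr credential show --name <REGISTRY-NAME>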

Deploy and test the sample

The entire deployment is defined as a Bicep file, with a series of modules to deploy each part of the solution.

To run the deployment, first create a resource group, for example by using the following Azure CLI command:

az group create \
  --name SHIR \
  --location australiaeast

Next, initiate the deployment of the Bicep file. The only mandatory parameter is vmAdminPassword, which must be set to a value that conforms to the password requirements for Azure virtual machines. The following Azure CLI command initiates the deployment:

az deployment group create \
  --resource-group SHIR \
  --template-file deploy/main.bicep \
  --parameters 'vmAdminPassword=<YOUR-VM-ADMIN-PASSWORD>' ['irNodeExpirationTime=<TIME-IN-SECONDS>'] ['triggerBuildTask=<true/false>']

where the optional parameter irNodeExpirationTime specifies the time in seconds after which offline integration runtime nodes expire once the container stops or restarts. Expired nodes are removed automatically during the next restart. The minimum expiration time, which is also the default value, is 600 seconds (10 minutes).

The deployment takes approximately 30-45 minutes to complete. The majority of this time is spent building the container image within Azure Container Registry. To decrease the time of subsequent deployments, the triggerBuildTask parameter can be set to false to skip the automated container image build.
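
For example, a repeat deployment that skips the image build reuses the command above with the optional parameter set to false (the password value remains a placeholder):

az deployment group create \
  --resource-group SHIR \
  --template-file deploy/main.bicep \
  --parameters 'vmAdminPassword=<YOUR-VM-ADMIN-PASSWORD>' 'triggerBuildTask=false'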

After the deployment completes, allow about another 10-15 minutes for the container to start and for the self-hosted integration runtime to register with the data factory. You can monitor the progress by checking the container instance's state and logs in the Azure portal.

To test the deployment when it's completed (a CLI alternative is sketched after these steps):

  1. Open the Azure portal, navigate to the resource group (named SHIR by default), and open the data factory.
  2. Select Open Azure Data Factory Studio. A separate page opens up in your browser.
  3. On the left navigation bar, select Author.
  4. Invoke the example Azure Data Factory pipelines:
  5. Under Pipelines, select Start ACI. On the toolbar, select Debug to start the pipeline run.
    1. When the task shows the Succeeded status, the self-hosted integration runtime is running and has successfully connected to the Azure Data Factory service. The SHIR is now available for other pipelines.
  6. Under Pipelines, select sample-pipeline. On the toolbar, select Debug to start the pipeline run.
    1. When the task shows the Succeeded status, the self-hosted integration runtime has successfully connected to the web server on the virtual machine and accessed its data.
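
Alternatively, the pipelines can be triggered from the Azure CLI. This is a sketch in which the factory name is a placeholder, the pipeline names are as shown in Data Factory Studio, and the az datafactory commands come from the optional datafactory extension:

# Hypothetical CLI alternative to the portal steps above.
# Requires the datafactory CLI extension: az extension add --name datafactory
az datafactory pipeline create-run \
  --resource-group SHIR \
  --factory-name <DATA-FACTORY-NAME> \
  --name "Start ACI"

# Check the status of the run by using the run ID returned by create-run.
az datafactory pipeline-run show \
  --resource-group SHIR \
  --factory-name <DATA-FACTORY-NAME> \
  --run-id <RUN-ID>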

Security

To simplify templating, the ACI pulls the container image by using ACR's admin username and password rather than a managed identity.

Note

In this sample, we use ACI to host the container because the other serverless/PaaS container hosting options in Azure either don't support VNet integration with Windows containers or support it only with limitations (at least at the time of writing, June 2023).

App Service is not really serverless: it generates costs 24/7 as soon as a Microsoft.Web/serverfarms resource is deployed. See the official App Service demo.

TODO: comments about AKS. The new free tier (released 02/2023) allows running a small Kubernetes cluster without the 24/7 cluster management fee.

TODO

  • Test whether ACI can pull images through a VNet-integrated container registry
  • Refactor the templating once Bicep supports resources as module inputs and outputs
  • Add managed identity (MSI) support as soon as Windows containers on ACI support it
