Customizing Your Instance
- How can I customize my Cromwell on Azure deployment?
- How can I use a specific Cromwell image version?
- How do I use input data files for my workflows from a different Azure Storage account that my lab or team is currently using?
- Can I connect a different batch account with previously increased quotas to run my workflows?
- How can I use private Docker containers for my workflows?
- A lot of tasks for my workflows run longer than 24 hours and have been randomly stopped. How can I run all my tasks on dedicated batch VMs?
- Can I get direct access to Cromwell's REST API?
- How do I get Content-MD5 set on all task outputs?
Connect to the host VM
To get logs from all the Docker containers or to use the Cromwell REST API endpoints, you may want to connect to the Linux host VM. At installation, a user named "vmadmin" is created for managing the host VM. The password is randomly generated and shown during installation. If you need to reset your VM password, you can do this using the Azure Portal or by following these instructions.
Starting with release 3.0, unless you used the --KeepSshPortOpen true deployer option, you can connect in one of two ways:
- Enable and use the Azure-provided just-in-time VM access feature (if that option is available to you in your subscription and for the host VM), OR
- Change the Action of the SSH inbound security rule on the network security group in the VM's resource group from Deny to Allow. Be sure to switch it back to Deny when you log out.
To connect to your host VM, either:
- Construct your ssh connection string from the VM's host name or IP address:
ssh vmadmin@<hostname>
OR
- Click the Connect button on the Overview blade of your Azure VM instance, then copy the ssh connection string.
Paste the ssh connection string into a command prompt, PowerShell, or terminal window to log in.
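If you use the network security group route, you can script the open, connect and close cycle. The sketch below is hypothetical. It assumes the inbound rule is literally named SSH and that you already know the NSG's name, which the deployer generates, so look it up in the VM's resource group. It relies on the real az network nsg rule update command. Set DRY_RUN=1 to print the commands instead of running them.

```shell
# Hypothetical helpers; the rule name "SSH" and all arguments are assumptions.
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Set the Action of the SSH inbound rule on the host VM's NSG.
set_ssh_access() {
  local rg="$1" nsg="$2" access="$3"
  case "$access" in
    Allow|Deny) ;;
    *) echo "access must be Allow or Deny" >&2; return 1 ;;
  esac
  run az network nsg rule update --resource-group "$rg" --nsg-name "$nsg" \
    --name SSH --access "$access"
}

# Open SSH, log in as vmadmin, then switch the rule back to Deny on logout.
coa_ssh() {
  local rg="$1" nsg="$2" host="$3" status
  set_ssh_access "$rg" "$nsg" Allow || return 1
  run ssh "vmadmin@$host"
  status=$?
  set_ssh_access "$rg" "$nsg" Deny
  return "$status"
}
```

Usage: coa_ssh <your RG> <your NSG name> <hostname>. The rule is set back to Deny even if the ssh session ends with an error.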
Customize your Cromwell on Azure deployment
Before deploying, you can choose to customize some input parameters to use existing Azure resources. Example:
.\deploy-cromwell-on-azure.exe --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string> --VmSize "Standard_D2_v2"
Here is a summary of the common configuration parameters:
Configuration parameter | Has default | Validated | Used by update | Comment |
---|---|---|---|---|
string SubscriptionId | N | Y | Y | Azure Subscription Id - Always required. |
string RegionName | N | Y | N | Azure region name to deploy to - Required for new install. |
string MainIdentifierPrefix = "coa" | Y | Y | N | Prefix for all resources to be deployed - Required to deploy but defaults to "coa". |
string VmSize = "Standard_D3_v2" | Y | N | N | VM size of the Linux Ubuntu VM to use as the host - Not required and defaults to Standard_D3_v2. |
string VnetResourceGroupName | Y | Y | N | Available starting version 2.1. The resource group name of the specified virtual network to use - Not required, generated automatically if not provided. If specified, VnetName and SubnetName must be provided. |
string VnetName | Y | Y | N | Available starting version 2.1. The name of the specified virtual network to use - Not required, generated automatically if not provided. If specified, VnetResourceGroupName and SubnetName must be provided. |
string SubnetName | Y | Y | N | Available starting version 2.1. The subnet name of the specified virtual network to use - Not required, generated automatically if not provided. If specified, VnetResourceGroupName and VnetName must be provided. |
string VmSubnetName | Y | Y | N | Available starting version 3.1. The subnet name of the specified virtual network to use for the VM. If specified, VnetResourceGroupName, VnetName, and PostgreSqlSubnetName must be provided. |
string PostgreSqlSubnetName | Y | Y | N | Available starting version 3.1. The subnet name of the specified virtual network to use for the Azure PostgreSQL database. If specified, VnetResourceGroupName, VnetName, and VmSubnetName must be provided. |
string ResourceGroupName | Y | Y | Y | Required for update. If provided for new Cromwell on Azure deployment, it must already exist. |
string BatchAccountName | Y | N | N | The name of the Azure Batch Account to use; must be in the SubscriptionId and RegionName provided - Not required, generated automatically if not provided. |
string StorageAccountName | Y | N | N | The name of the Azure Storage Account to use; must be in the SubscriptionId provided - Not required, generated automatically if not provided. |
string ApplicationInsightsAccountName | Y | N | N | The name of the Application Insights Account to use; must be in the SubscriptionId provided - Not required, generated automatically if not provided. |
string CromwellVersion | Y | N | Y | Cromwell version to use. |
bool SkipTestWorkflow = false | Y | Y | Y | Set to true to skip running the default test workflow. |
bool Update = false | Y | Y | Y | Set to true to update your existing Cromwell on Azure deployment to the latest version. Required for update. |
bool PrivateNetworking = false | Y | Y | N | Available starting version 2.2. Set to true to create the host VM without a public IP address. If set, VnetResourceGroupName, VnetName and SubnetName must be provided (and already exist). The deployment must be initiated from a machine that has access to that subnet. |
string LogAnalyticsArmId | Y | N | N | ARM resource ID of an existing Log Analytics workspace; the workspace is used for Application Insights - Not required, a workspace will be generated automatically if not provided. |
string AksClusterName | Y | Y | Y | Name of an existing Azure Kubernetes Service (AKS) cluster to use instead of provisioning a new one. |
string AksCoANamespace = "coa" | Y | N | N | Kubernetes namespace used by CoA. |
bool ManualHelmDeployment | Y | N | N | Use this if you don't have direct access to the existing AKS cluster. |
string HelmBinaryPath = "C:\ProgramData\chocolatey\bin\helm.exe" | Y | N | N | Path to helm binary for AKS deployment. |
int AksPoolSize = 2 | Y | N | N | Number of nodes in the AKS node pool. Two nodes are recommended for reliability; a minimum of one can be used to reduce COGS. |
bool DebugLogging = false | Y | N | N | Prints all log information. |
string PostgreSqlServerName | Y | Y | N | Name of an existing PostgreSQL server. |
string KeyVaultName | Y | Y | N | Name of an existing key vault. |
bool CrossSubscriptionAKSDeployment | Y | N | N | Set to true if the AKS cluster is in a different subscription than the storage account; a key vault and storage account key will then be used for AKS storage authentication. |
string BatchPrefix | Y | Y | N | An identifier used as part of the prefix for batch pool and job names. Used to enable sharing batch accounts. |
string AadGroupIds | Y | N | N | This flag sets an AAD group to be used when authenticating to the AKS cluster. This is not required but considered a best practice if users will be directly connecting to the cluster to troubleshoot. See: https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-identity#use-azure-active-directory-azure-ad |
string AzureCloudName | Y | Y | N | Sets the Azure cloud to configure CoA with. Options are "AzureCloud" (default), "AzureUSGovernment", "AzureChinaCloud" |
string IdentityResourceId | Y | N | N | Allows you to use a pre-existing user-assigned managed identity, by specifying its resource ID, like "/subscriptions/12345678-abcd-1234-efgh-ijklmnopqrst/resourcegroups/example-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/example-identity" |
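Several rows above state that parameters must be supplied together, for example VnetResourceGroupName, VnetName and SubnetName. The deployer validates this itself. The hypothetical helper below only sketches the same all-or-none rule as a pre-flight check you might run in a wrapper script before calling the deployer. Apply it to whichever parameter group matches your CoA version.

```shell
# Succeeds when none, or all, of the named shell variables are non-empty.
# Hypothetical helper; the deployer performs its own validation as well.
all_or_none() {
  local name value count=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -n "$value" ]; then
      count=$((count + 1))
    fi
  done
  [ "$count" -eq 0 ] || [ "$count" -eq "$#" ]
}

# Example pre-flight check (placeholder values):
#   VnetResourceGroupName=my-vnet-rg VnetName=my-vnet SubnetName=
#   all_or_none VnetResourceGroupName VnetName SubnetName ||
#     echo "VnetResourceGroupName, VnetName and SubnetName go together" >&2
```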
The following are more advanced configuration parameters:
Configuration parameter | Has default | Validated | Used by update | Comment |
---|---|---|---|---|
string VnetAddressSpace = "10.1.0.0/16" | Y | N | N | Total address space for the CoA virtual network. |
string VmSubnetAddressSpace = "10.1.0.0/24" | Y | N | N | Address space for compute, VM or AKS. |
string KubernetesServiceCidr = "10.1.4.0/22" | Y | N | N | Address space for Kubernetes system services; must not overlap with any subnets. |
string KubernetesDnsServiceIP = "10.1.4.10" | Y | N | N | Kubernetes DNS service IP address. |
string KubernetesDockerBridgeCidr = "172.17.0.1/16" | Y | N | N | Kubernetes Docker bridge CIDR. |
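If you override these defaults, check the address ranges first. The following is a minimal sketch that uses only shell arithmetic, and the helper names are hypothetical. The "must not overlap with any subnets" rule comes from the table above. The check that the DNS service IP falls inside the service CIDR reflects AKS's general requirement for that address.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  local ip="${1%/*}" bits="${1#*/}" start size
  start=$(ip_to_int "$ip")
  size=$(( 1 << (32 - $bits) ))
  start=$(( $start - $start % $size ))   # align to the network boundary
  echo "$start $(( $start + $size - 1 ))"
}

# Succeed if the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Succeed if the IP address ($2) lies inside the CIDR block ($1).
cidr_contains() {
  local ip
  ip=$(ip_to_int "$2")
  set -- $(cidr_range "$1")
  [ "$1" -le "$ip" ] && [ "$ip" -le "$2" ]
}

# Checks against the defaults above:
#   cidrs_overlap 10.1.0.0/24 10.1.4.0/22 && echo "service CIDR overlaps VM subnet"
#   cidr_contains 10.1.4.0/22 10.1.4.10 || echo "DNS IP is outside the service CIDR"
```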
Use a specific Cromwell image version
To choose a specific Cromwell version, you can specify the version as a configuration parameter before deploying Cromwell on Azure. Here is an example:
.\deploy-cromwell-on-azure.exe --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string> --CromwellVersion 53
This version will persist through future updates until you set it again, or until you revert to the default behavior by specifying --CromwellVersion "". See the note below.
After deployment, you can still change the Cromwell docker image version being used.
Cromwell on Azure version 2.x
Run the deployer in update mode and specify the new Cromwell version.
.\deploy-cromwell-on-azure.exe --Update true --SubscriptionId <Your subscription ID> --ResourceGroupName <Your RG> --VmPassword <Your VM password> --CromwellVersion 54
The new version will persist through future updates until you set it again.
To revert to the default Cromwell version that is shipped with each deployer version, specify --CromwellVersion "".
Be aware of compatibility issues if downgrading the version.
The default version is listed here.
Cromwell on Azure version 1.x
Log on to the host VM using the ssh connection string as described in the instructions. In the docker-compose.yml file, replace the image name for the "cromwell" service with the tag of your choice.
cd /data/cromwellazure/
sudo nano docker-compose.yml
# Modify the cromwell service image name and save the file
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run sudo reboot. You can also restart the docker containers instead.
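If you prefer a non-interactive edit to nano, the sketch below swaps the cromwell service's image line in place. It is hypothetical. It assumes the usual compose layout, with service names indented two spaces and their keys indented four. Run it from a root shell (for example after sudo -s), because the file lives under /data/cromwellazure.

```shell
# Point the "cromwell" service of a docker-compose.yml at a new image,
# assuming service names are indented two spaces and their keys four.
set_cromwell_image() {
  local file="$1" image="$2" tmp rc
  tmp=$(mktemp) || return 1
  awk -v img="$image" '
    /^  [A-Za-z0-9_-]+:/ { in_cromwell = ($0 ~ /^  cromwell:/) }
    in_cromwell && /^    image:/ {
      sub(/image:.*/, "image: \"" img "\"")
      in_cromwell = 0
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, run set_cromwell_image /data/cromwellazure/docker-compose.yml broadinstitute/cromwell:54, then reboot as described above. Writing through cat keeps the original file's owner and permissions.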
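Use input data files from an existing Azure Storage account that my lab or team is currently using
If the VM and the storage account are in the same Azure tenant, you can grant access with the VM identity:
- Add the VM identity as a Contributor to the Storage Account via the Azure Portal or Azure CLI.
- Navigate to the "configuration" container in the default storage account. Replace the values below with your Storage Account and Container names, and add the line to the end of the containers-to-mount file:
/yourstorageaccountname/yourcontainername
- Save the changes and restart the VM.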
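You can also do the two steps above from the Azure CLI. This sketch assumes you already know the storage account's resource ID and the principal (object) ID of the VM identity. For a user-assigned identity, az identity show --name <identity> --resource-group <RG> --query principalId -o tsv returns the principal ID. The helper names are hypothetical. The file edit works on a local copy downloaded from the configuration container.

```shell
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Grant the VM identity Contributor on the storage account holding your data.
grant_storage_contributor() {
  local principal_id="$1" storage_account_id="$2"
  run az role assignment create --assignee "$principal_id" \
    --role Contributor --scope "$storage_account_id"
}

# Append an entry to a local copy of containers-to-mount unless it is
# already listed, making sure the previous last line is newline-terminated.
add_mount_entry() {
  local file="$1" entry="$2"
  if grep -qxF -- "$entry" "$file" 2>/dev/null; then
    return 0
  fi
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$entry" >> "$file"
}

# Example (placeholders):
#   az storage blob download --account-name <default account> --container-name configuration \
#     --name containers-to-mount --file containers-to-mount --auth-mode login
#   add_mount_entry containers-to-mount /yourstorageaccountname/yourcontainername
#   az storage blob upload --account-name <default account> --container-name configuration \
#     --name containers-to-mount --file containers-to-mount --overwrite --auth-mode login
```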
Alternatively, use a SAS token. This is applicable if the VM and storage account are in different Azure tenants, or if you want to use a SAS token anyway for security reasons:
- Add a SAS URL for your desired container to the end of the containers-to-mount file. The SAS token can be at the account or container level and may be read-only or read-write depending on the usage:
https://<yourstorageaccountname>.blob.core.windows.net/<yourcontainername>?<sastoken>
- Save the changes and restart the VM.
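A small helper can assemble the SAS line. This is a hypothetical sketch. It accepts a token pasted with or without its leading "?". It also takes an optional endpoint suffix, because clouds other than AzureCloud use a different blob endpoint, for example blob.core.usgovcloudapi.net. In the token example, the permissions rl give read-only access; add w for read-write.

```shell
# Build a containers-to-mount line for SAS access. The token may be passed
# with or without its leading "?"; the endpoint suffix defaults to AzureCloud.
sas_mount_entry() {
  local account="$1" container="$2" token="${3#[?]}"
  local suffix="${4:-blob.core.windows.net}"
  printf 'https://%s.%s/%s?%s\n' "$account" "$suffix" "$container" "$token"
}

# Example with a read+list token signed with the account key (placeholders):
#   token=$(az storage container generate-sas --account-name myaccount \
#     --name mycontainer --permissions rl --expiry <YYYY-MM-DDTHH:MMZ> \
#     --account-key <key> -o tsv)
#   sas_mount_entry myaccount mycontainer "$token" >> containers-to-mount
```

Quote the token when passing it on the command line, because SAS tokens contain & characters.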
In both cases, the specified containers will be mounted as /yourstorageaccountname/yourcontainername/ on the Cromwell server. You can then use /yourstorageaccountname/yourcontainername/path in the trigger, WDL, CWL, inputs and workflow options files.
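When you prepare inputs files, it can help to convert blob URLs into the mounted paths described above. The following is a minimal sketch with a hypothetical helper name. It drops the endpoint host suffix and any SAS query string. Input that is not a blob URL passes through unchanged.

```shell
# Map a blob URL (optionally carrying a SAS query string) to the path the
# container is mounted at on the Cromwell server: /account/container/path.
blob_url_to_mount_path() {
  printf '%s\n' "$1" |
    sed -E 's#^https://([^./]+)\.blob\.[^/]+/([^?]*).*#/\1/\2#'
}
```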
Use a batch account for which I have already requested or received an increased core quota from Azure Support
Log on to the host VM using the ssh connection string as described in the instructions.
Cromwell on Azure version 2.x
Replace the BatchAccountName variable in the env-01-account-names.txt file with the name of the desired batch account and save your changes.
cd /data/cromwellazure/
sudo nano env-01-account-names.txt
# Modify the BatchAccountName to your Batch Account name and save the file
Cromwell on Azure version 1.x
Replace the BatchAccountName environment variable for the "tes" service in the docker-compose.yml file with the name of the desired batch account and save your changes.
cd /data/cromwellazure/
sudo nano docker-compose.yml
# Modify the BatchAccountName to your Batch Account name and save the file
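You can also set the value with sed instead of editing in nano. This sketch makes several assumptions. The file must use KEY=VALUE lines, optionally prefixed by "- " as in a compose environment: list; that layout is assumed for the 1.x docker-compose.yml. The new value must contain no |, & or \ characters. You must run the sketch from a root shell.

```shell
# Set KEY=VALUE in a file such as env-01-account-names.txt; lines written as
# "- KEY=VALUE" (a compose environment list) are matched as well.
set_env_value() {
  local file="$1" key="$2" value="$3" tmp rc
  if ! grep -q "^[[:space:]-]*$key=" "$file"; then
    echo "$key not found in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  sed "s|^\([[:space:]-]*$key=\).*|\1$value|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (2.x, placeholder account name):
#   set_env_value /data/cromwellazure/env-01-account-names.txt BatchAccountName mybatchaccount
```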
To allow the host VM to use a batch account, add the VM identity as a Contributor to the Azure batch account via the Azure Portal or Azure CLI.
To allow the host VM to read prices and information about the types of machines available for the batch account, add the VM identity as a Billing Reader to the subscription with the configured Batch Account.
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run sudo reboot.
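You can make both role assignments with az role assignment create. This sketch assumes you know the VM identity's principal ID and the batch account's resource group. The helper name is hypothetical. The scope strings follow the standard Azure resource ID format.

```shell
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Grant the VM identity Contributor on the batch account and Billing Reader
# on the subscription that contains it.
grant_batch_roles() {
  local principal_id="$1" subscription_id="$2" batch_rg="$3" batch_account="$4"
  local sub_scope="/subscriptions/$subscription_id"
  local batch_scope="$sub_scope/resourceGroups/$batch_rg/providers/Microsoft.Batch/batchAccounts/$batch_account"
  run az role assignment create --assignee "$principal_id" \
    --role Contributor --scope "$batch_scope" || return 1
  run az role assignment create --assignee "$principal_id" \
    --role "Billing Reader" --scope "$sub_scope"
}
```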
Use private Docker containers hosted on Azure Container Registry
Cromwell on Azure supports private Docker images for your WDL tasks hosted on Azure Container Registry (ACR). You can choose between two access modes for authenticating to your ACR:
- Enabling anonymous pull access to your ACR (the easiest way to get started)
- Setting up authentication using the Cromwell on Azure managed identity
With anonymous pull access, Cromwell on Azure pulls your images without authenticating; see Make your container registry content publicly available for more information. To enable this for your CoA deployment:
- Ensure you are logged in to the Azure CLI:
az login
or
az login --tenant mytenant.domain.name
- Enable anonymous pull access for each ACR you need CoA to access:
az acr update --name myACRName --anonymous-pull-enabled
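If you have several registries, a small loop saves some typing. This helper is hypothetical. It first checks that az account show succeeds, which confirms that you are logged in. It then calls az acr update for each registry named on the command line.

```shell
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Enable anonymous pull on each registry passed as an argument.
enable_anonymous_pull() {
  local acr
  if [ "${DRY_RUN:-0}" != 1 ] && ! az account show >/dev/null 2>&1; then
    echo "Not logged in: run 'az login' (optionally with --tenant) first" >&2
    return 1
  fi
  for acr in "$@"; do
    run az acr update --name "$acr" --anonymous-pull-enabled || return 1
  done
}
```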
With managed identity authentication, Cromwell on Azure authenticates to your ACR using its managed identity. Note that as of CoA <4.3/4.4, CoA uses a docker-in-docker mode to run your Docker images, so you will need to supply an additional deployment parameter when updating your CoA deployment. This requirement is likely to go away in future versions of CoA. Before beginning, note the name of your CoA managed identity; the managed identity is one of the resources created by your CoA deployment. If you want multiple CoA deployments to access your ACR, you will need to repeat these steps for each CoA deployment and each ACR you want them to access.
- Add the CoA managed identity to your Azure Container Registry. See Assign Azure roles using the Azure portal for how to perform role assignments. You can do this in the Azure Portal or via the Azure CLI. The steps are roughly:
  - Go to the Azure Container Registry in the Azure Portal.
  - Select Access control (IAM) and click Add, then Add role assignment.
  - Select Contributor from the Privileged administrator roles tab (note that the default tab is Job function roles, so you won't see Contributor there).
  - Click Next, then click Select members and add the name of your managed identity.
  - Click Next and make sure that the managed identity is successfully added as a Contributor.
- Go to your ACR in the Azure Portal, select Access keys (under Settings), and make sure Admin user is set to "Enabled".
- Next, you will need to add a default Docker image in which to run your Docker images. This is a bit of a holdover, but if it is not done you may see errors such as
/usr/local/bin/docker-entrypoint.sh: exec: line 61: /bin/bash: not found
when you attempt to run your jobs.
  - Create the Docker image from the CoA Dockerfile.
  - Upload the generated Docker image to your private ACR. Make a note of the image name; it should be of the form {acr_hostname}.azurecr.io/{image_name}
  - Run the CoA deployer with --update true --DockerInDockerImageName {acr_hostname}.azurecr.io/{image_name}. This should configure your CoA instance to use this default Docker image.
  - You can validate that this worked by going to the tes Kubernetes workload and looking for the value of NodeImages__Docker in its YAML. Alternatively, retrieve this value by first setting up your kubectl environment (see here) and then running
kubectl get deployment.apps tes -n coa -o yaml | grep -A 1 NodeImages__Docker
which should give you something like:
    - name: NodeImages__Docker
      value: cromwellonazuretestacr.azurecr.io/docker/docker:01
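You can script parts of this procedure with the Azure CLI. This sketch assumes you know the managed identity's principal ID and that the Dockerfile and build context for the docker-in-docker image are in a local directory of your choosing. The helper names are hypothetical. The last helper extracts the NodeImages__Docker value from the kubectl output.

```shell
# Hypothetical helpers; all names and arguments are placeholders.
run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Give the CoA managed identity Contributor on the registry and enable
# the registry's admin user.
grant_acr_access() {
  local principal_id="$1" acr="$2" acr_id
  if [ "${DRY_RUN:-0}" = 1 ]; then
    acr_id="<acr-resource-id>"
  else
    acr_id=$(az acr show --name "$acr" --query id -o tsv) || return 1
  fi
  run az role assignment create --assignee "$principal_id" \
    --role Contributor --scope "$acr_id" || return 1
  run az acr update --name "$acr" --admin-enabled true
}

# Build the docker-in-docker image inside ACR from a local build context and
# print the full name to pass as --DockerInDockerImageName.
build_dind_image() {
  local acr="$1" image="$2" context_dir="$3"
  run az acr build --registry "$acr" --image "$image" "$context_dir" >&2 || return 1
  printf '%s.azurecr.io/%s\n' "$acr" "$image"
}

# Read deployment YAML on stdin and print the NodeImages__Docker value.
node_images_docker() {
  awk '/name: NodeImages__Docker[ \t]*$/ {
         if ((getline line) > 0) {
           sub(/^[ \t]*value:[ \t]*/, "", line)
           print line
         }
         exit
       }'
}

# Example:
#   grant_acr_access <principal-id> myacr
#   build_dind_image myacr docker/docker:01 ./dind-context
#   kubectl get deployment.apps tes -n coa -o yaml | node_images_docker
```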
Configure my Cromwell on Azure instance to always use dedicated batch VMs to avoid getting preempted
By default, your workflows will run on low-priority Azure Batch nodes.
If you prefer to use dedicated Azure Batch nodes for all tasks, do the following:
Cromwell on Azure version 2.x
In the cromwell-application.conf file in the configuration container of the default storage account, change preemptible: true to preemptible: false in the backend section. Save your changes and restart the VM.
Note that you can override this setting for each task individually by setting the preemptible boolean flag to true or false in the "runtime" attributes section of your task.
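To make the change from a shell rather than the Portal, edit a downloaded copy of the blob. This is a minimal sketch with a hypothetical helper name. It assumes the setting appears as preemptible: true, or preemptible = true, the other HOCON form. It flips every occurrence, so check the result if your file contains more than one.

```shell
# Flip "preemptible: true" (or "preemptible = true") to false in a local
# copy of cromwell-application.conf, keeping indentation and separator.
use_dedicated_nodes() {
  local file="$1" tmp rc
  tmp=$(mktemp) || return 1
  sed 's/\(preemptible[[:space:]]*[:=][[:space:]]*\)true/\1false/' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (placeholders):
#   az storage blob download --account-name <default account> --container-name configuration \
#     --name cromwell-application.conf --file cromwell-application.conf --auth-mode login
#   use_dedicated_nodes cromwell-application.conf
#   az storage blob upload --account-name <default account> --container-name configuration \
#     --name cromwell-application.conf --file cromwell-application.conf --overwrite --auth-mode login
```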
Cromwell on Azure version 1.x
Log on to the host VM using the ssh connection string as described in the instructions. Change the UsePreemptibleVmsOnly environment variable for the "tes" service to "false" in the docker-compose.yml file and save your changes.
cd /data/cromwellazure/
sudo nano docker-compose.yml
# Modify UsePreemptibleVmsOnly to false and save the file
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run sudo reboot.
Access Cromwell's REST API directly
Cromwell runs in server mode on the Linux host VM. After logging in to the host VM, you can access it via curl as described below:
Get all workflows
curl -X GET "http://localhost:8000/api/workflows/v1/query" -H "accept: application/json"
Get a specific workflow's status by ID
curl -X GET "http://localhost:8000/api/workflows/v1/{id}/status" -H "accept: application/json"
Get the call-caching difference between two workflow calls
curl -X GET "http://localhost:8000/api/workflows/v1/callcaching/diff?workflowA={workflowId1}&callA={workflowName.callName1}&workflowB={workflowId2}&callB={workflowName.callName2}" -H "accept: application/json"
You can perform other Cromwell API calls following a similar pattern. To see all available API endpoints, see Cromwell's REST API documentation.
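Following that pattern, a few hypothetical wrapper functions make the calls shorter. The endpoints used here are part of Cromwell's workflows v1 API: status, metadata, outputs, abort, and query with a status filter. The function names and the CROMWELL_URL variable are assumptions.

```shell
# Assumed base URL: Cromwell as reached from the host VM.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"

# URL of a per-workflow endpoint such as status, metadata or outputs.
workflow_url() {
  printf '%s/api/workflows/v1/%s/%s\n' "$CROMWELL_URL" "$1" "$2"
}

# GET a per-workflow endpoint, e.g. workflow_get <id> metadata
workflow_get() {
  curl -sS -X GET "$(workflow_url "$1" "$2")" -H "accept: application/json"
}

# List workflows in a given state, e.g. workflows_with_status Running
workflows_with_status() {
  curl -sS -G "$CROMWELL_URL/api/workflows/v1/query" \
    --data-urlencode "status=$1" -H "accept: application/json"
}

# Abort a running workflow by ID.
workflow_abort() {
  curl -sS -X POST "$(workflow_url "$1" abort)" -H "accept: application/json"
}
```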
Set Content-MD5 on all task outputs
Added in v5.3.3. This applies to both ga4gh-tes and CromwellOnAzure deployments.
By default, the Content-MD5 property is not set on blobs uploaded from tasks, because computing it costs time and money and adds traffic/server load. However, if your workflow requires that property to be set, follow these instructions.
- Either in the Helm values.yaml when initially deploying (advanced), or in the aksValues.yaml blob in the configuration container in the deployment's storage account, locate the config.batchNodes.contentMD5 value and change it from "false" to "true" (case does not matter in the value). Save the file/blob.
- If you updated the aksValues.yaml blob (make sure the change was saved in the storage account), rerun the same version's deployer with the --Update true argument (along with the required --SubscriptionId, --ResourceGroupName, and any other arguments your deployment requires).
All tasks started after this update will set that property on all uploaded blobs in Azure Storage. To reverse this, repeat the instructions above for the aksValues.yaml blob, restoring its value to "false".
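You can script the aksValues.yaml edit against a downloaded copy of the blob. This sketch assumes contentMD5 appears once in the file, and the helper names are hypothetical. Content-MD5 holds the base64-encoded binary MD5 digest of the blob. local_content_md5 therefore computes the expected value for a local file, which you can compare with the property shown in the Azure Portal.

```shell
# Set config.batchNodes.contentMD5 in a local copy of aksValues.yaml.
# Assumes the key appears once; the value is written quoted, as in the file.
set_content_md5() {
  local file="$1" value="$2" tmp rc
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  sed -E 's/^([[:space:]]*contentMD5:[[:space:]]*).*/\1"'"$value"'"/' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Base64-encoded binary MD5 of a local file, the format used by Content-MD5.
local_content_md5() {
  openssl dgst -md5 -binary "$1" | openssl base64
}

# Example (placeholders):
#   az storage blob download --account-name <account> --container-name configuration \
#     --name aksValues.yaml --file aksValues.yaml --auth-mode login
#   set_content_md5 aksValues.yaml true
#   az storage blob upload --account-name <account> --container-name configuration \
#     --name aksValues.yaml --file aksValues.yaml --overwrite --auth-mode login
#   then rerun the deployer with --Update true
```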