Known Issues and Mitigation
A dependency tool that we use to copy files from Azure Storage to the VM running a task currently has a bug, which we are tracking. Until it is fixed, use the following workaround if you see errors accessing your files with SAS tokens on Cromwell on Azure. If you followed these instructions to create a SAS URL, you will get something similar to:
https://YourStorageAccount.blob.core.windows.net/inputs?sv=2018-03-28&si=inputs-key&sr=c&sig=somestring
Focus on this part: si=inputs-key&sr=c
Manually change the order of the sr and si fields to get something similar to:
https://YourStorageAccount.blob.core.windows.net/inputs?sv=2018-03-28&sr=c&si=inputs-key&sig=somestring
After the change, the sr field should come directly before the si field in your SAS URL: sr=c&si=inputs-key
Update all the SAS URLs similarly and retry your workflow.
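If you have many SAS URLs to update, the manual reordering above can be scripted. The following is a minimal sketch in Python, not part of Cromwell on Azure; the function name reorder_sas_url is hypothetical. It assumes the only change needed is moving the sr field so that it sits directly before si, and it leaves every other field, including the signature, untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def reorder_sas_url(url: str) -> str:
    """Return the URL with the sr query field moved directly before si.

    Hypothetical helper: it only reorders fields and never re-encodes them.
    """
    parts = urlsplit(url)
    # Split the query by hand instead of using parse_qsl/urlencode, so the
    # percent-encoding of the sig value is preserved exactly as given.
    fields = parts.query.split("&") if parts.query else []
    names = [field.split("=", 1)[0] for field in fields]
    if "sr" in names and "si" in names:
        sr_idx, si_idx = names.index("sr"), names.index("si")
        if sr_idx > si_idx:
            # Remove sr from its later position and reinsert it before si.
            sr_field = fields.pop(sr_idx)
            fields.insert(si_idx, sr_field)
    return urlunsplit(parts._replace(query="&".join(fields)))


if __name__ == "__main__":
    original = (
        "https://YourStorageAccount.blob.core.windows.net/inputs"
        "?sv=2018-03-28&si=inputs-key&sr=c&sig=somestring"
    )
    print(reorder_sas_url(original))
```

A URL whose fields are already in the sr, si order is returned unchanged, so the function can be run safely over a mixed list of URLs.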
All TES tasks for my workflow are done running, but the trigger JSON file is still in the "inprogress" directory in the workflows container
- The root cause is most likely memory pressure on the host Linux VM because blobfuse processes grow to consume all physical memory.
You may see the following Cromwell container logs as a symptom:
Cromwell shutting down because it cannot access the database): Shutting down cromid-5bd1d24 as at least 15 minutes of heartbeat write errors have occurred between 2020-02-18T22:03:01.110Z and 2020-02-18T22:19:01.111Z (16.000016666666667 minutes
To mitigate, resize the VM in your resource group to a size with at least 14 GB of memory (RAM). Any workflows still in progress will not be affected.
- Another possible scenario is that the "mysql" database is in an unusable state, which means Cromwell cannot continue processing workflows.
You may see the following Cromwell container logs as a symptom:
Failed to instantiate Cromwell System. Shutting down Cromwell. liquibase.exception.LockException: Could not acquire change log lock. Currently locked by 012ec19c3285 (172.18.0.4) since 2/19/20 4:10 PM
Note: This has been fixed in Release 2.1. If you use the 2.1 deployer or update to this version, you can skip the mitigation steps below.
For Release 2.0 and below: To mitigate, log on to the host VM, run the following command, and then restart the VM:
sudo docker exec -it cromwellazure_mysqldb_1 bash -c 'mysql -ucromwell -Dcromwell_db -pcromwell -e"SELECT * FROM DATABASECHANGELOGLOCK;UPDATE DATABASECHANGELOGLOCK SET LOCKED=0, LOCKGRANTED=null, LOCKEDBY=null where ID=1;SELECT * FROM DATABASECHANGELOGLOCK;"'
Trigger JSON file for my workflow stays in the "new" directory in the workflows container and no task is started
The root cause is most likely a failing MySQL upgrade.
You may see the following Cromwell container logs:
Failed to instantiate Cromwell System. Shutting down Cromwell. java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 15000ms.
and the following MySQL container logs as a symptom:
Upgrade is not supported after a crash or shutdown with innodb_fast_shutdown = 2.
To mitigate, follow the instructions on MySQL update.
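The log symptoms listed on this page can also be checked for with a small script when triaging a stuck workflow. The sketch below is illustrative only. The names KNOWN_SYMPTOMS and suggest_mitigations are hypothetical, the substring matching is an assumption about how the messages appear in your container logs, and the suggestion strings simply paraphrase the mitigations described above.

```python
from typing import List

# Each entry pairs a substring taken from the log excerpts on this page with
# a paraphrase of the matching mitigation. Substring matching is an assumption.
KNOWN_SYMPTOMS = [
    (
        "heartbeat write errors",
        "Likely memory pressure: resize the host VM to at least 14 GB of memory.",
    ),
    (
        "Could not acquire change log lock",
        "Release 2.0 and below: clear DATABASECHANGELOGLOCK, then restart the VM.",
    ),
    (
        "SQLTransientConnectionException",
        "Check the MySQL container logs for a failed upgrade.",
    ),
    (
        "innodb_fast_shutdown",
        "Likely a failing MySQL upgrade: follow the MySQL update instructions.",
    ),
]


def suggest_mitigations(log_text: str) -> List[str]:
    """Return the suggestions whose symptom substring occurs in log_text."""
    return [hint for symptom, hint in KNOWN_SYMPTOMS if symptom in log_text]


if __name__ == "__main__":
    sample = (
        "Failed to instantiate Cromwell System. Shutting down Cromwell. "
        "liquibase.exception.LockException: Could not acquire change log lock."
    )
    for hint in suggest_mitigations(sample):
        print(hint)
```

An empty result only means none of the known symptoms were found; it does not rule out other causes.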