Support disk directive in Azure Batch #5120

Open
wants to merge 27 commits into
base: master

adamrtalbot
Collaborator

@adamrtalbot adamrtalbot commented Jul 8, 2024

This PR adds the process.disk directive support to Azure Batch. After adding this directive, Azure Batch will:

  • Factor the task's disk request and the VM's disk size into the number of slots a single task occupies
  • Create autopools that use VMs with storage disks large enough for a single Azure Batch task
  • When using autopools, select the node with the fewest features (likely to be the cheapest)

This should help with the recurring issue where Azure Batch nodes run out of disk space because too many tasks are packed onto a single machine. Since we don't support using a network-attached disk as the working directory, this is the only way to handle larger files.
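To make the slot arithmetic concrete, here is a minimal Python sketch of how disk could enter the calculation. It is not the Groovy code in this PR: the function name `compute_slots`, the one-slot-per-vCPU model, and the rule that a VM with no local disk gives the whole machine to a task that requests disk are all assumptions made for illustration.

```python
import math


def compute_slots(cpus, mem_gb, disk_gb, vm_cpus, vm_mem_gb, vm_disk_gb):
    """Return how many slots one task occupies on a VM.

    Assumption: the VM exposes one slot per vCPU, and each slot owns an
    equal share of the VM's memory and local disk.
    """
    mem_per_slot = vm_mem_gb / vm_cpus
    disk_per_slot = vm_disk_gb / vm_cpus

    cpu_slots = cpus
    mem_slots = math.ceil(mem_gb / mem_per_slot) if mem_gb else 0

    if not disk_gb:
        disk_slots = 0
    elif disk_per_slot == 0:
        # Assumption: a VM without local disk cannot share it, so the
        # task claims every slot on the machine.
        disk_slots = vm_cpus
    else:
        disk_slots = math.ceil(disk_gb / disk_per_slot)

    # The most demanding resource decides the slot count, capped at the VM size.
    return min(max(cpu_slots, mem_slots, disk_slots, 1), vm_cpus)
```

On a hypothetical 4 vCPU, 16 GB, 100 GB VM, a 1 CPU, 2 GB task occupies one slot, so four such tasks share the machine. The same task with a 60 GB disk request occupies three slots, which leaves room for no second copy and keeps the disk from being oversubscribed.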

Similarly, when using Fusion we can ringfence a larger fraction of the local storage drive to ensure the local cache is not overloaded.

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Changes:
 - Include resourceDiskSizeGB when calculating the Azure autoPool
 - Determine the best-fit VM type using vCPUs, memory, and resourceDiskSizeGB

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot adamrtalbot requested a review from a team as a code owner July 8, 2024 10:02

netlify bot commented Jul 8, 2024

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit fda939a
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/6759949357e9dd0008bbd1d0
😎 Deploy Preview https://deploy-preview-5120--nextflow-docs-staging.netlify.app

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot adamrtalbot requested a review from bentsherman July 8, 2024 10:17
@adamrtalbot
Collaborator Author

If anyone fancies being a rubber duck, I can't work out why the best match for northeurope is a Standard_B4ls_v2 instead of a Basic_A3 now, especially when a Standard_B4ls_v2 has zero disk attached!

@pditommaso pditommaso marked this pull request as draft July 8, 2024 13:49
…rviceTest.groovy

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Adds support for disk directive in Azure Batch tasks by considering disk requirements
when calculating VM slots and selecting VM types. Disk requirements are now factored
into VM scoring alongside CPU and memory.

Modifies the VM scoring algorithm to include disk requirements and to weight CPUs more heavily than memory or disk.
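A weighted score of this kind can be sketched in Python as follows. This is only an illustration of the idea, not the scoring code in the PR: the weights, the `VmType` fields, the "lower is better" convention, and the VM names are all hypothetical.

```python
from typing import NamedTuple, Optional


class VmType(NamedTuple):
    name: str
    cpus: int
    mem_gb: float
    disk_gb: float


# Hypothetical weights: surplus CPUs cost more than surplus memory or disk.
CPU_WEIGHT, MEM_WEIGHT, DISK_WEIGHT = 10.0, 1.0, 0.1


def score_vm(vm: VmType, cpus: int, mem_gb: float, disk_gb: float) -> Optional[float]:
    """Return a score (lower is better), or None if the VM is too small."""
    if vm.cpus < cpus or vm.mem_gb < mem_gb or vm.disk_gb < disk_gb:
        return None
    return (CPU_WEIGHT * (vm.cpus - cpus)
            + MEM_WEIGHT * (vm.mem_gb - mem_gb)
            + DISK_WEIGHT * (vm.disk_gb - disk_gb))


def best_match(vms, cpus, mem_gb, disk_gb) -> Optional[VmType]:
    """Pick the eligible VM with the lowest score, or None if none fits."""
    scored = []
    for vm in vms:
        score = score_vm(vm, cpus, mem_gb, disk_gb)
        if score is not None:
            scored.append((score, vm))
    return min(scored, key=lambda pair: pair[0])[1] if scored else None


# Hypothetical catalogue; names and sizes are made up for the example.
catalogue = [
    VmType("vm_nodisk", 4, 8, 0),
    VmType("vm_medium", 4, 16, 100),
    VmType("vm_large", 8, 32, 200),
]
chosen = best_match(catalogue, cpus=4, mem_gb=8, disk_gb=50)
```

In this sketch the eligibility check runs before any scoring, so a VM with no local disk can never win a request that asks for disk, however well its CPUs and memory fit.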

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot adamrtalbot marked this pull request as ready for review December 2, 2024 18:42
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@bentsherman bentsherman changed the title 4920: Azure Batch supports process.disk directive Support disk directive in Azure Batch (#4920) Dec 4, 2024
@bentsherman bentsherman changed the title Support disk directive in Azure Batch (#4920) Support disk directive in Azure Batch Dec 4, 2024
@bentsherman bentsherman linked an issue Dec 4, 2024 that may be closed by this pull request
@adamrtalbot
Collaborator Author

@bentsherman after many long hours and mental calculations about slots, I've finally finished this one.

The only problem I've found is that it doesn't account for the existing funnel for caching the pool after creation. I'm not sure how necessary this is, but the cache currently has no awareness of the cpus, memory and disk, just the name. I wonder if this indicates I'm missing something?

Review comment on docs/azure.md (outdated, resolved)
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Changes:
 - Include the inverse length of the VM name as a modifier for the score
 - Shorter names are preferred because they have fewer features and are generally cheaper
 - In doing this, I had to switch from a TreeMap to a List of Tuples because TreeMap was only considering the final set of similar VMs
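The effect described in these changes can be illustrated with a small Python sketch. It is not the PR's Groovy code: the way the inverse name length is applied (subtracted from a "lower is better" score), the function names, and the VM names are assumptions. A plain dict stands in for the score-keyed map to show why VMs with equal scores collapse into one entry.

```python
def rank_with_list(candidates):
    """Rank (name, base_score) pairs, best first, keeping every candidate.

    Assumption: lower scores are better, so subtracting 1/len(name)
    nudges shorter names ahead of longer ones with the same base score.
    """
    scored = [(base - 1.0 / len(name), name) for name, base in candidates]
    return [name for _, name in sorted(scored)]


def rank_with_map(candidates):
    """Rank using a map keyed by score; equal scores overwrite each other."""
    by_score = {}
    for name, base in candidates:
        by_score[base] = name  # a later VM replaces an earlier one with the same score
    return [by_score[score] for score in sorted(by_score)]


# Hypothetical candidates: two VMs tie on base score, one has a longer name.
candidates = [
    ("vm_a4", 13.0),
    ("vm_a4_premium_v2", 13.0),
    ("vm_b8", 79.0),
]
list_ranking = rank_with_list(candidates)
map_ranking = rank_with_map(candidates)
```

With the list of tuples, all three candidates survive and `vm_a4` ranks first. With the score-keyed map, `vm_a4` is silently replaced by `vm_a4_premium_v2`, which matches the behaviour that prompted the switch away from the map.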

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Azure Batch: Add disk size to slots calculation
2 participants