Harmonise documentation for hybrid cloud execution (#5362) [ci fast]

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Co-authored-by: Christopher Hakkaart <chris.hakkaart@seqera.io>
adamrtalbot and christopher-hakkaart authored Oct 18, 2024
1 parent cf0f969 commit a69407d
Showing 3 changed files with 36 additions and 31 deletions.
41 changes: 19 additions & 22 deletions docs/aws.md
@@ -408,44 +408,41 @@ To do that, first create a **Job Definition** in the AWS Console (or by other me
process.container = 'job-definition://your-job-definition-name'
```

### Pipeline execution

The pipeline can be launched either in a local computer or an EC2 instance. The latter is suggested for heavy or long-running workloads.

Pipeline input data can be stored either locally or in an [S3](https://aws.amazon.com/s3/) bucket. The pipeline execution must specify an S3 bucket to store intermediate results with the `-bucket-dir` (`-b`) command line option. For example:

```bash
nextflow run my-pipeline -bucket-dir s3://my-bucket/some/path
```

:::{warning}
The bucket path should include at least a top level directory name, e.g. `s3://my-bucket/work` rather than `s3://my-bucket`.
:::

### Hybrid workloads

Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to AWS Batch.

To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example:

```groovy
aws {
region = 'eu-west-1'
batch {
cliPath = '/home/ec2-user/miniconda/bin/aws'
}
}
process {
withLabel: bigTask {
executor = 'awsbatch'
queue = 'my-batch-queue'
container = 'my/image:tag'
}
}
aws {
region = 'eu-west-1'
}
```

With the above configuration, processes with the `bigTask` {ref}`process-label` will run on AWS Batch, while the remaining processes with run in the local computer.
With the above configuration, processes with the `bigTask` {ref}`process-label` will run on AWS Batch, while the remaining processes will run on the local computer.

Then launch the pipeline with the `-bucket-dir` option to specify an AWS S3 path for the jobs computed with AWS Batch and, optionally, the `-work-dir` option to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir s3://my-bucket/some/path
```
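
If you also want to control where the locally executed jobs keep their work files, a launch command combining both options might look like the following sketch (the bucket and local directory names are placeholders):

```bash
# Hypothetical paths: adjust the S3 bucket and the local scratch directory to your environment
nextflow run my-pipeline -bucket-dir s3://my-bucket/work -work-dir /scratch/nextflow-work
```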

:::{warning}
The AWS S3 path needs to contain at least one sub-directory (e.g. `s3://my-bucket/work` rather than `s3://my-bucket`).
:::

:::{note}
Nextflow will automatically manage the transfer of input and output files between the local and cloud environments when using hybrid workloads.
:::
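
To make the `withLabel: bigTask` selector concrete, a minimal workflow sketch (with hypothetical process names, parameters, and commands) could attach the label to the heavy process while leaving the other process on the default local executor:

```groovy
process heavyAlignment {
    // Runs on AWS Batch because of the withLabel: bigTask selector in the configuration above
    label 'bigTask'

    input:
    path reads

    output:
    path 'aligned.bam'

    script:
    """
    my-aligner --input ${reads} --output aligned.bam
    """
}

process quickSummary {
    // No label, so this process uses the default (local) executor
    input:
    path bam

    output:
    path 'summary.txt'

    script:
    """
    my-summary-tool ${bam} > summary.txt
    """
}

workflow {
    // params.reads is a hypothetical input parameter, e.g. a local path or an s3:// glob
    reads_ch = Channel.fromPath(params.reads)
    aligned  = heavyAlignment(reads_ch)
    quickSummary(aligned)
}
```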

### Volume mounts

11 changes: 7 additions & 4 deletions docs/azure.md
@@ -407,9 +407,9 @@ Batch Authentication with Shared Keys does not allow to link external resources

### Hybrid workloads

Nextflow allows the use of multiple executors in the same workflow application. This feature lets you deploy hybrid workloads, where some jobs run on the local computer or local computing cluster, while others are offloaded to Azure Batch.
Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to Azure Batch.

To enable this feature, configure one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the Azure Batch settings to the processes you want to offload. For example:
To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the Azure Batch configuration to the subset of processes that you want to offload. For example:

```groovy
process {
@@ -433,9 +433,9 @@ azure {
}
```

With the above configuration, processes with the bigTask {ref}`process-label` run on Azure Batch, while the remaining processes run on the local computer.
With the above configuration, processes with the bigTask {ref}`process-label` will run on Azure Batch, while the remaining processes will run on the local computer.

Next, launch the pipeline with the `-bucket-dir` option to specify an Azure Blob Storage path for the jobs running on Azure Batch, and optionally, use the `-work-dir` option to specify local storage for the jobs running locally:
Then launch the pipeline with the `-bucket-dir` option to specify an Azure Blob Storage path for the jobs computed with Azure Batch, and optionally, use the `-work-dir` option to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir az://my-container/some/path
@@ -445,6 +445,9 @@ nextflow run <script or project name> -bucket-dir az://my-container/some/path
The Azure Blob Storage path needs to contain at least one sub-directory (e.g. `az://my-container/work` rather than `az://my-container`).
:::

:::{note}
Nextflow will automatically manage the transfer of input and output files between the local and cloud environments when using hybrid workloads.
:::

:::{tip}
When using [Fusion](./fusion.md), the `-bucket-dir` option is not required. Fusion implements a distributed virtual file system that allows seamless access to Azure Blob Storage using a standard POSIX interface, enabling direct mounting of remote blob storage as if it were a local file system. This simplifies and speeds up most operations, bridging the gap between cloud-native storage and data analysis workflows.
:::
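
As a rough sketch of what enabling Fusion for this scenario could look like in the configuration (Fusion relies on the Wave service, and the exact settings may vary with your setup), the relevant options are:

```groovy
// Minimal sketch: enable Wave and Fusion so tasks can access Azure Blob Storage
// through a virtual POSIX file system instead of staging files explicitly.
wave {
    enabled = true
}

fusion {
    enabled = true
}
```

With Fusion enabled, the pipeline work directory can itself live in Blob Storage (for example `workDir = 'az://my-container/work'`), which is what makes the separate `-bucket-dir` option unnecessary.
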
15 changes: 10 additions & 5 deletions docs/google.md
@@ -398,25 +398,26 @@ For an exhaustive list of error codes, refer to the official Google Life Science

### Hybrid execution

Nextflow allows the use of multiple executors in the same workflow. This feature enables the deployment of hybrid workloads, in which some jobs are executed in the local computer or local computing cluster, and some jobs are offloaded to Google Life Sciences.
Nextflow allows the use of multiple executors in the same workflow. This feature enables the deployment of hybrid workloads, in which some jobs are executed in the local computer or local computing cluster, and some jobs are offloaded to Google Cloud (either Google Batch or Google Life Sciences).

To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration file to apply the Google Life Sciences executor to the subset of processes that you want to offload. For example:
To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration file to apply the Google Cloud executor to the subset of processes that you want to offload. For example:

```groovy
process {
withLabel: bigTask {
executor = 'google-lifesciences'
executor = 'google-batch' // or 'google-lifesciences'
container = 'my/image:tag'
}
}
google {
project = 'your-project-id'
zone = 'europe-west1-b'
location = 'us-central1' // for Google Batch
// zone = 'us-central1-a' // for Google Life Sciences
}
```

Then launch the pipeline with the `-bucket-dir` option to specify a Google Storage path for the jobs computed with Google Life Sciences and, optionally, the `-work-dir` to specify the local storage for the jobs computed locally:
Then launch the pipeline with the `-bucket-dir` option to specify a Google Storage path for the jobs computed with Google Cloud and, optionally, the `-work-dir` to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir gs://my-bucket/some/path
@@ -426,6 +426,10 @@ nextflow run <script or project name> -bucket-dir gs://my-bucket/some/path
The Google Storage path needs to contain at least one sub-directory (e.g. `gs://my-bucket/work` rather than `gs://my-bucket`).
:::

:::{note}
Nextflow will automatically manage the transfer of input and output files between the local and cloud environments when using hybrid workloads.
:::
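
One way to keep the cloud settings self-contained is to wrap them in a configuration profile, so they are only applied when explicitly requested at launch time. The sketch below assumes a hypothetical profile name `gcb`:

```groovy
// Hypothetical profile-based variant of the hybrid configuration above:
// the Google Cloud settings are only applied when -profile gcb is used.
profiles {
    gcb {
        process {
            withLabel: bigTask {
                executor = 'google-batch'
                container = 'my/image:tag'
            }
        }
        google {
            project = 'your-project-id'
            location = 'us-central1'
        }
    }
}
```

The hybrid run is then launched with `-profile gcb` together with the `-bucket-dir` option, for example `nextflow run my-pipeline -profile gcb -bucket-dir gs://my-bucket/work`; without the profile, every process falls back to the default local executor.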

### Limitations

- Compute resources in Google Cloud are subject to [resource quotas](https://cloud.google.com/compute/quotas), which may affect your ability to run pipelines at scale. You can request quota increases, and your quotas may automatically increase over time as you use the platform. In particular, GPU quotas are initially set to 0, so you must explicitly request a quota increase in order to use GPUs. You can initially request an increase to 1 GPU at a time, and after one billing cycle you may be able to increase it further.
