Harmonise documentation for hybrid cloud execution (#5362) [ci fast]

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Co-authored-by: Christopher Hakkaart <chris.hakkaart@seqera.io>
adamrtalbot and christopher-hakkaart authored Oct 18, 2024
1 parent cf0f969 commit a69407d
Showing 3 changed files with 36 additions and 31 deletions.
41 changes: 19 additions & 22 deletions docs/aws.md
@@ -408,44 +408,41 @@ To do that, first create a **Job Definition** in the AWS Console (or by other me
process.container = 'job-definition://your-job-definition-name'
```

### Pipeline execution

The pipeline can be launched either in a local computer or an EC2 instance. The latter is suggested for heavy or long-running workloads.

Pipeline input data can be stored either locally or in an [S3](https://aws.amazon.com/s3/) bucket. The pipeline execution must specify an S3 bucket to store intermediate results with the `-bucket-dir` (`-b`) command line option. For example:

```bash
nextflow run my-pipeline -bucket-dir s3://my-bucket/some/path
```

:::{warning}
The bucket path should include at least a top level directory name, e.g. `s3://my-bucket/work` rather than `s3://my-bucket`.
:::

### Hybrid workloads

Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to AWS Batch.

To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the AWS Batch configuration to the subset of processes that you want to offload. For example:

```groovy
aws {
region = 'eu-west-1'
batch {
cliPath = '/home/ec2-user/miniconda/bin/aws'
}
}
process {
withLabel: bigTask {
executor = 'awsbatch'
queue = 'my-batch-queue'
container = 'my/image:tag'
}
}
aws {
region = 'eu-west-1'
}
```

With the above configuration, processes with the `bigTask` {ref}`process-label` will run on AWS Batch, while the remaining processes with run in the local computer.
With the above configuration, processes with the `bigTask` {ref}`process-label` will run on AWS Batch, while the remaining processes will run on the local computer.

Then launch the pipeline with the `-bucket-dir` option to specify an AWS S3 path for the jobs computed with AWS Batch and, optionally, the `-work-dir` option to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir s3://my-bucket/some/path
```
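
If you also want to control where the locally executed jobs keep their work files, a launch command combining both options might look like the following sketch (the bucket and local directory names are placeholders):

```bash
# Hypothetical paths: adjust the S3 bucket and the local scratch directory to your environment
nextflow run my-pipeline -bucket-dir s3://my-bucket/work -work-dir /scratch/nextflow-work
```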

:::{warning}
The AWS S3 path needs to contain at least one sub-directory (e.g. `s3://my-bucket/work` rather than `s3://my-bucket`).
:::

:::{note}
Nextflow will automatically manage the transfer of input and output files between the local and cloud environments when using hybrid workloads.
:::
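
To make the `withLabel: bigTask` selector concrete, a minimal workflow sketch (with hypothetical process names, parameters, and commands) could attach the label to the heavy process while leaving the other process on the default local executor:

```groovy
process heavyAlignment {
    // Runs on AWS Batch because of the withLabel: bigTask selector in the configuration above
    label 'bigTask'

    input:
    path reads

    output:
    path 'aligned.bam'

    script:
    """
    my-aligner --input ${reads} --output aligned.bam
    """
}

process quickSummary {
    // No label, so this process uses the default (local) executor
    input:
    path bam

    output:
    path 'summary.txt'

    script:
    """
    my-summary-tool ${bam} > summary.txt
    """
}

workflow {
    // params.reads is a hypothetical input parameter, e.g. a local path or an s3:// glob
    reads_ch = Channel.fromPath(params.reads)
    aligned  = heavyAlignment(reads_ch)
    quickSummary(aligned)
}
```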

### Volume mounts

11 changes: 7 additions & 4 deletions docs/azure.md
@@ -407,9 +407,9 @@ Batch Authentication with Shared Keys does not allow to link external resources

### Hybrid workloads

Nextflow allows the use of multiple executors in the same workflow application. This feature lets you deploy hybrid workloads, where some jobs run on the local computer or local computing cluster, while others are offloaded to Azure Batch.
Nextflow allows the use of multiple executors in the same workflow application. This feature enables the deployment of hybrid workloads in which some jobs are executed in the local computer or local computing cluster and some jobs are offloaded to Azure Batch.

To enable this feature, configure one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the Azure Batch settings to the processes you want to offload. For example:
To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration to apply the Azure Batch configuration to the subset of processes that you want to offload. For example:

```groovy
process {
@@ -433,9 +433,9 @@ azure {
}
```

With the above configuration, processes with the bigTask {ref}`process-label` run on Azure Batch, while the remaining processes run on the local computer.
With the above configuration, processes with the bigTask {ref}`process-label` will run on Azure Batch, while the remaining processes will run on the local computer.

Next, launch the pipeline with the `-bucket-dir` option to specify an Azure Blob Storage path for the jobs running on Azure Batch, and optionally, use the `-work-dir` option to specify local storage for the jobs running locally:
Then launch the pipeline with the `-bucket-dir` option to specify an Azure Blob Storage path for the jobs computed with Azure Batch, and optionally, use the `-work-dir` option to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir az://my-container/some/path
@@ -445,6 +445,9 @@ nextflow run <script or project name> -bucket-dir az://my-container/some/path
The Azure Blob Storage path needs to contain at least one sub-directory (e.g. `az://my-container/work` rather than `az://my-container`).
:::

:::{note}
Nextflow will automatically manage the transfer of input and output files between the local and cloud environments when using hybrid workloads.
:::

:::{tip}
When using [Fusion](./fusion.md), the `-bucket-dir` option is not required. Fusion implements a distributed virtual file system that allows seamless access to Azure Blob Storage using a standard POSIX interface, enabling direct mounting of remote blob storage as if it were a local file system. This simplifies and speeds up most operations, bridging the gap between cloud-native storage and data analysis workflows.
:::
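
As a rough sketch of what enabling Fusion for this scenario could look like in the configuration (Fusion relies on the Wave service, and the exact settings may vary with your setup), the relevant options are:

```groovy
// Minimal sketch: enable Wave and Fusion so tasks can access Azure Blob Storage
// through a virtual POSIX file system instead of staging files explicitly.
wave {
    enabled = true
}

fusion {
    enabled = true
}
```

With Fusion enabled, the pipeline work directory can itself live in Blob Storage (for example `workDir = 'az://my-container/work'`), which is what makes the separate `-bucket-dir` option unnecessary.
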
15 changes: 10 additions & 5 deletions docs/google.md
@@ -398,25 +398,26 @@ For an exhaustive list of error codes, refer to the official Google Life Science

### Hybrid execution

Nextflow allows the use of multiple executors in the same workflow. This feature enables the deployment of hybrid workloads, in which some jobs are executed in the local computer or local computing cluster, and some jobs are offloaded to Google Life Sciences.
Nextflow allows the use of multiple executors in the same workflow. This feature enables the deployment of hybrid workloads, in which some jobs are executed in the local computer or local computing cluster, and some jobs are offloaded to Google Cloud (either Google Batch or Google Life Sciences).

To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration file to apply the Google Life Sciences executor to the subset of processes that you want to offload. For example:
To enable this feature, use one or more {ref}`config-process-selectors` in your Nextflow configuration file to apply the Google Cloud executor to the subset of processes that you want to offload. For example:

```groovy
process {
withLabel: bigTask {
executor = 'google-lifesciences'
executor = 'google-batch' // or 'google-lifesciences'
container = 'my/image:tag'
}
}
google {
project = 'your-project-id'
zone = 'europe-west1-b'
location = 'us-central1' // for Google Batch
// zone = 'us-central1-a' // for Google Life Sciences
}
```

Then launch the pipeline with the `-bucket-dir` option to specify a Google Storage path for the jobs computed with Google Life Sciences and, optionally, the `-work-dir` to specify the local storage for the jobs computed locally:
Then launch the pipeline with the `-bucket-dir` option to specify a Google Storage path for the jobs computed with Google Cloud and, optionally, the `-work-dir` to specify the local storage for the jobs computed locally:

```bash
nextflow run <script or project name> -bucket-dir gs://my-bucket/some/path
@@ -426,6 +426,10 @@ nextflow run <script or project name> -bucket-dir gs://my-bucket/some/path
The Google Storage path needs to contain at least one sub-directory (e.g. `gs://my-bucket/work` rather than `gs://my-bucket`).
:::

:::{note}
Nextflow will automatically manage the transfer of input and output files between the local and cloud environments when using hybrid workloads.
:::
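
One way to keep the cloud settings self-contained is to wrap them in a configuration profile, so they are only applied when explicitly requested at launch time. The sketch below assumes a hypothetical profile name `gcb`:

```groovy
// Hypothetical profile-based variant of the hybrid configuration above:
// the Google Cloud settings are only applied when -profile gcb is used.
profiles {
    gcb {
        process {
            withLabel: bigTask {
                executor = 'google-batch'
                container = 'my/image:tag'
            }
        }
        google {
            project = 'your-project-id'
            location = 'us-central1'
        }
    }
}
```

The hybrid run is then launched with `-profile gcb` together with the `-bucket-dir` option, for example `nextflow run my-pipeline -profile gcb -bucket-dir gs://my-bucket/work`; without the profile, every process falls back to the default local executor.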

### Limitations

- Compute resources in Google Cloud are subject to [resource quotas](https://cloud.google.com/compute/quotas), which may affect your ability to run pipelines at scale. You can request quota increases, and your quotas may automatically increase over time as you use the platform. In particular, GPU quotas are initially set to 0, so you must explicitly request a quota increase in order to use GPUs. You can initially request an increase to 1 GPU at a time, and after one billing cycle you may be able to increase it further.
