From f37925e2ba00bd978d751436147fddfbd709d858 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 22 May 2024 09:49:22 -0400
Subject: [PATCH 1/8] Update readme with GHA instructions and more detail

---
 README.md | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index b9e54f0..480c542 100644
--- a/README.md
+++ b/README.md
@@ -8,14 +8,47 @@ See https://github.com/AlexsLemonade/OpenScPCA-admin/blob/main/technical-docs/ne
 
 The workflow is currently set up to run best via AWS batch, but some testing may work locally.
 You will need to have appropriate AWS credentials set up to run the workflow on AWS and access the data files.
-Further instructions for this will be added in the future, and we expect this to be run via a GitHub Action for most use cases.
+In general, you must have `workload` access in an OpenScPCA AWS account to run the workflow.
 
-The following base command will run the workflow, assuming all AWS permissions are set up correctly:
+### Running the workflow from GitHub Actions
+
+The most common way to run the workflow will be through GitHub Actions (GHA), using the [Batch CodeDeploy workflow](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml).
+This will send an AWS CodeDeploy action to the `Nextflow-workload` instance in the OpenScPCA AWS account, which will launch the Nextflow workflow on AWS Batch.
+The script that launches the workflow is [run_workflow.sh](scripts/run_nextflow.sh), which is run in a tmux session on the `Nextflow-workload` instance, allowing the workflow to run in the background and be monitored by logging into the instance.
+
+The GHA workflow will run automatically when a new release tag is created, which will include the following steps:
+
+1. Run the `simulate` entry point to create simulated SCE objects for the OpenScPCA project.
+2. Run the main workflow using the simulated data.
+3. Run the main workflow using the real ScPCA data.
+4. Upload all Nextflow logs, traces, and html run reports to `s3://openscpca-nf-data/logs/full/`, organized by date.
+
+Alternatively, manual launches of the GHA workflow can be triggered by a [`workflow_dispatch` trigger](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml), which will allow you to specify a specific run mode.
+The run modes available are:
+
+- `test`: runs only a simple test workflow to check configuration
+- `simulated`: runs the workflow using simulated data
+- `scpca`: runs the workflow using the current ScPCA data release
+- `full`: simulates data based on the current ScPCA datarelease, then runs the workflow using the simulated data and current ScPCA data release (this is same as the behavior of the automatic release workflow)
+
+For each run, all Nextflow logs, traces, and html run reports will be uploaded to `s3://openscpca-nf-data/logs/{run_mode}/`, organized by date of the run.
+
+### Running the workflow manually
+
+The following base command will run the main workflow, assuming all AWS permissions are set up correctly:
 
 ```bash
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch
 ```
 
+For most use cases you will want to specify an output directory other than the default using the `--results_bucket` argument.
+Note that despite the name, this can be a local directory as well as an S3 bucket.
+For an S3 bucket, the format should be `s3://bucket-name/path/to/results/`.
+
+```bash
+nextflow run AlexsLemonade/OpenScPCA-nf -profile batch --results_bucket {OUTDIR}
+```
+
 The workflow also has a couple of entry points other than the main workflow, for testing and creating simulated data.
 
 To run a test version of the workflow to check permissions and infrastructure setup:
@@ -24,10 +57,17 @@ To run a test version of the workflow to check permissions and infrastructure se
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry test
 ```
 
-To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command:
+To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, but note that you will need to specify directory for the simulation output, as the default bucket is not generally writeable, even with the correct permissions:
+
+```bash
+nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pubdir {SIMDIR}
+```
+
+To run the workflow with simulated data, you can add the `simulated` profile.
+As with the main workflow, you will want to specify an output directory for the simulated results with the `--results_bucket` argument.
 
 ```bash
-nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate
+nextflow run AlexsLemonade/OpenScPCA-nf -profile batch,simulated --results_bucket {SIM_RESULTS_DIR}
 ```
 
 ## Repository setup
From 44edaf91e89a1b75992f6f57faee1ec2696e31ef Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 13:33:52 -0400
Subject: [PATCH 2/8] update wording

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 480c542..d974505 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,7 @@ To run a test version of the workflow to check permissions and infrastructure se
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry test
 ```
 
-To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, but note that you will need to specify directory for the simulation output, as the default bucket is not generally writeable, even with the correct permissions:
+To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, but note that you will need to specify directory for the simulation output, as the default output bucket is not writeable except by a few specific roles:
 
 ```bash
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pubdir {SIMDIR}
From d87d0e39037c58dd6ff030427afcdaea2a5f49a5 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 16:05:44 -0400
Subject: [PATCH 3/8] Apply suggestions from code review

Co-authored-by: Ally Hawkins <54039191+allyhawkins@users.noreply.github.com>
---
 README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index d974505..349eab0 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ The script that launches the workflow is [run_workflow.sh](scripts/run_nextflow.
 
 The GHA workflow will run automatically when a new release tag is created, which will include the following steps:
 
-1. Run the `simulate` entry point to create simulated SCE objects for the OpenScPCA project.
+1. Run the workflow using the `simulate` entry point to create simulated SCE objects for the OpenScPCA project.
 2. Run the main workflow using the simulated data.
 3. Run the main workflow using the real ScPCA data.
 4. Upload all Nextflow logs, traces, and html run reports to `s3://openscpca-nf-data/logs/full/`, organized by date.
@@ -35,6 +35,7 @@ For each run, all Nextflow logs, traces, and html run reports will be uploaded t
 
 ### Running the workflow manually
 
+Alternatively, you can run the workflow locally.
 The following base command will run the main workflow, assuming all AWS permissions are set up correctly:
 
 ```bash
@@ -57,7 +58,8 @@ To run a test version of the workflow to check permissions and infrastructure se
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry test
 ```
 
-To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, but note that you will need to specify directory for the simulation output, as the default output bucket is not writeable except by a few specific roles:
+To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, which specifies running the workflow with the `simulate` entry point.
+Note that you will need to specify the directory for the simulation output using the `--sim_pubdir` argument, as the default output bucket is not writeable except by a few specific roles:
 
 ```bash
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pubdir {SIMDIR}
From d9bf9c74faf4bb2464607b5a7ab2b0e44e4e6cf0 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 16:05:54 -0400
Subject: [PATCH 4/8] Update README.md

Co-authored-by: Ally Hawkins <54039191+allyhawkins@users.noreply.github.com>
---
 README.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 349eab0..3b99191 100644
--- a/README.md
+++ b/README.md
@@ -12,9 +12,12 @@ In general, you must have `workload` access in an OpenScPCA AWS account to run t
 
 ### Running the workflow from GitHub Actions
 
-The most common way to run the workflow will be through GitHub Actions (GHA), using the [Batch CodeDeploy workflow](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml).
-This will send an AWS CodeDeploy action to the `Nextflow-workload` instance in the OpenScPCA AWS account, which will launch the Nextflow workflow on AWS Batch.
-The script that launches the workflow is [run_workflow.sh](scripts/run_nextflow.sh), which is run in a tmux session on the `Nextflow-workload` instance, allowing the workflow to run in the background and be monitored by logging into the instance.
+The most common way to run the workflow will be to run the GitHub Action (GHA) responsible for running the workflow.
+The GHA is run automatically when a new release tag is created or by manually triggering the workflow.
+
+The GHA that runs the workflow uses the [Batch CodeDeploy workflow](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml) to send an AWS CodeDeploy action to the `Nextflow-workload` instance in the OpenScPCA AWS account.
+This will launch the Nextflow workflow on AWS Batch by running the the [run_workflow.sh](scripts/run_nextflow.sh) script in a tmux session on the `Nextflow-workload` instance.
+Using tmux allows the workflow to run in the background and be monitored by logging into the instance.
 
 The GHA workflow will run automatically when a new release tag is created, which will include the following steps:
 
From be536353aa85071c2f0fb395776e3d895e31307f Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 16:11:03 -0400
Subject: [PATCH 5/8] let pre-commit fix things automatically

---
 .pre-commit-config.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index e88c16b..a6324e9 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -28,7 +28,7 @@ repos:
     hooks:
       - id: typos
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.4.2
+    rev: v0.4.6
     hooks:
       - id: ruff-format
   - repo: https://github.com/lorenzwalthert/precommit
@@ -41,5 +41,5 @@ repos:
     hooks:
       - id: prettier
 ci:
-  autofix_prs: false
+  autofix_prs: true
   autoupdate_schedule: quarterly
From 4447690a1e8bb97bab2bdd4fdc3382f325839959 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 16:18:13 -0400
Subject: [PATCH 6/8] reorganize and add headers

---
 README.md | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 3b99191..ae1cad5 100644
--- a/README.md
+++ b/README.md
@@ -12,11 +12,11 @@ In general, you must have `workload` access in an OpenScPCA AWS account to run t
 
 ### Running the workflow from GitHub Actions
 
-The most common way to run the workflow will be to run the GitHub Action (GHA) responsible for running the workflow.
-The GHA is run automatically when a new release tag is created or by manually triggering the workflow.
+The most common way to run the workflow will be to run the GitHub Action (GHA) responsible for running the workflow.
+The GHA is run automatically when a new release tag is created or by manually triggering the workflow.
 
 The GHA that runs the workflow uses the [Batch CodeDeploy workflow](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml) to send an AWS CodeDeploy action to the `Nextflow-workload` instance in the OpenScPCA AWS account.
-This will launch the Nextflow workflow on AWS Batch by running the the [run_workflow.sh](scripts/run_nextflow.sh) script in a tmux session on the `Nextflow-workload` instance.
+This will launch the Nextflow workflow on AWS Batch by running the the [run_workflow.sh](scripts/run_nextflow.sh) script in a tmux session on the `Nextflow-workload` instance.
 Using tmux allows the workflow to run in the background and be monitored by logging into the instance.
 
 The GHA workflow will run automatically when a new release tag is created, which will include the following steps:
@@ -38,14 +38,14 @@ For each run, all Nextflow logs, traces, and html run reports will be uploaded t
 
 ### Running the workflow manually
 
-Alternatively, you can run the workflow locally.
+Alternatively, you can run the workflow locally.
 The following base command will run the main workflow, assuming all AWS permissions are set up correctly:
 
 ```bash
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch
 ```
 
-For most use cases you will want to specify an output directory other than the default using the `--results_bucket` argument.
+For most use cases you will want to use the `--results_bucket` argument to avoid writing to the default output bucket.
 Note that despite the name, this can be a local directory as well as an S3 bucket.
 For an S3 bucket, the format should be `s3://bucket-name/path/to/results/`.
 
@@ -53,6 +53,17 @@ For an S3 bucket, the format should be `s3://bucket-name/path/to/results/`.
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch --results_bucket {OUTDIR}
 ```
 
+### Profiles
+
+To run the workflow with simulated data, you can add the `simulated` profile.
+As with the main workflow, you will want to specify an output directory for the simulated results with the `--results_bucket` argument.
+
+```bash
+nextflow run AlexsLemonade/OpenScPCA-nf -profile batch,simulated --results_bucket {SIM_RESULTS_DIR}
+```
+
+### Entry points
+
 The workflow also has a couple of entry points other than the main workflow, for testing and creating simulated data.
 
 To run a test version of the workflow to check permissions and infrastructure setup:
@@ -61,20 +72,24 @@ To run a test version of the workflow to check permissions and infrastructure se
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry test
 ```
 
-To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, which specifies running the workflow with the `simulate` entry point.
+To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, which specifies running the workflow with the `simulate` entry point.
 Note that you will need to specify the directory for the simulation output using the `--sim_pubdir` argument, as the default output bucket is not writeable except by a few specific roles:
 
 ```bash
 nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pubdir {SIMDIR}
 ```
 
-To run the workflow with simulated data, you can add the `simulated` profile.
-As with the main workflow, you will want to specify an output directory for the simulated results with the `--results_bucket` argument.
+### Stub runs
+
+All of the above commands will run the complete workflow processes.
+To test the general logic of the workflow without , you can use a stub run by including the `-stub` argument and `-profile stub`.
 
 ```bash
-nextflow run AlexsLemonade/OpenScPCA-nf -profile batch,simulated --results_bucket {SIM_RESULTS_DIR}
+nextflow run AlexsLemonade/OpenScPCA-nf -stub -profile stub
 ```
 
+This version of the workflow is run for every pull request to the `main` branch.
+
 ## Repository setup
 
 This repository uses [`pre-commit`](https://pre-commit.com) to enforce code style and formatting.
From 5307b32e54e1f95d5a5b6492ec383ba5972710df Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 18:49:32 -0400
Subject: [PATCH 7/8] Update README.md

Co-authored-by: Ally Hawkins <54039191+allyhawkins@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ae1cad5..60f3fbb 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ The run modes available are:
 - `test`: runs only a simple test workflow to check configuration
 - `simulated`: runs the workflow using simulated data
 - `scpca`: runs the workflow using the current ScPCA data release
-- `full`: simulates data based on the current ScPCA datarelease, then runs the workflow using the simulated data and current ScPCA data release (this is same as the behavior of the automatic release workflow)
+- `full`: simulates data based on the current ScPCA data release, then runs the workflow using the simulated data and current ScPCA data release (this is same as the behavior of the automatic release workflow)
 
 For each run, all Nextflow logs, traces, and html run reports will be uploaded to `s3://openscpca-nf-data/logs/{run_mode}/`, organized by date of the run.
 
From 298d68cf034d01abbd202ecf0063fb0019852048 Mon Sep 17 00:00:00 2001
From: Joshua Shapiro
Date: Wed, 29 May 2024 18:49:47 -0400
Subject: [PATCH 8/8] Update README.md

Co-authored-by: Ally Hawkins <54039191+allyhawkins@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 60f3fbb..a6ab574 100644
--- a/README.md
+++ b/README.md
@@ -82,7 +82,7 @@ nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pub
 ### Stub runs
 
 All of the above commands will run the complete workflow processes.
-To test the general logic of the workflow without , you can use a stub run by including the `-stub` argument and `-profile stub`.
+To test the general logic of the workflow without running the full workflow you can use a stub run by including the `-stub` argument and `-profile stub`.
 
 ```bash
 nextflow run AlexsLemonade/OpenScPCA-nf -stub -profile stub