Merge pull request #52 from AlexsLemonade/jashapiro/readme-gha
jashapiro authored May 29, 2024
2 parents 368b91f + 298d68c commit b4940c7
Showing 2 changed files with 66 additions and 6 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -28,7 +28,7 @@ repos:
hooks:
- id: typos
- repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.4.2
+rev: v0.4.6
hooks:
- id: ruff-format
- repo: https://github.com/lorenzwalthert/precommit
@@ -41,5 +41,5 @@ repos:
hooks:
- id: prettier
ci:
-autofix_prs: false
+autofix_prs: true
autoupdate_schedule: quarterly
68 changes: 64 additions & 4 deletions README.md
@@ -8,14 +8,62 @@ See https://github.com/AlexsLemonade/OpenScPCA-admin/blob/main/technical-docs/ne

The workflow is currently set up to run best on AWS Batch, though some testing may work locally.
You will need to have appropriate AWS credentials set up to run the workflow on AWS and access the data files.
Further instructions for this will be added in the future, and we expect this to be run via a GitHub Action for most use cases.
In general, you must have `workload` access in an OpenScPCA AWS account to run the workflow.

-The following base command will run the workflow, assuming all AWS permissions are set up correctly:
### Running the workflow from GitHub Actions

The most common way to run the workflow is through the GitHub Action (GHA) that launches it.
The GHA is run automatically when a new release tag is created or by manually triggering the workflow.

The GHA that runs the workflow uses the [Batch CodeDeploy workflow](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml) to send an AWS CodeDeploy action to the `Nextflow-workload` instance in the OpenScPCA AWS account.
This launches the Nextflow workflow on AWS Batch by running the [run_workflow.sh](scripts/run_nextflow.sh) script in a tmux session on the `Nextflow-workload` instance.
Using tmux allows the workflow to run in the background and be monitored by logging into the instance.

When a new release tag is created, the GHA workflow runs automatically and performs the following steps:

1. Run the workflow using the `simulate` entry point to create simulated SCE objects for the OpenScPCA project.
2. Run the main workflow using the simulated data.
3. Run the main workflow using the real ScPCA data.
4. Upload all Nextflow logs, traces, and HTML run reports to `s3://openscpca-nf-data/logs/full/`, organized by date.
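
The release sequence above can be outlined as a short series of commands. This is a hypothetical dry run, not the contents of `scripts/run_nextflow.sh`: it prints each command instead of executing it, the `nextflow run` options are taken from the examples elsewhere in this README, and the placeholder bucket paths, the local `logs` directory, and the `YYYY-MM-DD` log folder are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run outline of the automatic release run.
# Prints each command rather than executing it. The paths below are
# illustrative placeholders, not values used by the real script.
SIM_DIR="s3://example-bucket/simulated"            # hypothetical --sim_pubdir
SIM_RESULTS_DIR="s3://example-bucket/sim-results"  # hypothetical results location
LOG_DIR="logs"                                     # hypothetical local log directory
RUN_DATE="$(date +%Y-%m-%d)"                       # assumed date folder format

run() {
  # Print the command instead of running it
  echo "$*"
}

release_steps() {
  # 1. Create simulated SCE objects with the `simulate` entry point
  run nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pubdir "$SIM_DIR"
  # 2. Run the main workflow on the simulated data
  run nextflow run AlexsLemonade/OpenScPCA-nf -profile batch,simulated --results_bucket "$SIM_RESULTS_DIR"
  # 3. Run the main workflow on the real ScPCA data (default output bucket)
  run nextflow run AlexsLemonade/OpenScPCA-nf -profile batch
  # 4. Upload logs, traces, and reports under a date-stamped prefix
  run aws s3 cp "$LOG_DIR" "s3://openscpca-nf-data/logs/full/${RUN_DATE}/" --recursive
}

release_steps
```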

Alternatively, you can launch the GHA workflow manually with a [`workflow_dispatch` trigger](https://github.com/AlexsLemonade/OpenScPCA-nf/actions/workflows/run-batch.yml), which lets you choose a run mode.
The run modes available are:

- `test`: runs only a simple test workflow to check configuration
- `simulated`: runs the workflow using simulated data
- `scpca`: runs the workflow using the current ScPCA data release
- `full`: simulates data based on the current ScPCA data release, then runs the workflow using the simulated data and the current ScPCA data release (the same behavior as the automatic release workflow)
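
A manual dispatch can also be started from the command line with the GitHub CLI, which passes workflow inputs to `gh workflow run` through `-f key=value`. The sketch below only validates the mode against the list above and prints the command. The input name `run_mode` is an assumption, so check the `workflow_dispatch` inputs in `run-batch.yml` before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a run mode and print the `gh` command that
# would dispatch run-batch.yml. The input name `run_mode` is assumed.

valid_run_mode() {
  case "$1" in
    test|simulated|scpca|full) return 0 ;;
    *) return 1 ;;
  esac
}

dispatch_cmd() {
  local mode="$1"
  if ! valid_run_mode "$mode"; then
    echo "unknown run mode: $mode" >&2
    return 1
  fi
  # Printed rather than executed, so the sketch has no side effects
  echo "gh workflow run run-batch.yml --repo AlexsLemonade/OpenScPCA-nf -f run_mode=${mode}"
}

dispatch_cmd scpca
```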

For each run, all Nextflow logs, traces, and HTML run reports will be uploaded to `s3://openscpca-nf-data/logs/{run_mode}/`, organized by date of the run.
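
To find the logs for a particular run, a helper along these lines can build the prefix from the run mode. The `YYYY-MM-DD` date folder is an assumption about how runs are "organized by date"; adjust it to the layout actually used in the bucket.

```bash
#!/usr/bin/env bash
# Hypothetical helper that builds the S3 prefix for a run's logs.
# The YYYY-MM-DD date folder is an assumed layout.
log_prefix() {
  local mode="$1"
  local run_date="${2:-$(date +%Y-%m-%d)}"  # default to today's date
  echo "s3://openscpca-nf-data/logs/${mode}/${run_date}/"
}

# Listing the logs requires AWS credentials, so it is left commented out:
# aws s3 ls "$(log_prefix simulated 2024-05-29)"
log_prefix simulated 2024-05-29
```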

### Running the workflow manually

Alternatively, you can launch the workflow manually from your own machine.
The following base command will run the main workflow, assuming all AWS permissions are set up correctly:

```bash
nextflow run AlexsLemonade/OpenScPCA-nf -profile batch
```

For most use cases you will want to use the `--results_bucket` argument to avoid writing to the default output bucket.
Note that despite the name, this can be a local directory as well as an S3 bucket.
For an S3 bucket, the format should be `s3://bucket-name/path/to/results/`.

```bash
nextflow run AlexsLemonade/OpenScPCA-nf -profile batch --results_bucket {OUTDIR}
```
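
Because `--results_bucket` accepts either a local directory or an S3 URI, a small guard can catch a malformed value before a long run starts. The function below is a hypothetical helper, not part of OpenScPCA-nf; it assumes an S3 value should name both a bucket and a path within it, following the `s3://bucket-name/path/to/results/` format described above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a --results_bucket value.
# S3 values must look like s3://bucket-name/path...; anything else is
# treated as a local directory path and only needs to be non-empty.
check_results_bucket() {
  local target="$1"
  local s3_re='^s3://[^/]+/.+'
  if [[ "$target" == s3://* ]]; then
    # Require both a bucket name and a path inside the bucket
    [[ "$target" =~ $s3_re ]]
  else
    [[ -n "$target" ]]
  fi
}

# Example usage before launching a run:
# check_results_bucket "$OUTDIR" && nextflow run AlexsLemonade/OpenScPCA-nf -profile batch --results_bucket "$OUTDIR"
```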

### Profiles

To run the workflow with simulated data, you can add the `simulated` profile.
As with the main workflow, you will want to specify an output directory for the simulated results with the `--results_bucket` argument.

```bash
nextflow run AlexsLemonade/OpenScPCA-nf -profile batch,simulated --results_bucket {SIM_RESULTS_DIR}
```

### Entry points

The workflow also has a couple of entry points other than the main workflow, for testing and creating simulated data.

To run a test version of the workflow to check permissions and infrastructure setup:
@@ -24,12 +72,24 @@ To run a test version of the workflow to check permissions and infrastructure se
nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry test
```

-To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command:
+To run the workflow that creates simulated SCE objects for the OpenScPCA project, you can use the following command, which specifies running the workflow with the `simulate` entry point.
Note that you will need to specify the directory for the simulation output using the `--sim_pubdir` argument, as the default output bucket is not writeable except by a few specific roles:

```bash
-nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate
+nextflow run AlexsLemonade/OpenScPCA-nf -profile batch -entry simulate --sim_pubdir {SIMDIR}
```

### Stub runs

All of the above commands run the complete workflow processes.
To test the general logic of the workflow without running the full processes, use a stub run by including the `-stub` argument and `-profile stub`.

```bash
nextflow run AlexsLemonade/OpenScPCA-nf -stub -profile stub
```

This version of the workflow is run for every pull request to the `main` branch.

## Repository setup

This repository uses [`pre-commit`](https://pre-commit.com) to enforce code style and formatting.