Apply suggestions from code review
jfy133 authored Sep 12, 2024
1 parent 373c546 commit 0a3181f
Showing 9 changed files with 30 additions and 54 deletions.
5 changes: 2 additions & 3 deletions CHANGELOG.md
@@ -7,12 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#417](https://github.com/nf-core/taxprofiler/pull/417) - Added reference-free metagenome estimation with Nonpareil (added by @jfy133)
- [#417](https://github.com/nf-core/taxprofiler/pull/417) - Added reference-free metagenome complexity/coverage estimation with Nonpareil (added by @jfy133)
- [#466](https://github.com/nf-core/taxprofiler/pull/466) - Input database sheets now require a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
- [#505](https://github.com/nf-core/taxprofiler/pull/505) - Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
- [#508](https://github.com/nf-core/taxprofiler/pull/508) - Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
- [#511](https://github.com/nf-core/taxprofiler/pull/511) - Add `porechop_abi` as an alternative adapter removal tool for long reads nanopore data (added by @LilyAnderssonLee)
- [#512](https://github.com/nf-core/taxprofiler/pull/512) - Update all tools to the latest version and include nf-test (Updated by @LilyAnderssonLee & @jfy133)
- [#512](https://github.com/nf-core/taxprofiler/pull/512) - Update all tools to the latest version and include nf-test (updated by @LilyAnderssonLee & @jfy133)

### `Fixed`

@@ -36,7 +36,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| minimap2 | 2.24 | 2.28 |
| motus/profile | 3.0.3 | 3.1.0 |
| multiqc | 1.21 | 1.24.1 |
| nanoq | | 0.10.0 |
| samtools | 1.17 | 1.20 |
| untar | 4.7 | 4.8 |

3 changes: 0 additions & 3 deletions README.md
@@ -23,9 +23,6 @@

**nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun short- and long-read metagenomic data. It allows for in-parallel taxonomic identification of reads or taxonomic abundance estimation with multiple classification and profiling tools against multiple databases, and produces standardised output tables for facilitating results comparison between different tools and databases.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/scnanoseq/results).

## Pipeline summary

4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
@@ -64,8 +64,8 @@ custom_logo_title: "nf-core/taxprofiler"
run_modules:
- fastqc
- adapterRemoval
- fastp
- nonpareil
- fastp
- nonpareil
- bbduk
- prinseqplusplus
- porechop
14 changes: 7 additions & 7 deletions conf/modules.config
@@ -255,7 +255,7 @@ process {
[
path: { "${params.outdir}/porechop" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
pattern: '*_porechop.fastq.gz',
enabled: params.save_preprocessed_reads
],
[
@@ -266,7 +266,7 @@
[
path: { "${params.outdir}/analysis_ready_fastqs" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
pattern: '*_porechop.fastq.gz',
enabled: params.save_analysis_ready_fastqs,
saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && params.longread_qc_skipqualityfilter && !params.longread_qc_skipadaptertrim && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
]
@@ -279,7 +279,7 @@
[
path: { "${params.outdir}/porechop_abi" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
pattern: '*_porechop_abi.fastq.gz',
enabled: params.save_preprocessed_reads
],
[
@@ -290,7 +290,7 @@
[
path: { "${params.outdir}/analysis_ready_fastqs" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
pattern: '*_porechop_abi.fastq.gz',
enabled: params.save_analysis_ready_fastqs,
saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && params.longread_qc_skipqualityfilter && !params.longread_qc_skipadaptertrim && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
]
@@ -339,18 +339,18 @@ process {
[
path: { "${params.outdir}/nanoq" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
pattern: '*_filtered.fastq.gz',
enabled: params.save_preprocessed_reads
],
[
path: { "${params.outdir}/nanoq" },
mode: params.publish_dir_mode,
pattern: '*.stats'
pattern: '*_filtered.stats'
],
[
path: { "${params.outdir}/analysis_ready_fastqs" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
pattern: '*_filtered.fastq.gz',
enabled: params.save_analysis_ready_fastqs,
saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && !params.longread_qc_skipqualityfilter && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
]
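Taken together, the `modules.config` hunks above all apply the same fix: each `publishDir` glob is narrowed from the generic `*.fastq.gz` to the suffix the tool itself emits, so FASTQ files merely staged into the task work directory are not published by accident. A minimal sketch of the resulting convention for the nanoq module (selector and option names mirror the diff, but treat this as an illustrative consolidation rather than a verbatim excerpt):

```nextflow
process {
    withName: NANOQ {
        publishDir = [
            [
                // Publish only nanoq's own filtered reads, not every FASTQ in the work dir
                path: { "${params.outdir}/nanoq" },
                mode: params.publish_dir_mode,
                pattern: '*_filtered.fastq.gz',
                enabled: params.save_preprocessed_reads
            ],
            [
                // Stats files get the same tool-specific suffix
                path: { "${params.outdir}/nanoq" },
                mode: params.publish_dir_mode,
                pattern: '*_filtered.stats'
            ]
        ]
    }
}
```

The same narrowing is applied in the Porechop and Porechop_ABI blocks with their respective `*_porechop.fastq.gz` and `*_porechop_abi.fastq.gz` suffixes.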
6 changes: 3 additions & 3 deletions docs/output.md
@@ -15,11 +15,11 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [falco](#fastqc) - Alternative to FastQC for raw read QC
- [fastp](#fastp) - Adapter trimming for Illumina data
- [AdapterRemoval](#adapterremoval) - Adapter trimming for Illumina data
- [Porechop](#porechop) - Adapter removal for Oxford Nanopore data
- [Porechop_ABI](#porechop_abi) - Adapter removal for Oxford Nanopore data
- [Nonpareil](#nonpareil) - Read redundancy and metagenome coverage estimation for short reads
- [BBDuk](#bbduk) - Quality trimming and filtering for Illumina data
- [PRINSEQ++](#prinseq) - Quality trimming and filtering for Illumina data
- [Porechop](#porechop) - Adapter removal for Oxford Nanopore data
- [Porechop_ABI](#porechop_abi) - Adapter removal for Oxford Nanopore data
- [Filtlong](#filtlong) - Quality trimming and filtering for Nanopore data
- [Nanoq](#nanoq) - Quality trimming and filtering for Nanopore data
- [Bowtie2](#bowtie2) - Host removal for Illumina reads
@@ -155,7 +155,7 @@ The resulting `.fastq` files may _not_ always be the 'final' reads that go into

</details>

In most cases you will just want to look at the PNG files which contain the extrapolation information for estimating how much of the metagenome 'coverage' you will recover if you sequence more (i.e., to help indicate at what point you will just keep sequencing redundant reads that provide no more new taxonomic information).
In most cases you will just want to look at the PNG files which contain the extrapolation information for estimating how much of the metagenome 'coverage' you will recover if you sequence more (i.e., to help indicate at what point you will just keep sequencing redundant reads that provide no more new taxonomic information).

The `.npo` files can be used for re-generating and customising the plots using the companion `Nonpareil` R package.

31 changes: 14 additions & 17 deletions docs/usage.md
@@ -274,25 +274,22 @@ Before using this tool please note the following caveats:

:::warning

- It is not recommended to run this on deep sequencing data, or very large datasets
- Nonpareil requires uncompressed FASTQ files, and nf-core/taxprofiler will uncompress these in your working directory, potentially with an extremely large hard-drive footprint.
- Your shortest reads _after_ processing should not go below 24bp (see warning below)
- It is not recommended to keep unmerged (`--shortread_qc_includeunmerged`) reads when using the calculation.

:::info
If you get errors regarding the 'kmer' value is not correct, make sure your shortest reads _after_ processing is not less than 24bp.

If this is the case you will need to specify in a custom config

```nextflow
process {
withName: NONPAREIL_NONPAREIL {
ext.args = { "-k <NUMBER>" }
- Your shortest reads _after_ processing should not go below 24bp

If the 'kmer' value is not correct, make sure your shortest reads _after_ processing are not less than 24bp.

If this is the case, you will need to specify this in a custom config:

```nextflow
process {
withName: NONPAREIL_NONPAREIL {
ext.args = { "-k <NUMBER>" }
}
}
}
```

Where `<NUMBER>` should be at least the shortest read in your library
```
Where `<NUMBER>` should be no larger than the shortest read in your library
:::
#### Complexity Filtering
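The hunk above relocates the custom `-k` snippet inside the warning admonition. For readers wanting to apply it, a complete stand-alone config file might look like the following sketch (the filename `nonpareil_kmer.config` and the value `-k 20` are illustrative assumptions, not taken from the diff):

```nextflow
// nonpareil_kmer.config (hypothetical filename)
// Pass to the pipeline with: nextflow run nf-core/taxprofiler -c nonpareil_kmer.config ...
process {
    withName: NONPAREIL_NONPAREIL {
        // -k sets the kmer length (default 24); adjust it to match your
        // shortest read remaining after preprocessing
        ext.args = { "-k 20" }
    }
}
```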
2 changes: 1 addition & 1 deletion nextflow_schema.json
@@ -307,7 +307,7 @@
"type": "boolean",
"description": "Turn on short-read metagenome sequencing redundancy estimation with nonpareil. Warning: only use for shallow short-read sequencing datasets.",
"fa_icon": "fas fa-toggle-on",
"help_text": "Turns on [nonpareil](https://nonpareil.readthedocs.io/en/latest/), a tool for estimating metagenome 'coverage', i.e, whether all genomes within the metagenome have had at least one read sequenced.\n\nIt estimates this by checking the read redundancy between a subsample of reads versus other reads in the library.\n\nThe more redundancy that exists, the larger the assumption that all possible reads in the library have been sequenced and all 'redundant' reads are simply sequencing of PCR duplicates.\n\nThe lower the redundancy, the more sequencing should be done until the entire metagenome has been captured. The output can be used to guide the amount of further sequencing is required.\n\nNote this is not the same as _genomic_ coverage, which is the number of times a base-pair is covered by unique reads on a reference genome.\n\nBefore using this tool please note the following caveats:\n\n- It is not recommended to run this on deep sequencing data, or very large datasets\n - Nonpareil requires uncompressed FASTQ files, and nf-core/taxprofiler will uncompress these in your working directory, potentially with a extremely large hard-drive footprint.\n- Your shortest reads _after_ processing should not go below 24bp (see warning below)\n- It is not recommended to keep unmerged (`--shortread_qc_includeunmerged`) reads when using the calculation.\n:::warning\nOn default settings, with 'kmer mode', you must make sure that your shortest processed reads do not go below 24 bp (the default kmer size).\n\nIf you have errors regarding kmer size, you will need to specify in a custom config in a process block\n\n```\n withName: NONPAREIL {\n ext.args = { \"-k <NUMBER>\" }\n }\n```\n\nWhere `<NUMBER>` should be at least the shortest read in your library\n:::"
"help_text": "Turns on [nonpareil](https://nonpareil.readthedocs.io/en/latest/), a tool for estimating metagenome 'coverage', i.e, whether all genomes within the metagenome have had at least one read sequenced.\n\nIt estimates this by checking the read redundancy between a subsample of reads versus other reads in the library.\n\nThe more redundancy that exists, the larger the assumption that all possible reads in the library have been sequenced and all 'redundant' reads are simply sequencing of PCR duplicates.\n\nThe lower the redundancy, the more sequencing should be done until the entire metagenome has been captured. The output can be used to guide the amount of further sequencing is required.\n\nNote this is not the same as _genomic_ coverage, which is the number of times a base-pair is covered by unique reads on a reference genome.\n\nBefore using this tool please note the following caveats:\n\n- It is not recommended to run this on deep sequencing data, or very large datasets\n - Your shortest reads _after_ processing should not go below 24bp (see warning below)\n- It is not recommended to keep unmerged (`--shortread_qc_includeunmerged`) reads when using the calculation.\n:::warning\nOn default settings, with 'kmer mode', you must make sure that your shortest processed reads do not go below 24 bp (the default kmer size).\n\nIf you have errors regarding kmer size, you will need to specify in a custom config in a process block\n\n```\n withName: NONPAREIL {\n ext.args = { \"-k <NUMBER>\" }\n }\n```\n\nWhere `<NUMBER>` should be at least the shortest read in your library\n:::"
},
"shortread_redundancyestimation_mode": {
"type": "string",
17 changes: 0 additions & 17 deletions subworkflows/local/nonpareil.nf
@@ -3,23 +3,6 @@ include { NONPAREIL_CURVE } from '../../modules/nf-core/nonpareil/cur
include { NONPAREIL_SET } from '../../modules/nf-core/nonpareil/set/main'
include { NONPAREIL_NONPAREILCURVESR } from '../../modules/nf-core/nonpareil/nonpareilcurvesr/main'

// Custom Functions

/*
*/
def extractNonpareilExtensionFromArrays(ch_input) {

return ch_profile
.map { meta, profile -> [meta.db_name, meta, profile] }
.combine(ch_database, by: 0)
.multiMap {
key, meta, profile, db_meta, db ->
profile: [meta, profile]
db: db
}
}

workflow NONPAREIL {
take:
reads // [ [ meta ], [ reads ] ]
2 changes: 1 addition & 1 deletion subworkflows/local/visualization_krona.nf
@@ -99,7 +99,7 @@ workflow VISUALIZATION_KRONA {

KRONA_KTIMPORTTAXONOMY ( ch_krona_taxonomy_for_input, file(params.krona_taxonomy_directory, checkExists: true) )
ch_krona_html.mix( KRONA_KTIMPORTTAXONOMY.out.html )
ch_versions = ch_versions.mix ( GUNZIP.out.versions.first() )
ch_versions = ch_versions.mix( GUNZIP.out.versions.first() )
ch_versions = ch_versions.mix( MEGAN_RMA2INFO_KRONA.out.versions.first() )
ch_versions = ch_versions.mix( KRONA_KTIMPORTTAXONOMY.out.versions.first() )
}
