
Release Version 1.0.0-rc2 #82

Merged
184 commits merged on Nov 17, 2023

Commits
98d9289
Initial implementation of ARIBA
HarryHung Jun 21, 2023
c1fd7c2
Include ARIBA info
HarryHung Jun 21, 2023
0106a59
Update ARIBA reference and metadata
HarryHung Jun 21, 2023
ede4f84
Rename included ARIBA references
HarryHung Jun 28, 2023
1233fde
Allow custom ARIBA references and save database
HarryHung Jun 28, 2023
de24962
Implement validation of existing ARIBA database
HarryHung Jun 30, 2023
93a926d
Improve naming of BWA database module and script
HarryHung Jun 30, 2023
199e213
Add MD5 check to BWA database
HarryHung Jun 30, 2023
9279759
Output ARIBA database info
HarryHung Jun 30, 2023
9ebff00
Improve DATABASES_INFO maintainability
HarryHung Jun 30, 2023
f8b857a
Improve code maintainability
HarryHung Jun 30, 2023
d6ab8c6
Include ARIBA database generation
HarryHung Jun 30, 2023
4140943
Improve code maintainability
HarryHung Jun 30, 2023
3bbde9b
Update ARIBA database
HarryHung Jul 4, 2023
bb3c158
Update default ARIBA reference files
HarryHung Jul 5, 2023
0a898f8
Remove unused script
HarryHung Jul 6, 2023
3ec7089
Initial work on extracting AMR from ARIBA report
HarryHung Jul 6, 2023
67277d0
Save AMR info to FreeText_Drug column
HarryHung Jul 6, 2023
ad3cdc8
Revert ARIBA assembler back to default
HarryHung Jul 6, 2023
2f8c6dc
Ensure type matching of variable comparison
HarryHung Jul 7, 2023
7bf5362
Remove LNZ from ARIBA database
HarryHung Jul 13, 2023
619a2e9
Update SeroBA image
HarryHung Jul 13, 2023
20af1fd
Use both normal and debug reports of ARIBA
HarryHung Jul 13, 2023
9c747b3
Improve header of ARIBA metadata
HarryHung Jul 13, 2023
9f649c3
Further work on extracting info from ARIBA reports
HarryHung Jul 13, 2023
44b1331
Improve robustness of JSON capture
HarryHung Jul 14, 2023
a0938d1
Improve target names
HarryHung Jul 14, 2023
a4d2251
ARIBA-based AMR detection prototype
HarryHung Jul 14, 2023
7a38ec5
Update to reflect change from AMRsearch to ARIBA
HarryHung Jul 17, 2023
5905964
Add ARIBA-related options, and update credits
HarryHung Jul 17, 2023
69009e2
Update License of ARIBA and resistanceDatabase
HarryHung Jul 17, 2023
d0ea4fc
Update Output section based on ARIBA-based AMR
HarryHung Jul 18, 2023
08ba429
Add AMR inference based on other AMR
HarryHung Jul 18, 2023
82c6fa9
Include new AMR in results.csv
HarryHung Jul 18, 2023
c2d321c
Fixing ERY and CLI determinant output when empty
HarryHung Jul 18, 2023
7de8841
Add information on new AMR
HarryHung Jul 18, 2023
11b318b
Merge branch 'dev' into feature/ariba-amr
HarryHung Jul 19, 2023
0ebe754
Merge pull request #57 from HarryHung/feature/ariba-amr
HarryHung Jul 19, 2023
e8a2905
Ensure version of Python 3 is captured
HarryHung Jul 19, 2023
dc1064e
Remove unnecessary versioning of Het-SNP Counter
HarryHung Jul 19, 2023
47bb796
Change default Python image to include Pandas
HarryHung Jul 19, 2023
231b7b1
Improve Docker Image capturing
HarryHung Jul 19, 2023
5d07225
Save reports as .csv
HarryHung Jul 20, 2023
ea1b1ba
Combining reports to generate sample report
HarryHung Jul 20, 2023
01b22b8
Initial work on output revamp (WIP)
HarryHung Jul 20, 2023
d0c5123
Save Lineage report per sample as .csv
HarryHung Jul 21, 2023
bfcca31
Add quote to csv header
HarryHung Jul 21, 2023
897aa09
Save Serotype report as .csv
HarryHung Jul 21, 2023
58a549f
Save MLST report as .csv
HarryHung Jul 21, 2023
51aebdc
Save PBP AMR report as .csv
HarryHung Jul 21, 2023
2bb7ad1
Save other AMR report as .csv
HarryHung Jul 21, 2023
e94cc56
Improve clarity of Read QC module
HarryHung Jul 24, 2023
fe079b9
Correct output column names
HarryHung Jul 24, 2023
19312f9
Initial implementation of overall report revamp
HarryHung Jul 24, 2023
4c7fc28
Improve comments; remove obsolete code
HarryHung Jul 24, 2023
073ddd3
Generate results.csv based on ARIBA metadata
HarryHung Jul 24, 2023
fd905cf
Improve shell scripts style
HarryHung Jul 25, 2023
29caffd
Refactor to improve maintainability & readability
HarryHung Jul 27, 2023
60614c8
Improve shell scripts style
HarryHung Jul 27, 2023
00e3de3
Fix comment
HarryHung Jul 27, 2023
b813ad0
Refactor to improve maintainability & readability
HarryHung Jul 27, 2023
6002529
Improve code comments
HarryHung Jul 28, 2023
63ad10b
Improve shell scripts style
HarryHung Jul 28, 2023
2e6707a
Fix outputting incorrect variable for Het-SNP#
HarryHung Jul 28, 2023
8d35258
Avoid numbers output as float
HarryHung Jul 28, 2023
1327f76
Refactor to improve maintainability & readability
HarryHung Jul 28, 2023
8f15a7d
Improve shell scripts style
HarryHung Jul 31, 2023
9d6ead6
Improve chart
HarryHung Jul 31, 2023
d899ec9
Update Nextflow executable to 23.04.2
HarryHung Jul 31, 2023
a98510e
Improve wording of messages.
HarryHung Jul 31, 2023
fa3f58f
Improve shell scripts style
HarryHung Jul 31, 2023
6c54b3e
Improve names & comments of processes & scripts
HarryHung Jul 31, 2023
76afb57
Extract scripts to separate files
HarryHung Aug 1, 2023
2e625f5
Improve shell scripts style
HarryHung Aug 1, 2023
0c70390
Improve option names consistency
HarryHung Aug 1, 2023
d76351c
Improve help message
HarryHung Aug 1, 2023
7935001
Improve content and update to reflect changes
HarryHung Aug 1, 2023
457b996
Update version of ARIBA container
HarryHung Aug 2, 2023
ebc7ce5
Update credits section
HarryHung Aug 2, 2023
8398069
Merge pull request #58 from HarryHung/feature/output-revamp
HarryHung Aug 2, 2023
db629f2
Use full Pandas image for NF metrics collection
HarryHung Aug 2, 2023
bc1842c
Merge pull request #59 from HarryHung/bugfix/python-metrics
HarryHung Aug 2, 2023
09672c2
Include Virulence in the chart
HarryHung Aug 2, 2023
494bfb3
Include information on virulence detection
HarryHung Aug 2, 2023
f3e330a
Merge pull request #60 from HarryHung/feautre/update-readme
HarryHung Aug 2, 2023
2dc2f89
Improve the robustness of LSF profile
HarryHung Aug 3, 2023
6c075e5
Improve LSF config for database processes
HarryHung Aug 3, 2023
4eac3df
Improve lsf profile description
HarryHung Aug 4, 2023
3eab3b0
Merge pull request #61 from HarryHung/feature/improve-lsf-profile
HarryHung Aug 4, 2023
ad2aa01
Update mlst image with latest database
HarryHung Aug 4, 2023
32fb3eb
Merge pull request #62 from HarryHung/feature/mlst-update
HarryHung Aug 4, 2023
0d4a5c4
Prototype schema
HarryHung Aug 4, 2023
0372bc2
Add sangertower profile
HarryHung Aug 4, 2023
c276926
Fix parameter validation for multiple profiles
HarryHung Aug 4, 2023
dc78cc6
Add basic I/O options
HarryHung Aug 4, 2023
c79932f
Update schema
HarryHung Aug 4, 2023
e629017
Fix missing period
HarryHung Aug 4, 2023
d8b2c7d
Ensure usage of DSL2
HarryHung Aug 6, 2023
b77377b
Fix type mismatch error
HarryHung Aug 7, 2023
1562dbe
Yield more accurate number
HarryHung Aug 7, 2023
9766fe2
Remove default values from schema
HarryHung Aug 7, 2023
208d312
Merge branch 'feature/better-tower-support' of https://github.com/Har…
HarryHung Aug 7, 2023
8b86a4c
Correct schema title
HarryHung Aug 7, 2023
628aadf
Display reports on NF Tower UI
HarryHung Aug 7, 2023
999212c
Improve clarity of Singularity error message
HarryHung Aug 8, 2023
ec1516c
Merge pull request #63 from HarryHung/feature/better-tower-support
HarryHung Aug 11, 2023
db33914
Add info.txt to Tower Reports
HarryHung Aug 14, 2023
ae7029f
Consolidate local databases
HarryHung Aug 14, 2023
365cb8e
Fix saving of BWA database
HarryHung Aug 14, 2023
4e7f024
Fix version printing
HarryHung Aug 14, 2023
384b6e3
Update Kraken2 database directory name
HarryHung Aug 14, 2023
4d02888
Fix PopPUNK external clusters info output
HarryHung Aug 14, 2023
1369db2
Fix saving of PopPUNK External Clusters
HarryHung Aug 14, 2023
cd1a0d1
Update schema based on latest changes
HarryHung Aug 15, 2023
9ae5c39
Fix database info output when saved externally
HarryHung Aug 15, 2023
a864719
Use publishDir to save info.txt for NF Tower
HarryHung Aug 15, 2023
5d3c0c9
Reflect changes from databases consolidation
HarryHung Aug 15, 2023
c139b6e
Remove empty row in Experimental table
HarryHung Aug 15, 2023
9698f6a
Merge pull request #64 from HarryHung/feature/databases-consolidation
HarryHung Aug 15, 2023
c1b2b97
Update ARIBA reference sequences
HarryHung Aug 15, 2023
4d7c80c
Merge pull request #65 from HarryHung/feature/ariba-db-update
HarryHung Aug 17, 2023
f13a349
Add information about Nextflow Tower
HarryHung Aug 18, 2023
2ee8154
Merge pull request #66 from HarryHung/feature/readme-nf-tower
HarryHung Aug 18, 2023
0d5f62e
Fix incorrect target of ermBups and ermbTr
HarryHung Aug 22, 2023
6be485c
Simplify ARIBA reference files naming
HarryHung Aug 22, 2023
5c3250b
ARIBA AMR detection mechanism update
HarryHung Aug 22, 2023
7c42388
Remove unnecessary ariba run option
HarryHung Aug 23, 2023
b04cc1f
Merge pull request #67 from HarryHung/feature/ariba-amr-update
HarryHung Aug 23, 2023
c1a1ba3
Improve schema for NF Tower
HarryHung Aug 24, 2023
7863e36
Merge pull request #68 from HarryHung/feature/schema-update
HarryHung Aug 24, 2023
84b8f4c
Add missing process labels to GET_KRAKEN2_DB
HarryHung Aug 24, 2023
a8ff167
Merge pull request #69 from HarryHung/bugfix/scratchless_kraken2
HarryHung Aug 24, 2023
00804a5
Update default GPS PopPUNK database URL
HarryHung Sep 6, 2023
46e94bb
Update fastp version
HarryHung Sep 8, 2023
e6d8474
Merge pull request #70 from HarryHung/feature/fastp-update
HarryHung Sep 13, 2023
155efe8
Correct folP variant nucleotide range
HarryHung Sep 19, 2023
751c3ac
Merge pull request #71 from HarryHung/feature/ariba-folp-update
HarryHung Sep 20, 2023
034b6b0
Add mapped read depth check for gene detection
HarryHung Sep 26, 2023
cc1fa25
Ensure sample report generation is stable
HarryHung Sep 27, 2023
747208e
Fix variable name
HarryHung Sep 27, 2023
85d4140
Quote variable input
HarryHung Sep 27, 2023
6fc3061
Improve input handling
HarryHung Sep 28, 2023
1a49e82
Merge pull request #72 from HarryHung/fix/stable-sample-output
HarryHung Sep 28, 2023
5fda044
Merge pull request #73 from HarryHung/feature/ariba-depth-check
HarryHung Oct 2, 2023
515a812
Update custom images
HarryHung Oct 2, 2023
3bd004d
Update descriptions of custom images
HarryHung Oct 2, 2023
be3d9ee
Update default SeroBA remote database
HarryHung Oct 2, 2023
a33663d
Change SeroBA database to release-based
HarryHung Oct 4, 2023
b0af563
Remove Git container information
HarryHung Oct 4, 2023
66ac6b4
Update info messages
HarryHung Oct 4, 2023
cb878ff
Update description of seroba_db_remote
HarryHung Oct 4, 2023
7d55b51
Merge pull request #74 from HarryHung/feature/update-custom-images
HarryHung Oct 4, 2023
5e2972f
Add second most abundant species check
HarryHung Sep 20, 2023
8282b78
Add info on the second most abundant species check
HarryHung Sep 21, 2023
d7c9772
Switch from 2nd species to top non-Strep genus
HarryHung Oct 3, 2023
fb6c680
Update based on Taxonomy QC changes
HarryHung Oct 3, 2023
ae8ccea
Show max non-Strep genus percentage in reads QC
HarryHung Oct 4, 2023
eb65850
Merge pull request #75 from HarryHung/feature/extra-taxonomy-qc
HarryHung Oct 4, 2023
40a7527
Fix relational operators to match descriptions
HarryHung Oct 16, 2023
4663956
Reflect changes in relational operators
HarryHung Oct 16, 2023
13c9b7e
Use dev SeroBA and remove Singularity workaround
HarryHung Oct 16, 2023
02b95d7
Merge pull request #76 from HarryHung/feature/unify-relational-operators
HarryHung Oct 17, 2023
4660719
Update SeroBA descriptions
HarryHung Oct 17, 2023
f2d86c6
Update to the latest SeroBA official release
HarryHung Oct 17, 2023
3077e83
Merge pull request #77 from HarryHung/feature/seroba-update
HarryHung Oct 17, 2023
9343659
Update the included Nextflow executable to 23.10.0
HarryHung Oct 17, 2023
67f3807
Add Launch on Nextflow Tower shield
HarryHung Oct 17, 2023
cb57017
Merge pull request #78 from HarryHung/feature/nextflow-update
HarryHung Oct 17, 2023
7f5b7be
Correct error message
HarryHung Oct 17, 2023
85f88cf
Add extra folA sequences
HarryHung Oct 23, 2023
1ac7fa4
Rewrite for variants' gene mapped depth check & FQ
HarryHung Oct 25, 2023
b0a0896
Merge pull request #79 from HarryHung/feature/improve-variant-detection
HarryHung Oct 30, 2023
f863704
Bump version to 1.0.0-rc1
HarryHung Oct 31, 2023
1054ad6
Reflect changes in relational operators
HarryHung Oct 31, 2023
be7941c
Add warning about output overwrite
HarryHung Nov 2, 2023
a774fb8
Add warning about Docker Desktop for Linux
HarryHung Nov 7, 2023
d8243c8
Remove test input from the repository
HarryHung Nov 15, 2023
2a92d7c
Ignore test_input
HarryHung Nov 15, 2023
e06eecb
Script for downloading test input
HarryHung Nov 15, 2023
b8c3e48
Update content for optional test data
HarryHung Nov 16, 2023
ec4d973
Fix: ref_start and ref_end could be non-numeric
HarryHung Nov 17, 2023
25c764c
Merge pull request #80 from HarryHung/bugfix/parse-other-resistance-fix
HarryHung Nov 17, 2023
b274c51
Merge pull request #81 from HarryHung/feature/optional-test-input
HarryHung Nov 17, 2023
c11bb4b
Bump version to 1.0.0-rc2
HarryHung Nov 17, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@ work
 databases
 input
 output
+test_input
 *.html

 # Singularity cache
208 changes: 117 additions & 91 deletions README.md

Large diffs are not rendered by default.

11 changes: 0 additions & 11 deletions bin/assembly_qc.sh

This file was deleted.

8 changes: 8 additions & 0 deletions bin/call_snp.sh
@@ -0,0 +1,8 @@
# Call SNPs and save to .vcf
# Remove source sorted BAM file if $LITE is true

bcftools mpileup --threads "$(nproc)" -f "$REFERENCE" "$SORTED_BAM" | bcftools call --threads "$(nproc)" -mv -O v -o "$VCF"

if [ "$LITE" = true ]; then
rm "$(readlink -f "$SORTED_BAM")"
fi
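These bin/ scripts read their inputs from shell variables that the pipeline binds at runtime; a minimal standalone sketch for call_snp.sh, assuming the variables can be passed as environment variables and with hypothetical file names:

# Hypothetical smoke test; REFERENCE/SORTED_BAM/VCF/LITE are normally supplied by the pipeline
REFERENCE=ref.fasta SORTED_BAM=sample.sorted.bam VCF=sample.vcf LITE=false \
    bash bin/call_snp.sh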
33 changes: 33 additions & 0 deletions bin/check-create_ariba_db.sh
@@ -0,0 +1,33 @@
# Check if the ARIBA database was prepared from the specified reference sequences and metadata.
# If not: remove the database directory, prepare the ARIBA database from the reference sequences and metadata, and save the metadata to JSON

REF_SEQUENCES_MD5=$(md5sum "$REF_SEQUENCES" | awk '{ print $1 }')
METADATA_MD5=$(md5sum "$METADATA" | awk '{ print $1 }')

if [ ! -f "${DB_LOCAL}/${JSON_FILE}" ] || \
[ ! "$(grep '"reference"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$REF_SEQUENCES" ] || \
[ ! "$(grep '"reference_md5"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$REF_SEQUENCES_MD5" ] || \
[ ! "$(grep '"metadata"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$METADATA" ] || \
[ ! "$(grep '"metadata_md5"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$METADATA_MD5" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/00.info.txt" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/00.version_info.txt" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/01.filter.check_genes.log" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/01.filter.check_metadata.log" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/01.filter.check_metadata.tsv" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/01.filter.check_noncoding.log" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.all.fa" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.clusters.pickle" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.clusters.tsv" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.gene.fa" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.gene.varonly.fa" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.noncoding.fa" ] || \
[ ! -f "${DB_LOCAL}/${OUTPUT}/02.cdhit.noncoding.varonly.fa" ] ; then

rm -rf "${DB_LOCAL}"

mkdir -p "${DB_LOCAL}"
ariba prepareref -f "$REF_SEQUENCES" -m "$METADATA" "${DB_LOCAL}/${OUTPUT}"

echo -e "{\n \"reference\": \"$REF_SEQUENCES\",\n \"reference_md5\": \"$REF_SEQUENCES_MD5\",\n \"metadata\": \"$METADATA\",\n \"metadata_md5\": \"$METADATA_MD5\",\n \"create_time\": \"$(date +"%Y-%m-%d %H:%M:%S %Z")\"\n}" > "${DB_LOCAL}/${JSON_FILE}"

fi
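The grep/sed pipeline above extracts one JSON field per line; where jq is available (the Kraken2 and PopPUNK scripts below already rely on it), an equivalent sketch of the same checks would be:

# Assumption: jq is present in this container; keys match the JSON written above
SAVED_REF=$(jq -r '.reference' "${DB_LOCAL}/${JSON_FILE}")
SAVED_REF_MD5=$(jq -r '.reference_md5' "${DB_LOCAL}/${JSON_FILE}")
[ "$SAVED_REF" = "$REF_SEQUENCES" ] && [ "$SAVED_REF_MD5" = "$REF_SEQUENCES_MD5" ]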
24 changes: 24 additions & 0 deletions bin/check-create_ref_genome_bwa_db.sh
@@ -0,0 +1,24 @@
# Check if the BWA database was prepared from the specified reference.
# If not: remove the database directory, construct the FM-index of the reference genome for BWA, and save metadata to JSON

REFERENCE_MD5=$(md5sum "$REFERENCE" | awk '{ print $1 }')

if [ ! -f "${DB_LOCAL}/${JSON_FILE}" ] || \
[ ! "$(grep '"reference"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$REFERENCE" ] || \
[ ! "$(grep '"reference_md5"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$REFERENCE_MD5" ] || \
[ ! -f "${DB_LOCAL}/${PREFIX}.amb" ] || \
[ ! -f "${DB_LOCAL}/${PREFIX}.ann" ] || \
[ ! -f "${DB_LOCAL}/${PREFIX}.bwt" ] || \
[ ! -f "${DB_LOCAL}/${PREFIX}.pac" ] || \
[ ! -f "${DB_LOCAL}/${PREFIX}.sa" ] ; then

rm -rf "${DB_LOCAL}"

bwa index -p "$PREFIX" "$REFERENCE"

mkdir -p "${DB_LOCAL}"
mv "${PREFIX}.amb" "${PREFIX}.ann" "${PREFIX}.bwt" "${PREFIX}.pac" "${PREFIX}.sa" -t "${DB_LOCAL}"

echo -e "{\n \"reference\": \"$REFERENCE\",\n \"reference_md5\": \"$REFERENCE_MD5\",\n \"create_time\": \"$(date +"%Y-%m-%d %H:%M:%S %Z")\"\n}" > "${DB_LOCAL}/${JSON_FILE}"

fi
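The saved FM-index is what BWA consumes downstream; a hedged sketch of such an alignment, with hypothetical read file names:

# Hypothetical downstream use of the saved index
bwa mem -t "$(nproc)" "${DB_LOCAL}/${PREFIX}" sample_1.fastq.gz sample_2.fastq.gz > sample.sam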
34 changes: 34 additions & 0 deletions bin/check-create_seroba_db.sh
@@ -0,0 +1,34 @@
# Check if the database was downloaded from the specified link and prepared with the specified k-mer size
# If not: remove the database directory, re-download, re-create the KMC and ARIBA databases, and save metadata to JSON

ZIPPED_REPO='seroba.tar.gz'

if [ ! -f "${DB_LOCAL}/${JSON_FILE}" ] || \
[ ! "$(grep '"url"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$DB_REMOTE" ] || \
[ ! "$(grep '"kmer"' "${DB_LOCAL}/${JSON_FILE}" | sed -r 's/.+: "(.*)",?/\1/')" == "$KMER" ] || \
[ ! -d "${DB_LOCAL}/ariba_db" ] || \
[ ! -d "${DB_LOCAL}/kmer_db" ] || \
[ ! -d "${DB_LOCAL}/streptococcus-pneumoniae-ctvdb"] || \
[ ! -f "${DB_LOCAL}/cd_cluster.tsv" ] || \
[ ! -f "${DB_LOCAL}/cdhit_cluster" ] || \
[ ! -f "${DB_LOCAL}/kmer_size.txt" ] || \
[ ! -f "${DB_LOCAL}/meta.tsv" ] || \
[ ! -f "${DB_LOCAL}/reference.fasta" ]; then

rm -rf "${DB_LOCAL}"

wget "${DB_REMOTE}" -O $ZIPPED_REPO

mkdir tmp
tar -xzf $ZIPPED_REPO --strip-components=1 -C tmp

mkdir -p "${DB_LOCAL}"
mv tmp/database/* "${DB_LOCAL}"

seroba createDBs "${DB_LOCAL}" "${KMER}"

rm -f $ZIPPED_REPO

echo -e "{\n \"url\": \"$DB_REMOTE\",\n \"kmer\": \"$KMER\",\n \"create_time\": \"$(date +"%Y-%m-%d %H:%M:%S %Z")\"\n}" > "${DB_LOCAL}/${JSON_FILE}"

fi
29 changes: 29 additions & 0 deletions bin/check-download_kraken2_db.sh
@@ -0,0 +1,29 @@
# Check if all files exist and were obtained from the database at the specified link.
# If not: remove the database directory, download and unzip the archive into it, and save metadata to JSON

ZIPPED_DB='kraken2_db.tar.gz'

if [ ! -f "${DB_LOCAL}/${JSON_FILE}" ] || \
[ ! "$DB_REMOTE" == "$(jq -r .url "${DB_LOCAL}/${JSON_FILE}")" ] || \
[ ! -f "${DB_LOCAL}/hash.k2d" ] || \
[ ! -f "${DB_LOCAL}/opts.k2d" ] || \
[ ! -f "${DB_LOCAL}/taxo.k2d" ]; then

rm -rf "${DB_LOCAL}"

wget "${DB_REMOTE}" -O $ZIPPED_DB

# Use tmp dir and find to ensure files are saved directly at $DB_LOCAL regardless of archive directory structure
mkdir tmp
tar -xzf $ZIPPED_DB -C tmp
mkdir -p "${DB_LOCAL}"
find tmp -type f -exec mv {} "$DB_LOCAL" \;

rm -f $ZIPPED_DB

jq -n \
--arg url "${DB_REMOTE}" \
--arg save_time "$(date +"%Y-%m-%d %H:%M:%S %Z")" \
'{"url" : $url, "save_time": $save_time}' > "${DB_LOCAL}/${JSON_FILE}"

fi
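For reference, the jq -n template renders a two-field JSON document; with a hypothetical URL, ${JSON_FILE} would contain the following (the two PopPUNK scripts below write the same structure):

# Example content of ${JSON_FILE} (URL and timestamp are hypothetical):
# {
#   "url": "https://example.com/kraken2_db.tar.gz",
#   "save_time": "2023-11-17 12:00:00 UTC"
# }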
32 changes: 32 additions & 0 deletions bin/check-download_poppunk_db.sh
@@ -0,0 +1,32 @@
# Return PopPUNK database name

# Check if all files exist and were obtained from the database at the specified link.
# If not: remove the database directory, download and unzip the archive into it, and save metadata to JSON

DB_NAME=$(basename "$DB_REMOTE" .tar.gz)
DB_PATH=${DB_LOCAL}/${DB_NAME}

if [ ! -f "${DB_LOCAL}/${JSON_FILE}" ] || \
[ ! "$DB_REMOTE" == "$(jq -r .url "${DB_LOCAL}/${JSON_FILE}")" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}.h5" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}.dists.npy" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}.dists.pkl" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}_fit.npz" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}_fit.pkl" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}_graph.gt" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}_clusters.csv" ] || \
[ ! -f "${DB_PATH}/${DB_NAME}.refs" ]; then

rm -rf "${DB_LOCAL}"

wget "$DB_REMOTE" -O poppunk_db.tar.gz
mkdir -p "${DB_LOCAL}"
tar -xzf poppunk_db.tar.gz -C "$DB_LOCAL"
rm poppunk_db.tar.gz

jq -n \
--arg url "$DB_REMOTE" \
--arg save_time "$(date +"%Y-%m-%d %H:%M:%S %Z")" \
'{"url" : $url, "save_time": $save_time}' > "${DB_LOCAL}/${JSON_FILE}"

fi
22 changes: 22 additions & 0 deletions bin/check-download_poppunk_ext_clusters.sh
@@ -0,0 +1,22 @@
# Return PopPUNK External Clusters file name

# Check if the specified external clusters file exists and was obtained from the specified link.
# If not: remove the external clusters directory, re-download the file, and save metadata to JSON

EXT_CLUSTERS_CSV=$(basename "$EXT_CLUSTERS_REMOTE")

if [ ! -f "${EXT_CLUSTERS_LOCAL}/${JSON_FILE}" ] || \
[ ! "$EXT_CLUSTERS_REMOTE" == "$(jq -r .url "${EXT_CLUSTERS_LOCAL}/${JSON_FILE}")" ] || \
[ ! -f "${EXT_CLUSTERS_LOCAL}/${EXT_CLUSTERS_CSV}" ]; then

rm -rf "${EXT_CLUSTERS_LOCAL}"

mkdir -p "${EXT_CLUSTERS_LOCAL}"
wget "$EXT_CLUSTERS_REMOTE" -O "${EXT_CLUSTERS_LOCAL}/${EXT_CLUSTERS_CSV}"

jq -n \
--arg url "$EXT_CLUSTERS_REMOTE" \
--arg save_time "$(date +"%Y-%m-%d %H:%M:%S %Z")" \
'{"url" : $url, "save_time": $save_time}' > "${EXT_CLUSTERS_LOCAL}/${JSON_FILE}"

fi
11 changes: 11 additions & 0 deletions bin/convert_sam_to_sorted_bam.sh
@@ -0,0 +1,11 @@
# Convert SAM to sorted BAM file
# Remove source SAM file if $LITE is true

samtools view -@ "$(nproc)" -b "$SAM" > "$BAM"

samtools sort -@ "$(nproc)" -o "$SORTED_BAM" "$BAM"
rm "$BAM"

if [ "$LITE" = true ]; then
rm "$(readlink -f "$SAM")"
fi
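Since samtools sort accepts input on stdin, the intermediate BAM could in principle be avoided; a hedged single-pipe variant, a sketch rather than the pipeline's method, assuming sort's memory buffers suffice:

# Possible single-pipe variant (assumption only; the two-step form above is what the pipeline uses)
samtools view -@ "$(nproc)" -b "$SAM" | samtools sort -@ "$(nproc)" -o "$SORTED_BAM" -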
10 changes: 5 additions & 5 deletions bin/get_docker_compose.sh → bin/create_docker_compose.sh
@@ -1,14 +1,14 @@
-# Generate a Docker compose file that includes all images used in nextflow.config
+# Generate a Docker compose file that includes all images used in $NEXTFLOW_CONFIG

 COUNT=0

-echo "services:" >> $COMPOSE
+echo "services:" >> "$COMPOSE"

-grep -E "container\s?=" $NEXTFLOW_CONFIG \
+grep -E "container\s?=" "$NEXTFLOW_CONFIG" \
 | sort -u \
 | sed -r "s/\s+container\s?=\s?'(.+)'/\1/" \
 | while read -r IMAGE ; do
 COUNT=$((COUNT+1))
-echo "  SERVICE${COUNT}:" >> $COMPOSE
-echo "    image: $IMAGE" >> $COMPOSE
+echo "  SERVICE${COUNT}:" >> "$COMPOSE"
+echo "    image: $IMAGE" >> "$COMPOSE"
 done
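To illustrate what the script emits, given hypothetical container directives in $NEXTFLOW_CONFIG:

# Given lines like these in $NEXTFLOW_CONFIG (hypothetical images):
#     container = 'quay.io/biocontainers/bwa:0.7.17'
#     container = 'staphb/samtools:1.16'
# the loop appends to $COMPOSE:
# services:
#   SERVICE1:
#     image: quay.io/biocontainers/bwa:0.7.17
#   SERVICE2:
#     image: staphb/samtools:1.16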
9 changes: 0 additions & 9 deletions bin/create_seroba_db.sh

This file was deleted.

95 changes: 95 additions & 0 deletions bin/generate_overall_report.py
@@ -0,0 +1,95 @@
#! /usr/bin/env python3

# Generate overall report based on sample reports and columns specified by COLUMNS_BY_CATEGORY and ARIBA metadata

import sys
from itertools import chain
import pandas as pd
import glob


# Specify the columns to be included in the output file and their order (except those derived from ARIBA metadata)
COLUMNS_BY_CATEGORY = {
    'IDENTIFICATION': ['Sample_ID'],
    'QC': ['Read_QC', 'Assembly_QC', 'Mapping_QC', 'Taxonomy_QC', 'Overall_QC'],
    'READ': ['Bases'],
    'ASSEMBLY': ['Contigs#', 'Assembly_Length', 'Seq_Depth'],
    'MAPPING': ['Ref_Cov_%', 'Het-SNP#'],
    'TAXONOMY': ['S.Pneumo_%', 'Top_Non-Strep_Genus', 'Top_Non-Strep_Genus_%'],
    'LINEAGE': ['GPSC'],
    'SEROTYPE': ['Serotype'],
    'MLST': ['ST', 'aroE', 'gdh', 'gki', 'recP', 'spi', 'xpt', 'ddl'],
    'PBP': ['pbp1a', 'pbp2b', 'pbp2x', 'AMO_MIC', 'AMO_Res', 'CFT_MIC', 'CFT_Res(Meningital)', 'CFT_Res(Non-meningital)', 'TAX_MIC', 'TAX_Res(Meningital)', 'TAX_Res(Non-meningital)', 'CFX_MIC', 'CFX_Res', 'MER_MIC', 'MER_Res', 'PEN_MIC', 'PEN_Res(Meningital)', 'PEN_Res(Non-meningital)']
}


# Check argv and save to global variables
if len(sys.argv) != 4:
    sys.exit('Usage: generate_overall_report.py INPUT_PATTERN ARIBA_METADATA OUTPUT_FILE')
INPUT_PATTERN = sys.argv[1]
ARIBA_METADATA = sys.argv[2]
OUTPUT_FILE = sys.argv[3]


def main():
    output_columns = get_output_columns()
    df_output = get_df_output(output_columns)

    # Save df_output to OUTPUT_FILE in CSV format
    df_output.to_csv(OUTPUT_FILE, index=False, na_rep='_')


# Get output columns based on COLUMNS_BY_CATEGORY and ARIBA metadata
def get_output_columns():
    output_columns = list(chain.from_iterable(COLUMNS_BY_CATEGORY.values()))
    add_ariba_columns(output_columns)
    return output_columns


# Based on ARIBA metadata, add additional output columns
def add_ariba_columns(output_columns):
    # Get all targets in ARIBA metadata
    ariba_targets = set(pd.read_csv(ARIBA_METADATA, sep='\t')['target'].unique())

    # Add special cases if certain targets exist
    if 'TET' in ariba_targets:
        ariba_targets.add('DOX')
    if 'FQ' in ariba_targets:
        ariba_targets.add('LFX')
    if 'TMP' in ariba_targets and 'SMX' in ariba_targets:
        ariba_targets.add('COT')
    if 'ERY_CLI' in ariba_targets:
        ariba_targets.update(['ERY', 'CLI'])

    # Add all targets alphabetically, except always adding PILI at the end
    pilis = []
    for target in sorted(ariba_targets):
        if target.lower().startswith('pili'):
            pilis.append(target)
        else:
            output_columns.extend([f'{target}_Res', f'{target}_Determinant'])
    for pili in pilis:
        output_columns.extend([f'{pili}', f'{pili}_Determinant'])


# Generate df_output based on all sample reports, with columns in the order of output_columns
def get_df_output(output_columns):
    # Generate an empty dataframe as df_manifest based on output_columns
    df_manifest = pd.DataFrame(columns=output_columns)

    # Generate a dataframe for each sample report, then concat df_manifest and all dataframes into df_output
    dfs = [df_manifest]
    reports = glob.glob(INPUT_PATTERN)
    for report in reports:
        df = pd.read_csv(report, dtype=str)
        dfs.append(df)
    df_output = pd.concat(dfs, ignore_index=True).sort_values(by=['Sample_ID'])

    # Ensure column order in df_output is the same as output_columns
    df_output = df_output[output_columns]

    return df_output


if __name__ == "__main__":
    main()
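A hedged usage sketch (file names are assumptions; the pipeline supplies the real paths). The glob pattern is quoted so that Python's glob, not the shell, expands it:

# Hypothetical invocation
./generate_overall_report.py '*_sample_report.csv' ariba_metadata.tsv results.csv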
5 changes: 5 additions & 0 deletions bin/generate_sample_report.sh
@@ -0,0 +1,5 @@
# Combine all csv reports into a single csv, then add Sample_ID as the first field

paste -d , ${SAMPLE_ID}_process_report_*.csv \
| sed '1 s/^/\"Sample_ID\",/' \
| sed "2 s/^/\"${SAMPLE_ID}\",/" > "$SAMPLE_REPORT"
14 changes: 14 additions & 0 deletions bin/get_assembly_qc.sh
@@ -0,0 +1,14 @@
# Extract assembly QC information and determine the QC result based on Quast's report.tsv and the total base count

CONTIGS=$(awk -F'\t' '$1 == "# contigs (>= 0 bp)" { print $2 }' "$REPORT")
LENGTH=$(awk -F'\t' '$1 == "Total length" { print $2 }' "$REPORT")
DEPTH=$(echo "scale=2; $BASES / $LENGTH" | bc -l)

if [[ $CONTIGS -le $QC_CONTIGS ]] && [[ $LENGTH -ge $QC_LENGTH_LOW ]] && [[ $LENGTH -le $QC_LENGTH_HIGH ]] && [[ "$(echo "$DEPTH >= $QC_DEPTH" | bc -l)" == 1 ]]; then
ASSEMBLY_QC="PASS"
else
ASSEMBLY_QC="FAIL"
fi

echo \"Assembly_QC\",\"Contigs#\",\"Assembly_Length\",\"Seq_Depth\" > "$ASSEMBLY_QC_REPORT"
echo \""$ASSEMBLY_QC"\",\""$CONTIGS"\",\""$LENGTH"\",\""$DEPTH"\" >> "$ASSEMBLY_QC_REPORT"