updates project3
Geert van Geest committed Mar 13, 2023
1 parent 3288ec8 commit d380e2c
Showing 5 changed files with 63 additions and 11 deletions.
53 changes: 43 additions & 10 deletions docs/course_material/group_work/project3.md
@@ -10,20 +10,32 @@ There are eight different species: `sample_[1-8].fastq.gz`



Each species has a fastq file available. Download only the data for the species that you will require:
Each species has a fastq file available. You can download all fastq files like this:

```sh
mkdir -p ~/workdir/groupwork_assembly
cd ~/workdir/groupwork_assembly
wget https://ngs-longreads-training.s3.eu-central-1.amazonaws.com/project3.tar.gz
tar -xvf project3.tar.gz
rm project3.tar.gz
```

!!! note
Download the data package into your shared working directory, i.e. `/group_work/<group name>` or `~/<group name>`. Only one group member has to do this.

# change this to your species:
species="sample_1"
This will create a directory `project3` with the following structure:

wget https://ngs-longreads-training.s3.eu-central-1.amazonaws.com/group_work_assembly/"$species".fastq.gz
tar -xvf "$species".fastq.gz
rm "$species".fastq.gz
```

project3
|-- sample_1.fastq.gz
|-- sample_2.fastq.gz
|-- sample_3.fastq.gz
|-- sample_4.fastq.gz
|-- sample_5.fastq.gz
|-- sample_6.fastq.gz
|-- sample_7.fastq.gz
`-- sample_8.fastq.gz
0 directories, 8 files
```
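
Before starting, it can be worth confirming that the archive unpacked completely. The sketch below is not part of the course material: `check_project3` is a hypothetical helper that assumes the `project3` layout shown above and relies only on `gzip -t` to test each file's integrity.

```sh
# Hypothetical helper (not part of the course material): checks that all
# eight expected fastq files exist in a directory and are intact gzip files.
check_project3() {
    dir="${1:-project3}"
    status=0
    for i in 1 2 3 4 5 6 7 8
    do
        f="$dir/sample_$i.fastq.gz"
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        elif ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call, assuming project3 is in the current directory:
# check_project3 project3 && echo "all 8 files look fine"
```

The function returns a non-zero status if any file is missing or fails the gzip test, so it can be chained with `&&` before running the actual analysis.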

### Before you start

@@ -46,13 +58,34 @@ You can start this project with dividing the species over the different group me
conda activate assembly
```


* Perform a quality control with `NanoPlot`.
* How is the read quality? Is this quality expected?
* How is the read length?
* Perform an assembly with `flye`.
* Have a look at the help first with `flye --help`. Make sure you pick the correct mode (i.e. `--pacbio-??`).
* Check out the output. Where is the assembly? How is the quality? For that, check out `assembly_info.txt`.
* What species did you assemble? Choose from this list:
```
Acinetobacter baumannii
Bacillus cereus
Bacillus subtilis
Burkholderia cepacia
Burkholderia multivorans
Enterococcus faecalis
Escherichia coli
Helicobacter pylori
Klebsiella pneumoniae
Listeria monocytogenes
Methanocorpusculum labreanum
Neisseria meningitidis
Rhodopseudomonas palustris
Salmonella enterica
Staphylococcus aureus
Streptococcus pyogenes
Thermanaerovibrio acidaminovorans
Treponema denticola
Vibrio parahaemolyticus
```
* Did flye assemble any plasmid sequences?
* Check the completeness with `BUSCO`. Have a good look at the manual first. You can use automated lineage selection by specifying `--auto-lineage-prok`. After you have run `BUSCO`, you can generate a nice completeness plot with `generate_plot.py`. You can check its usage with `generate_plot.py --help`.
* How is the completeness? Is this expected?
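
For the `assembly_info.txt` and plasmid questions above, a small filter can help to spot circular contigs. This is only a sketch: `list_circular_contigs` is a hypothetical helper, and it assumes the tab-separated column order `seq_name`, `length`, `cov.`, `circ.` with `Y` marking circular contigs, so check the header of your own file first.

```sh
# Hypothetical helper: print contigs that flye marked as circular, together
# with their length and coverage. Assumes tab-separated columns in the order
# seq_name, length, cov., circ. and a header line starting with '#'.
list_circular_contigs() {
    awk -F '\t' '
        /^#/ { next }        # skip the header line
        $4 == "Y" {          # circ. column, assumed to hold Y or N
            printf "%s\t%s bp\t%sx\n", $1, $2, $3
        }
    ' "$1"
}

# Example call, assuming flye wrote its output to a directory named flye_out:
# list_circular_contigs flye_out/assembly_info.txt
```

A small circular contig next to a multi-megabase chromosome is a plasmid candidate rather than proof, so treat the output as a starting point for further checks.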
7 changes: 7 additions & 0 deletions scripts/generate_data_project3/download_reads.sh
@@ -0,0 +1,7 @@
for BC in bc2001 bc2002 bc2004 bc2007 bc2011 bc2019 bc2022 bc2015
do
wget -O "$BC".bam https://downloads.pacbcloud.com/public/dataset/2021-11-Microbial-96plex/demultiplexed-reads/m64004_210929_143746."${BC}".bam
samtools fastq -0 "$BC".fastq "$BC".bam
gzip "$BC".fastq
done

8 changes: 8 additions & 0 deletions scripts/generate_data_project3/lookup_bc_organism.csv
@@ -0,0 +1,8 @@
bc2001,sample_6
bc2002,sample_7
bc2004,sample_3
bc2007,sample_1
bc2011,sample_5
bc2019,sample_2
bc2022,sample_8
bc2015,sample_4
4 changes: 4 additions & 0 deletions scripts/generate_data_project3/rename_fastq.sh
@@ -0,0 +1,4 @@
sed 's/,/ /g' lookup_bc_organism.csv | while read BC NAME
do
mv "$BC".fastq.gz "$NAME.fastq.gz"
done
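
The rename loop above could also be written without `sed`, by letting `read` split each line on the comma. The following is a sketch of that alternative, not the committed script: `rename_by_lookup` is a hypothetical name, and it assumes a lookup file with `barcode,sample` lines and the `.fastq.gz` files in the current directory.

```sh
# Sketch of an alternative to the sed-based loop: split on the comma via IFS.
# rename_by_lookup is a hypothetical name; it expects "barcode,sample" lines.
rename_by_lookup() {
    lookup="$1"
    # '|| [ -n "$bc" ]' also handles a last line without a trailing newline
    while IFS=, read -r bc name || [ -n "$bc" ]
    do
        # skip entries whose source file is absent instead of failing in mv
        if [ -f "$bc.fastq.gz" ]; then
            mv -- "$bc.fastq.gz" "$name.fastq.gz"
        else
            echo "skipping $bc: $bc.fastq.gz not found" >&2
        fi
    done < "$lookup"
}

# Example call, run from the directory holding the downloaded files:
# rename_by_lookup lookup_bc_organism.csv
```

Reading with `IFS=,` and `-r` avoids the extra `sed` process and keeps backslashes in the input literal; the existence check is an added convenience, not behaviour of the original loop.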
2 changes: 1 addition & 1 deletion scripts/project2_commands.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

cd ~/workdir/groupwork_pacbio/
cd ~/workdir/project2/

# generate reference for minimap2
minimap2 \