MRG: cleanup on aisle 10 (#7)
* update AMR some more

* upd configuration script

* ssh key comment out

* hide @ctb
ctb authored Apr 29, 2024
1 parent 7d9465f commit 2aa846d
Showing 5 changed files with 53 additions and 20 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -10,5 +10,6 @@ pip install -r requirements.txt

and then either `mkdocs build` or `mkdocs serve`.

Note, the website at @CTB will be automatically built.

Note, the website at
[ngs-docs.github.io/2024-pig-paradigm-workshop/](https://ngs-docs.github.io/2024-pig-paradigm-workshop/)
will automatically build and deploy on merge to main.
20 changes: 13 additions & 7 deletions configure.sh
@@ -2,19 +2,25 @@
set -x
set -e

. /opt/miniconda3/etc/profile.d/conda.sh

# add my ssh key
echo ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBHFz3WLVqV+0md4OkZi/S0a79cOO7Ax8S4Dledp832JhMQ0GJ0ZlmEnWZrIv83KnRexpAEi5w6H1aSackGjucgQ= t@TitusMatsalmoth.attlocal.net >> ~/.ssh/authorized_keys

mkdir ~/data/

cd ~/data/
curl -JLO http://farm.cse.ucdavis.edu/~ctbrown/transfer/IBD_tutorial_subset.tar.gz
tar xzf IBD_tutorial_subset.tar.gz
curl -JLO http://farm.cse.ucdavis.edu/~ctbrown/transfer/tutorial_other.tar.gz
tar xzf tutorial_other.tar.gz

#curl -JLO http://farm.cse.ucdavis.edu/~ctbrown/transfer/IBD_tutorial_subset.tar.gz
tar xzf /opt/shared/IBD_tutorial_subset.tar.gz
#curl -JLO http://farm.cse.ucdavis.edu/~ctbrown/transfer/tutorial_other.tar.gz
tar xzf /opt/shared/tutorial_other.tar.gz

mkdir ~/databases/
cd ~/databases/
curl -JLO https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/gtdb-rs214/gtdb-rs214-k31.zip
curl -JLO https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/gtdb-rs214/gtdb-rs214.lineages.csv.gz
ln -fs /opt/shared/gtdb-rs214-k31.zip .
ln -fs /opt/shared/gtdb-rs214.lineages.csv.gz .
#curl -JLO https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/gtdb-rs214/gtdb-rs214-k31.zip
#curl -JLO https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/gtdb-rs214/gtdb-rs214.lineages.csv.gz

cd ~/
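The data-setup steps above run under `set -e`, so the plain `mkdir ~/data/` and `mkdir ~/databases/` calls will abort the script if it is rerun on a machine where those directories already exist. A minimal sketch of a rerun-safe variant, assuming the tarballs and database files are pre-staged in `/opt/shared` as the updated script expects; the helper names `extract_shared` and `link_shared` are hypothetical and not part of the repository:

```shell
#!/usr/bin/env bash

# Hypothetical helper: unpack each named tarball from a shared directory
# into a destination directory. `mkdir -p` makes reruns safe under `set -e`,
# unlike plain `mkdir`, which fails if the directory already exists.
extract_shared() {
    local shared="$1" dest="$2"
    shift 2
    mkdir -p "$dest"
    local tarball
    for tarball in "$@"; do
        # Equivalent to `cd "$dest" && tar xzf "$shared/$tarball"`.
        tar xzf "$shared/$tarball" -C "$dest"
    done
}

# Hypothetical helper: symlink shared database files into a destination
# directory instead of downloading them; `ln -fs` replaces existing links.
link_shared() {
    local shared="$1" dest="$2"
    shift 2
    mkdir -p "$dest"
    local f
    for f in "$@"; do
        ln -fs "$shared/$f" "$dest/"
    done
}
```

For example, `extract_shared /opt/shared ~/data IBD_tutorial_subset.tar.gz tutorial_other.tar.gz` followed by `link_shared /opt/shared ~/databases gtdb-rs214-k31.zip gtdb-rs214.lineages.csv.gz` would mirror the `tar xzf` and `ln -fs` lines in the updated script, while tolerating a second run.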

42 changes: 34 additions & 8 deletions docs/amr.md
@@ -1,20 +1,31 @@
# Analyzing metagenomes for Antimicrobial Resistance (AMR) Genes

## Install amrfinder, megahit, and prodigal.
We're going to use
[AMRFinderPlus](https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/),
together with the
[megahit metagenome assembler](https://github.com/voutcn/megahit) and
the
[prodigal gene finder](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-119),
to look for antimicrobial resistance genes in the CD136 metagenome.

We're going to do this by assembling the CD136 metagenome using the megahit
assembler. This will give us contigs that represent the high coverage portion
of the metagenome.

@CTB links to software
## Install amrfinder, megahit, and prodigal.

First, install the software. Run:
```
mamba create -n amrfinder -y ncbi-amrfinderplus megahit prodigal csvtk
conda activate amrfinder
```

Download the amrfinder database:
Next, download the amrfinderplus database. Run:
```
amrfinder -u
```

and set up & change to a working directory:
And, finally, set up & change to a working directory. Run:
```
mkdir ~/amr/
cd ~/amr/
@@ -24,7 +35,9 @@ cd ~/amr/

We'll start by assembling the CD136 metagenome into contigs. In this
case, we're not going to bin the contigs, because AMR genes
[don't assemble well](https://www.biorxiv.org/content/10.1101/2023.12.13.571436v1.full).
[don't assemble well](https://www.biorxiv.org/content/10.1101/2023.12.13.571436v1.full), and in particular don't assemble into regions that are
connected to their host genome. So we run the assembler, and look at genes
on the resulting contigs.

Run:
```
@@ -44,14 +57,27 @@ This will produce a FASTA file containing many protein sequences:
```
head CD136.assembly.faa
```
These are the (partial & complete) genes found by the `prodigal` software.

And, finally, run AMRfinder on the proteins:
```
amrfinder -p CD136.assembly.faa -t 16 -o CD136.amrfinder.csv --plus
amrfinder -p CD136.assembly.faa -t 16 -o CD136.amrfinder.tsv --plus
```

This will produce a spreadsheet named `CD136.amrfinder.tsv` that
contains a number of columns - you can see the list like so, using
`csvtk headers`:

```
csvtk -t headers CD136.amrfinder.tsv
```

To pick out just a few columns, you can use `csvtk cut`.

Run:
```
csvtk -t cut -f "% Coverage of reference sequence","HMM description" CD136.amrfinder.csv
csvtk -t cut -f "% Coverage of reference sequence","HMM description" CD136.amrfinder.tsv
```

@CTB examine output files.
<!-- @CTB say something output the files. -->
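If `csvtk` is not available, the same column selection can be sketched with `awk`. This is a minimal illustration, assuming `CD136.amrfinder.tsv` is tab-separated with a single header row (as the `csvtk -t` calls imply); the `tsv_cut` function is a hypothetical helper, not part of AMRFinderPlus or csvtk:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the named columns of a tab-separated file,
# header row included, in the order requested. Exits non-zero if a
# requested column name is not present in the header.
tsv_cut() {
    local file="$1"
    shift
    # Join the requested column names with tabs ("$*" uses the first IFS character).
    local IFS=$'\t'
    local wanted="$*"
    awk -F'\t' -v OFS='\t' -v wanted="$wanted" '
        NR == 1 {
            # Map each requested name to its column index in the header row.
            n = split(wanted, names, "\t")
            for (i = 1; i <= n; i++) {
                idx[i] = 0
                for (j = 1; j <= NF; j++) {
                    if ($j == names[i]) idx[i] = j
                }
                if (idx[i] == 0) {
                    print "tsv_cut: missing column: " names[i] > "/dev/stderr"
                    exit 1
                }
            }
        }
        {
            # Print the selected fields for every row, header included.
            line = $(idx[1])
            for (i = 2; i <= n; i++) line = line OFS $(idx[i])
            print line
        }
    ' "$file"
}
```

For example, `tsv_cut CD136.amrfinder.tsv "% Coverage of reference sequence" "HMM description"` should print the same two columns as the `csvtk -t cut` command above. Matching on header names rather than column positions keeps the call working even if the column order differs between AMRFinderPlus versions.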

4 changes: 2 additions & 2 deletions docs/comparing-metagenomes.md
@@ -17,11 +17,11 @@ cd ~/compare-metag

## Comparing based on content

* reference free, annotation free @CTB
<!-- * reference free, annotation free @CTB -->

Here we are going to use the
[`sourmash plot`](https://sourmash.readthedocs.io/en/latest/command-line.html#sourmash-plot-cluster-and-visualize-comparisons-of-many-signatures)
command to compare and cluster many metagenomes based on their content.
command to compare and cluster many metagenomes based on their content - not their annotation or assemblies.

As with the [single metagenome analysis](single-metagenomes-taxonomy.md), we have two options here: with, or without abundance information.

2 changes: 1 addition & 1 deletion docs/index.md
@@ -1,6 +1,6 @@
# Introduction

@CTB stuff about workshop
<!-- @CTB stuff about workshop -->

Tutorials:
