Double phylogenetic placement of mixed samples (MISA)

Kamil S. Jaron edited this page Mar 22, 2024 · 1 revision

Using MISA for mixed genome skim analyses

Prepare the query and distances.

We will now place the mixed individual (Saccharomyces pastorianus, a known hybrid) onto the tree using MISA, a double-placement tool.

cd $USERWORK
cd skmer-tutorial
mkdir mix-query
cp genomes/Saccharomyces_pastorianus/GCA_001515485.2_Saccharomyces_pastorianus_Weihenstephan_34_70_chromosomes_assembly_1.0_genomic.fna mix-query/Saccharomyces_pastorianus.fna

The known constituents of Saccharomyces pastorianus are listed in the following file.

cat genomes/Saccharomyces_pastorianus/things.txt

Recall that yesterday, we used -a to add Saccharomyces cerevisiae to the reference set. Let us first infer a backbone tree that includes Saccharomyces cerevisiae.

# Update the distance matrix to include the added species Saccharomyces cerevisiae
skmer distance -t library/

# Build the full tree with Saccharomyces cerevisiae included
tsv_to_phymat.sh ref-dist-mat.txt  ref-dist-mat-full.phy
fastme -i ref-dist-mat-full.phy -o full.tre

Start by computing distances from the mixed query to the references.

# Run Skmer
skmer query -t mix-query/Saccharomyces_pastorianus.fna library/
# Convert output to .tsv file
convert_to_tsv.sh dist-saccharomyces_pastorianus.txt > dist-saccharomyces_pastorianus.tsv
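
Before placing the query, it can help to see which references are closest to it. The snippet below is a minimal sketch, not part of the tutorial's toolset: `rank_refs` is a hypothetical helper, and it assumes the converted file is a wide tab-separated table (row 1 holds the reference names after an empty first cell, row 2 holds the query name followed by one distance per reference). If your file is laid out differently, adjust the column handling accordingly.

```shell
# Hypothetical helper: list references by increasing distance to the query.
# ASSUMPTION: wide TSV layout
#   row 1 = empty first cell, then tab-separated reference names
#   row 2 = query name, then one distance per reference
rank_refs() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
        NR == 2 { for (i = 2; i <= NF; i++) print name[i] "\t" $i }
    ' "$1" | sort -t "$(printf '\t')" -k2,2g
}

# Usage with the tutorial's file, guarded so the snippet is harmless elsewhere.
if [ -f dist-saccharomyces_pastorianus.tsv ]; then
    rank_refs dist-saccharomyces_pastorianus.tsv | head -n 5
fi
```

For a hybrid query, one would expect references related to both constituents to appear near the top of this list, although the ranking alone does not replace the placement step.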

Ignoring mixtures:

Now, place the sample onto the tree, ignoring that it is a mixture.

run_apples.py -t backbone-fastme.tre -d dist-saccharomyces_pastorianus.tsv -o pastorianus-single.jplace
guppy tog pastorianus-single.jplace
nw_display pastorianus-single.tog.tre

Placement of mixed samples with both constituents present

Now run MISA on the full tree, which contains both constituents.

# Run MISA for phylogenetic double placement
run_misa.py -d dist-saccharomyces_pastorianus.tsv -t full.tre -o mixed-output-present.jplace


# Compare the output with the known mixture:
guppy tog mixed-output-present.jplace
nw_display full.tre
nw_display mixed-output-present.tog.tre

Placement of mixed samples with one constituent missing

Now, let's try the double-placement when one of the constituents is missing from the backbone.
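
To confirm which taxon is absent from the backbone before running MISA, one option is to compare the leaf sets of the two trees. The following is a minimal sketch using only standard shell tools: `leaf_labels` and `missing_leaves` are hypothetical helpers, and they assume plain Newick labels (no quotes, and no commas, colons, or parentheses inside names).

```shell
# Hypothetical helper: print the leaf labels of a Newick tree, one per line, sorted.
# ASSUMPTION: plain labels (no quoting, no ',', ':', '(' or ')' inside names).
leaf_labels() {
    tr -d ' \t\r\n' < "$1" |
        tr '(,' '\n\n' |
        sed 's/[:);].*//' |
        grep -v '^$' |
        sort
}

# Hypothetical helper: leaves present in the first tree but absent from the second.
missing_leaves() {
    a=$(mktemp) && b=$(mktemp) || return 1
    leaf_labels "$1" > "$a"
    leaf_labels "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage with the tutorial's files, guarded so the snippet is harmless elsewhere.
if [ -f full.tre ] && [ -f backbone-fastme.tre ]; then
    missing_leaves full.tre backbone-fastme.tre
fi
```

With the trees built in this tutorial, the output should list the species that was added to the full tree but is not in the backbone.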

nw_display backbone-fastme.tre

# Run MISA for phylogenetic double placement
run_misa.py -d dist-saccharomyces_pastorianus.tsv -t backbone-fastme.tre -o mixed-output.jplace


# Compare the output with the known mixture:
guppy tog mixed-output.jplace
cat genomes/Saccharomyces_pastorianus/things.txt
nw_display backbone-fastme.tre
nw_display mixed-output.tog.tre

You should see a result like the one described below. MISA correctly identifies the two parent species of Saccharomyces pastorianus.

  • Top: the full reference tree before removing Saccharomyces cerevisiae. The two blue branches are the known constituents of Saccharomyces pastorianus.
  • Bottom: the result of placing Saccharomyces pastorianus on the tree after removing Saccharomyces cerevisiae.

Placement of mixed samples with both constituents missing

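# Remove the remaining constituent, Saccharomyces eubayanus, from the backbone
nw_prune backbone-fastme.tre Saccharomyces_eubayanus > backbone-noconst.tre
# Run MISA on the backbone with both constituents missing, then view the result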
run_misa.py -d dist-saccharomyces_pastorianus.tsv -t backbone-noconst.tre -o mixed-output-noconst.jplace
guppy tog mixed-output-noconst.jplace
nw_display mixed-output-noconst.tog.tre
