This repository compares the performance of our new assembler Beaver against three leading meta-assemblers (Aletsch, TransMeta and PsiCLASS) and two popular single-sample assemblers (StringTie2 and Scallop2). Here we provide instructions for downloading the necessary tools, preparing datasets, executing the tools and pipelines, and reproducing the results presented in the Beaver paper.
Our experiments involve the following tools:
Tool | Version | Description |
---|---|---|
Beaver | v1.0.0 | Cell-specific Assembler |
Aletsch | v1.1.0 | Meta Assembler |
TransMeta | v.1.0 | Meta Assembler |
PsiCLASS | v1.0.3 | Meta Assembler |
StringTie2 | v2.2.1 | Single-sample Assembler |
Scallop2 | v1.1.2 | Single-sample Assembler |
STAR | v2.7.11 | RNA-seq Aligner |
GffCompare | v0.11.2 | Evaluate assembled transcripts |
- Visit the homepage of each tool listed above.
- Follow the download and compilation instructions on each tool's homepage.
- For tools with available executable files, link or copy them to the `programs` directory. This includes `beaver`, `aletsch`, `scallop2`, `stringtie`, `STAR` and `gffcompare`.
- For tools without standalone executables (TransMeta and PsiCLASS), link the entire directory to `programs`.
Ensure the tools are accessible via the following paths:
- `your/path/to/programs/beaver`
- `your/path/to/programs/aletsch`
- `your/path/to/programs/TransMeta/TransMeta`
- `your/path/to/programs/psiclass/psiclass`
- `your/path/to/programs/stringtie`
- `your/path/to/programs/scallop2`
- `your/path/to/programs/STAR`
- `your/path/to/programs/gffcompare`
You may need to rename some of the executable files to match the paths listed above.
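Before running any pipeline, it can help to confirm that every tool resolves at the expected location. The following is a minimal sketch, not part of this repository: `missing_tools` and `EXPECTED_TOOLS` are hypothetical names, and it assumes each entry above should be an executable file (symlinks are followed).

```python
import os
from pathlib import Path

# Expected executables, relative to the programs directory (from the list above).
EXPECTED_TOOLS = [
    "beaver",
    "aletsch",
    "TransMeta/TransMeta",
    "psiclass/psiclass",
    "stringtie",
    "scallop2",
    "STAR",
    "gffcompare",
]


def missing_tools(programs_dir):
    """Return the expected tool paths that are absent or not executable.

    Path.is_file() and os.access() both follow symlinks, so a dangling
    link is reported as missing.
    """
    base = Path(programs_dir)
    missing = []
    for rel in EXPECTED_TOOLS:
        path = base / rel
        if not (path.is_file() and os.access(path, os.X_OK)):
            missing.append(rel)
    return missing
```

For example, `print(missing_tools("your/path/to/programs"))` prints an empty list once everything is in place; any names it reports still need to be linked, copied or renamed.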
We evaluate the performance of the six methods on four datasets: the two real Smart-seq3 datasets outlined below and their simulated counterparts. Each real dataset is identified by the prefix used in this repository and by its accession ID.
Dataset | # Cells | Protocol | Accession ID |
---|---|---|---|
HEK293T | 192 | Smart-seq3 | E-MTAB-8735 |
Mouse-Fibroblast | 369 | Smart-seq3 | E-MTAB-8735 |
Use STAR to align the reads of each sample (cell). For every dataset, compile the list of BAM file paths in the format required by each meta-assembler. The simulated counterparts of the datasets are generated with RSEM.
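The per-cell alignment and BAM-list steps can be sketched as follows. This is an illustration only: the exact STAR parameters used for the paper are not given here, so the options below (gzipped paired FASTQ input via `zcat`, coordinate-sorted BAM output, `--outSAMstrandField intronMotif` to add XS strand tags for unstranded spliced reads) are assumptions. The helper names `star_command`, `sorted_bam_path` and `write_bam_list` are hypothetical, and writing one BAM path per line is an assumed list format; check each meta-assembler's documentation for the columns it expects.

```python
from pathlib import Path


def star_command(genome_dir, fastq_files, out_prefix, threads=8,
                 star_bin="your/path/to/programs/STAR"):
    """Build a STAR command that aligns one cell and writes a coordinate-sorted BAM.

    out_prefix is prepended verbatim to STAR's output names, so a prefix such as
    "aln/cell001/" places the outputs inside that directory (create it first).
    """
    return [
        star_bin,
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", *map(str, fastq_files),
        "--readFilesCommand", "zcat",          # assumption: FASTQ files are gzipped
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outSAMstrandField", "intronMotif",  # assumption: XS tags for unstranded reads
        "--outFileNamePrefix", str(out_prefix),
    ]


def sorted_bam_path(out_prefix):
    """File name STAR uses for the sorted BAM under a given --outFileNamePrefix."""
    return f"{out_prefix}Aligned.sortedByCoord.out.bam"


def write_bam_list(bam_paths, list_file):
    """Write one BAM path per line (assumed format; adapt per meta-assembler)."""
    Path(list_file).write_text("".join(f"{p}\n" for p in bam_paths))
```

For each cell, create the output directory and run the command with `subprocess.run(cmd, check=True)`. Then pass `sorted_bam_path(prefix)` for every cell to `write_bam_list` to produce the dataset's BAM list.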
Execute the provided scripts in the `results` directory to run the simulator and assemblers on the four datasets:
- `./simulate.HEK293T.sh`
- `./simulate.Mouse-Fibroblast.sh`
- `./run.HEK293T.sh`
- `./run.Mouse-Fibroblast.sh`
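To run these scripts in one go, a small driver might look like the following. This is a hypothetical sketch, not a script shipped with the repository. It assumes the scripts are executable, are run from inside the `results` directory, and should execute in the order listed (all simulations before the assembler runs). Whether the run scripts actually depend on the simulated data is an assumption.

```python
import subprocess

# Dataset prefixes used by the scripts in the results directory.
DATASETS = ["HEK293T", "Mouse-Fibroblast"]


def pipeline_scripts(datasets=DATASETS):
    """Return the scripts in the order listed above: simulations first, then runs."""
    return [f"./simulate.{d}.sh" for d in datasets] + [f"./run.{d}.sh" for d in datasets]


def run_pipeline(results_dir="results", dry_run=True):
    """Run each script from inside results_dir, stopping at the first failure."""
    executed = []
    for script in pipeline_scripts():
        if dry_run:
            print(f"[dry run] cd {results_dir} && {script}")
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run([script], cwd=results_dir, check=True)
        executed.append(script)
    return executed
```

Calling `run_pipeline()` only prints the planned commands. `run_pipeline(dry_run=False)` executes them.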
Execute the provided scripts in the `train` directory to run the evaluation pipeline described in the manuscript. The Beaver_General and Beaver_Specific models are trained on chromosomes 1-9 and tested on the remaining chromosomes.
- `./train_test_real.py`
- `./train_test_sim.py`
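The chromosome-based split can be illustrated with a short sketch. How the training scripts apply the split internally is not shown here. This sketch assumes it operates on GTF records keyed by the chromosome name in the first column and accepts names with or without a `chr` prefix. The helper names are hypothetical. The key detail is an exact match on the normalized name, so that `chr10` to `chr19` are not mistaken for `chr1` by a prefix test.

```python
# Chromosomes 1-9 form the training set; every other sequence is used for testing.
TRAIN_CHROMS = {str(i) for i in range(1, 10)}


def normalize_chrom(name):
    """Strip an optional (case-insensitive) 'chr' prefix so 'chr3' and '3' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def is_training_chrom(name):
    # Exact set membership, not startswith(): 'chr10' must not match 'chr1'.
    return normalize_chrom(name) in TRAIN_CHROMS


def split_gtf_lines(lines):
    """Split GTF records into training (chr1-9) and test (all other) lists."""
    train, test = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        seqname = line.split("\t", 1)[0]
        (train if is_training_chrom(seqname) else test).append(line)
    return train, test
```

Under this sketch, records on `chrX`, `chrY`, `chrM` and unplaced contigs all fall into the test set.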