cobilab/CompressSequences

How compressible are genetic sequences?


This repository provides the scripts needed to reproduce a benchmark of how compressible different genetic sequences are under different data compressors.

Data compression tools


Data Compressor    Repository    Description
bsc-m03 v0.2.1     code          article
bzip2 1.0.8        code          article
DMCompress         code          article
GeCo2              code          article
GeCo3              code          article
JARVIS2            code          article
JARVIS3            code          under review
lzma 5.2.5         code          article
MemRGC             code          article
MFCompress         code          article
NAF                code          article
paq8l              code          article

Reproducibility:

Change to the scripts directory, give the scripts execute permission, and run the main script:

cd scripts/
chmod +x *.sh
./Main.sh

Alternatively, run each step separately:

#
./InstallTools.sh      # install listed compressors, GTO, and AlcoR
./DownloadFASTA.sh     # downloads FASTA files
./GetCassava.sh        # gunzip cassava files
./GetAlcoRFASTA.sh     # simulates and stores 2 synthetic FASTA sequences
./FASTA2seq.sh         # cleans FASTA files and stores raw sequence files
./DownloadDNAcorpus.sh # download raw sequences from a balanced sequence corpus
./GetDSinfo.sh         # map sequences into their ids, sorted by size; view sequences info
#
./RunTestsExample.sh   # run bench
./ProcessBenchRes.sh   # sort results by BPS and time
./Plot.sh              # plot sorted results
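
The step-by-step sequence above can also be driven by a small wrapper that stops at the first failing step. This is only a sketch: `run_steps` is a hypothetical helper that is not part of the repository, and it is not meant to describe what `Main.sh` does internally. It assumes it is called from inside `scripts/` after `chmod +x *.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run the given scripts
# from the current directory in order and stop at the first failure.
run_steps() {
  local step
  for step in "$@"; do
    echo ">> ${step}"
    if ! "./${step}"; then
      echo "step failed: ${step}" >&2
      return 1
    fi
  done
}

# Example call, using the script names listed above (run from scripts/):
# run_steps InstallTools.sh DownloadFASTA.sh GetCassava.sh GetAlcoRFASTA.sh \
#           FASTA2seq.sh DownloadDNAcorpus.sh GetDSinfo.sh \
#           RunTestsExample.sh ProcessBenchRes.sh Plot.sh
```

Stopping early keeps a failed download or installation from producing an incomplete benchmark that still gets processed and plotted.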

Use case: run the benchmark only for human chromosome Y (CY) and Escherichia coli

#
./InstallTools.sh                                   # install listed compressors, GTO, and AlcoR
./DownloadFASTA.sh -id NC_000024.1 -id NC_000913.3  # downloads CY and Escherichia coli FASTA files
./FASTA2seq.sh                                      # cleans FASTA files and stores raw sequence files
./GetDSinfo.sh                                      # map sequences into their ids, sorted by size; view sequences info
#
./RunTestsExample.sh                                # run bench
./ProcessBenchRes.sh                                # sort results by BPS and time
./Plot.sh                                           # plot sorted results
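
`ProcessBenchRes.sh` sorts results by BPS and time. Assuming BPS stands for bits per symbol (the README does not spell it out), the measure for a single file could be sketched as below. The helpers `bps` and `measure` are hypothetical and are not taken from the repository's scripts; `measure` uses `bzip2 -9` purely as an example compressor and assumes the input is a raw sequence file with one byte per symbol and no header or newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository's scripts.

# bps <compressed_bytes> <num_symbols>
# Prints 8 * compressed_bytes / num_symbols with three decimals.
bps() {
  awk -v c="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "NaN"; exit 1 }
    printf "%.3f\n", 8 * c / n
  }'
}

# measure <raw_sequence_file>
# Compresses the file to stdout with bzip2 -9 and reports bits per symbol,
# assuming one byte per symbol in the input.
measure() {
  local seq="$1" n c
  n=$(wc -c < "$seq")
  c=$(bzip2 -9 -c "$seq" | wc -c)
  bps "$c" "$n"
}
```

For example, `bps 250 1000` prints `2.000`, which is the value an uncompressed four-letter DNA alphabet would need with a fixed two-bit code; a compressor that reaches a lower value on a sequence is exploiting redundancy in it.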
