NOVOLoci - Haplotype-aware assembly of long-sequencing reads

NOVOLoci is a haplotype-aware assembler for targeted assembly or whole-genome assembly of small genomes. Both HiFi and ONT reads can be used separately or combined.

Currently, only Nanopore reads are supported; PacBio and hybrid options will be available soon.

Cite

A preprint will soon be available on bioRxiv.

Getting help

Any issues, requests, problems or comments that are not yet addressed on this page can be posted on GitHub Issues, and I will try to reply the same day.

Or you can contact me directly through the following email address:

nicolasdierckxsens at hotmail dot com

If your assembly was unsuccessful, please attach the log file and the configuration file to your email; this will help me identify the problem.

Instructions

1. Install dependencies

Install BLAST
Install MAFFT
Install Perl modules: MCE::Child, MCE::Channel and Parallel::ForkManager

cpan install MCE

cpan install Parallel::ForkManager
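
Before moving on, it can help to confirm that everything listed above is actually visible on your PATH and to Perl. The following is a minimal sketch, not part of NOVOLoci itself: the function name `check_dependencies` is hypothetical, and it assumes that BLAST is installed as BLAST+ and can be detected through its `blastn` binary.

```shell
# Report which of the dependencies listed above are missing on this machine.
# Assumption: BLAST is detected through the `blastn` binary of BLAST+.
check_dependencies() {
    local missing=0

    # External programs must be found on the PATH
    for tool in blastn mafft perl; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done

    # Perl modules: loading a module with -M fails if it is not installed
    for module in MCE::Child MCE::Channel Parallel::ForkManager; do
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "found:   $module"
        else
            echo "MISSING: $module"
            missing=1
        fi
    done

    return "$missing"
}
```

Calling `check_dependencies` prints one line per dependency and returns a non-zero status if anything is missing, so it can be chained as `check_dependencies || echo "Install the missing dependencies first"`.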

2. Find a suitable seed (only Targeted approach)

The seed can be a sequence from a reference genome or from a previous assembly
Length should be at least 500 bp
Choose a sequence located before the complex region you are targeting; DO NOT start in a repetitive or duplicated region!
The seed should be in standard FASTA format (first line: >Id_sequence)
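
The two formal requirements above (FASTA header on the first line, at least 500 bp of sequence) can be checked before starting a run. This is a small sketch under stated assumptions: `check_seed` is a hypothetical helper, not something NOVOLoci provides, and it simply sums the length of all non-header lines in the file. It cannot tell whether the seed lies in a repetitive or duplicated region.

```shell
# Check a seed file: the first line must be a FASTA header and the
# sequence must be at least 500 bp (or the length given as 2nd argument).
check_seed() {
    local seed_file=$1
    local min_len=${2:-500}
    local first_line seq_len

    # The first line must start with ">"
    first_line=$(head -n 1 "$seed_file")
    case $first_line in
        ">"*) ;;
        *) echo "ERROR: first line is not a FASTA header (>Id_sequence)"
           return 1 ;;
    esac

    # Sum the length of all non-header lines, ignoring spaces, tabs and CRs
    seq_len=$(awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) }
                   END   { print n + 0 }' "$seed_file")

    if [ "$seq_len" -lt "$min_len" ]; then
        echo "ERROR: seed is $seq_len bp, need at least $min_len bp"
        return 1
    fi
    echo "OK: seed is $seq_len bp"
}
```

For example, `check_seed /path/to/seed_file/Seed.fasta` prints either an `OK` line with the measured length or an `ERROR` line explaining which requirement failed.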

3. Create configuration file

You can download the example file (config.txt) and adjust the settings to your liking.
Every parameter of the configuration file is explained below.

4. Run NOVOLoci

perl NOVOLoci1.0.pl -c config.txt

Configuration file

This is an example of a configuration file for the assembly of a chloroplast. For the assembler to work, your configuration file must have exactly the same structure. Make sure there is always a space after the equals sign and that every parameter is on one single line.

1. Example of configuration file:

Project:
-----------------------
Project name          = projectname
Assembly length       = 1000000
Subsample             = 
Save assembled reads  = 
Seed Input            = /path/to/seed_file/Seed.fasta
Ploidy                = 2
Reference sequence    = 
Variance detection    = 
Cores                 = 30
Output path           = /path/to/output_folder/
TMP path              = /path/to/temporary_folder/

Nanopore reads:
-----------------------
Nanopore reads        = /path/to/reads/
Local DB and NP reads = /path/to/database/
Sequencing depth NP   = 
Min read length NP    =
Use Quality scores    =

PacBio reads:
-----------------------
PacBio reads          = /path/to/reads/
Local DB and PB reads = /path/to/database/
Sequencing depth PB   = 
Min read length PB    =
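
Since the file must keep a space after every equals sign, a quick check before running can catch the most common slip. The sketch below is an illustration only: `lint_config` is a hypothetical helper, and how NOVOLoci itself parses the file is not described here. It flags lines where the first "=" is directly followed by a value, and leaves parameters with an empty value alone.

```shell
# Flag config lines where the first "=" is directly followed by a value.
# Lines without "=" (section titles, separators) and lines with an empty
# value are not reported.
lint_config() {
    awk '
        {
            pos = index($0, "=")
            if (pos > 0 && pos < length($0) && substr($0, pos + 1, 1) != " ") {
                printf "line %d: missing space after \"=\": %s\n", NR, $0
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `lint_config config.txt` prints one line per offending parameter and returns a non-zero status if any were found.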

2. Explanation of parameters:

Project:
-----------------------
Project name          = Choose a name for your project, it will be used for the output files.
Assembly length       = If you want the assembly to terminate after a certain length, you can give the desired length; 
                        If you want to assemble the complete dataset write: "WG"
Subsample             = This option is currently not available
Save assembled reads  = All the reads used for the assembly will be stored in separate files (yes/no)
Seed Input            = The path to the file that contains the seed sequence.
Ploidy                = Give the ploidy of the sample. If it is a very heterozygous diploid species (>2%), you can give ploidy 1
Reference sequence    = This option is currently not available.
Variance detection    = This option is currently not available.
Cores                 = It is strongly advised to use multiple cores for the assembly; give the number of available cores
Output path           = The path to the folder where the output files will be written
TMP path              = The path to a folder for temporary files

Nanopore reads:
-----------------------
Nanopore reads        = The path to the Nanopore reads. Only use this when you run the dataset for the first time.
Local DB and NP reads = If you ran the dataset before, you can give the path of the previous output folder to reuse the database
Sequencing depth NP   = Give an estimation of the sequencing depth
Min read length NP    = Give the minimum read length to be used in the assembly (default: 1000)
Use Quality scores    =

PacBio reads:
-----------------------
PacBio reads          = The path to the PacBio reads. Only use this when you run the dataset for the first time.
Local DB and PB reads = If you ran the dataset before, you can give the path of the previous output folder to reuse the database
Sequencing depth PB   = Give an estimation of the sequencing depth
Min read length PB    = Give the minimum read length to be used in the assembly (default: 500)
