DavidsonGroup/nailpolish

💅 nailpolish

nailpolish is a collection of tools for deduplicating UMIs in long-read single-cell data.

Install | Example | Usage

Install

nailpolish is distributed as a single binary with no dependencies (beyond libc). Up-to-date builds are available through the Releases section for macOS (Intel & Apple Silicon) and x64-based Linux systems.

Releases: macOS, Linux

nailpolish is in active development. If you are running into any issues, please check to ensure that you are using the most current version of the software!

Example

Say I have a demultiplexed sample.fastq file of the following form (for instance, one generated using the Flexiplex demultiplexer):

@BC1_UMI1
sequence...
+
quality...
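
To make the header layout concrete, here is a minimal sketch that pulls the barcode and UMI out of each record. It assumes every header is exactly "@<barcode>_<UMI>" as in the example above; real demultiplexer output may carry extra fields. The toy reads and the fastq_keys helper are made up for illustration and are not part of nailpolish.

```shell
# Sketch only: assumes headers look like "@<barcode>_<UMI>" and records span four lines.
set -eu

# Hypothetical toy input; sequences and qualities are invented.
toy_fastq=$(mktemp)
cat > "$toy_fastq" <<'EOF'
@AAAC_GGTT
ACGTACGT
+
IIIIIIII
@AAAC_GGTT
ACGTACGA
+
IIIIIIII
@CCCA_TTGG
TTTTACGT
+
IIIIIIII
EOF

# Print "barcode<TAB>UMI" for every record (the header is line 1 of each 4-line record).
fastq_keys() {
  awk 'NR % 4 == 1 {
         key = substr($0, 2)       # drop the leading "@"
         split(key, parts, "_")    # parts[1] = barcode, parts[2] = UMI
         print parts[1] "\t" parts[2]
       }' "$1"
}

fastq_keys "$toy_fastq"
```

Reads that print the same barcode and UMI pair are the duplicate candidates that the rest of this example deals with.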

I first create an index file using

$ nailpolish generate-index --file sample.fastq --index index.tsv

I can view summary statistics about duplicate rates using:

$ nailpolish summary --index index.tsv

and I can also transparently remove duplicate reads using:

$ nailpolish call \
  --index index.tsv \
  --input sample.fastq \
  --output sample_called.fasta \
  --threads 4

which will output all non-duplicated reads plus one consensus-called read per duplicate group, removing the original duplicated reads in the process. The output is written as .fasta, so quality values are not preserved.
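
For intuition about what a duplicate rate means here, the following sketch counts how many reads share a header in a toy file. It is an independent illustration using awk, not how nailpolish computes its summary, and it assumes reads count as duplicates exactly when their full header lines match. The duplicate_stats helper and the toy reads are hypothetical.

```shell
# Sketch only: groups reads by their full header line; nailpolish may group differently.
set -eu

# Hypothetical toy input: two reads share @BC1_UMI1, the other two are unique.
toy_fastq=$(mktemp)
printf '%s\n' \
  '@BC1_UMI1' 'ACGT' '+' 'IIII' \
  '@BC1_UMI1' 'ACGA' '+' 'IIII' \
  '@BC1_UMI2' 'TTTT' '+' 'IIII' \
  '@BC2_UMI1' 'GGGG' '+' 'IIII' > "$toy_fastq"

# Count reads, distinct headers, and how many of those headers occur more than once.
duplicate_stats() {
  awk 'NR % 4 == 1 { count[$0]++; reads++ }
       END {
         for (k in count) {
           groups++
           if (count[k] > 1) { dup_groups++; dup_reads += count[k] }
         }
         printf "reads=%d groups=%d duplicated_groups=%d reads_in_duplicated_groups=%d\n",
                reads, groups, dup_groups, dup_reads
       }' "$1"
}

duplicate_stats "$toy_fastq"
```

On this toy file, four reads collapse into three groups, one of which contains two reads.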

Usage

Help

💅 nailpolish version 0.1.0
   ──────────────────────────────────
   tools for consensus calling barcode and UMI duplicates
   https://github.com/DavidsonGroup/nailpolish

Usage: nailpolish generate-index [OPTIONS] --file <FILE>
       nailpolish summary --index <INDEX>
       nailpolish call [OPTIONS] --index <INDEX> --input <INPUT>
       nailpolish group [OPTIONS] --index <INDEX> --input <INPUT> [COMMAND]...
       nailpolish help [COMMAND]...

Options:
  -h, --help     Print help
  -V, --version  Print version

nailpolish generate-index:
Create an index file from a demultiplexed .fastq, if one doesn't already exist
      --file <FILE>    the input .fastq file
      --index <INDEX>  the output index file [default: index.tsv]
  -h, --help           Print help

nailpolish summary:
Generate a summary of duplicate statistics from an index file
      --index <INDEX>  the index file
  -h, --help           Print help

nailpolish call:
Generate a consensus-called 'cleaned up' file
      --index <INDEX>          the index file
      --input <INPUT>          the input .fastq
      --output <OUTPUT>        the output .fasta; note that quality values are not preserved
  -t, --threads <THREADS>      the number of threads to use [default: 4]
  -d, --duplicates-only        only show the duplicated reads, not the single ones
  -r, --report-original-reads  for each duplicate group of reads, report the original reads along with the consensus
  -h, --help                   Print help

nailpolish group:
'Group' duplicate reads, and pass to downstream applications
      --index <INDEX>      the index file
      --input <INPUT>      the input .fastq
      --output <OUTPUT>    the output location, or default to stdout
      --shell <SHELL>      the shell used to run the given command [default: bash]
  -t, --threads <THREADS>  the number of threads to use. this will not guard against race conditions in any downstream applications used. this will effectively set the number of individual processes to launch [default: 1]
  -h, --help               Print help
  [COMMAND]...         the command to run. any groups will be passed as .fastq standard input [default: cat]

nailpolish help:
Print this message or the help of the given subcommand(s)
  [COMMAND]...  Print help for the subcommand(s)
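
Since nailpolish group hands each group to a command as .fastq on standard input, a downstream command only needs to read FASTQ from stdin. Below is a minimal sketch of such a command. The group_stats function is hypothetical, and I am assuming each group arrives as plain four-line records; how the command is quoted and passed to nailpolish group is not shown here.

```shell
# Sketch of a possible downstream command for `nailpolish group`.
# Assumes one group per invocation, delivered on stdin as four-line FASTQ records.
set -eu

# Print "<header of first read><TAB><number of reads><TAB><mean read length>".
group_stats() {
  awk 'NR == 1     { first = substr($0, 2) }        # header without the leading "@"
       NR % 4 == 2 { n++; total += length($0) }     # sequence line of each record
       END         { if (n > 0) printf "%s\t%d\t%.1f\n", first, n, total / n }'
}

# Stand-in for a single group of two made-up reads.
printf '%s\n' '@BC1_UMI1' 'ACGTAC' '+' 'IIIIII' '@BC1_UMI1' 'ACGT' '+' 'IIII' | group_stats
```

Because --threads launches that many separate processes, a command like this should avoid writing to a shared file without its own locking.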

Example of --duplicates-only and --report-original-reads

Suppose I have a demultiplexed read file of the following format (so that seq2 and seq3 are duplicates):

@BCUMI_1
seq1
@BCUMI_2
seq2
@BCUMI_2
seq3

Then, the effects of the following flags are:

(default):
  >BCUMI_1_SIN
  seq1
  >BCUMI_2_CON_2
  seq2_and_3_consensus

--duplicates-only:
  >BCUMI_2_CON_2
  seq2_and_3_consensus

--report-original-reads:
  >BCUMI_1_SIN
  seq1
  >BCUMI_2_DUP_1_of_2
  seq2
  >BCUMI_2_DUP_2_of_2
  seq3
  >BCUMI_2_CON_2
  seq2_and_3_consensus
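
The header suffixes in this example (_SIN, _CON_<n>, _DUP_<i>_of_<n>) make it easy to tally what ended up in a called file. The sketch below does that with awk. It assumes the suffix scheme is exactly as shown above and that each record is a header line followed by sequence lines; the count_by_kind helper is hypothetical.

```shell
# Sketch only: classifies FASTA headers by the suffixes shown in the example above.
set -eu

# Toy file mirroring the --report-original-reads output from the example.
called_fasta=$(mktemp)
printf '%s\n' \
  '>BCUMI_1_SIN' 'seq1' \
  '>BCUMI_2_DUP_1_of_2' 'seq2' \
  '>BCUMI_2_DUP_2_of_2' 'seq3' \
  '>BCUMI_2_CON_2' 'seq2_and_3_consensus' > "$called_fasta"

# Print "<kind><TAB><count>" for each kind of record, sorted by kind.
count_by_kind() {
  awk '/^>/ {
         if      ($0 ~ /_SIN$/)                  kind = "singleton"
         else if ($0 ~ /_CON_[0-9]+$/)           kind = "consensus"
         else if ($0 ~ /_DUP_[0-9]+_of_[0-9]+$/) kind = "original_duplicate"
         else                                    kind = "other"
         n[kind]++
       }
       END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

count_by_kind "$called_fasta"
```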

Installation

Prebuilt binaries

The recommended way to get nailpolish is to download one of the automated builds, which can be found in the Releases section for macOS (Intel + Apple Silicon) and x64 Linux systems.

Install from source

You will need a recent version of Rust installed on your machine, along with the Cargo package manager. Nothing else is required: all dependencies are downloaded and built automatically. The command below installs nailpolish into Cargo's bin directory (~/.cargo/bin by default), which needs to be on your PATH.

$ cargo install --git https://github.com/DavidsonGroup/nailpolish.git

# or, from a local directory
$ cargo install --path .
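
If the nailpolish command is not found after installing, the usual cause is that Cargo's bin directory is missing from PATH. The sketch below checks for that. The on_path helper is hypothetical, and the location assumes a default Cargo setup (CARGO_HOME, falling back to ~/.cargo) with no custom install root.

```shell
# Sketch: report whether a directory appears in a colon-separated PATH-style string.
on_path() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Default location for binaries installed by `cargo install` (assumes no custom install root).
cargo_bin="${CARGO_HOME:-$HOME/.cargo}/bin"

if on_path "$cargo_bin" "$PATH"; then
  echo "$cargo_bin is on PATH"
else
  echo "add $cargo_bin to PATH, e.g. export PATH=\"$cargo_bin:\$PATH\""
fi
```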

Note to HPC users on older systems

You will need reasonably recent versions of gcc and cmake installed, and the CARGO_NET_GIT_FETCH_WITH_CLI environment variable set to true. For instance:

$ module load gcc/latest cmake/latest
$ CARGO_NET_GIT_FETCH_WITH_CLI="true" cargo install --git https://github.com/DavidsonGroup/nailpolish.git

Build from source

$ git clone https://github.com/DavidsonGroup/nailpolish.git
$ cd nailpolish
$ cargo build --release

The binary can be found at target/release/nailpolish, relative to the repository root.
