Skip to content

a tool for identifying circular RNAs using RNA-Seq data

License

Notifications You must be signed in to change notification settings

TreesLab/circRNAFinder

 
 

Repository files navigation

circRNAFinder: a tool for identifying circular RNAs using RNA-Seq data

circRNAFinder is a software tool for identifying circRNAs using RNA deep sequencing data, as well as annotating and visualizing circRNAs and comparing their expression in different tissues or growth conditions. circRNAFinder is implemented in Python and R languages and is freely available at https://github.com/bioxfu/circRNAFinder under the GPL v3 license.


System Requirements:
1. Linux or Mac OS operation system
2. Python (version 2.6 or later) and R (version 2.15 or later)
3. Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)
4. BEDTools (http://code.google.com/p/bedtools/)
5. Integrative Genomics Viewer (http://www.broadinstitute.org/software/igv/)
6. FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) optional, used for collapsing the identical reads
7. SRA Toolkit (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software) optional, used for converting the SRA files.
8. The installation directories of the required software should be included in your $PATH.


Download and Installation:
If you have Git, you can clone the latest version by running:

		git clone git://github.com/bioxfu/circRNAFinder

Or you can download the zipped codes at https://github.com/bioxfu/circRNAFinder

After downloading the circRNAFinder package, you can add the src directory to the $PATH variable.
For example, simply add the following two lines to the end of the ~/.bash_profile file:
 
  PATH=$PATH:/home/xfu/circRNAFinder/src
  export PATH


Input Reads Requirements:
1. The adaptor sequences and low-quality bases should be trimmed if necessary. The recommended tools are cutadapt (http://code.google.com/p/cutadapt/) and SolexaQA (http://solexaqa.sourceforge.net/).
2. Single-end reads in FASTA format. Paired-end reads should be merged into a single FASTA file.
3. Identical reads should be collapsed into unique reads. This step is performed by fastx_collapser.py script.
4. Length of short reads should be greater or equal to 40bp. The pipeline neglects the reads shorter than 40bp in the input files.

 
Configure File Format:
The configure file consists of one or more named sections, each of which can contain individual options with names and values.
Configure file sections are identified by looking for lines starting with [ and ending with ]. The value between the square brackets is the section name, and can contain any characters except square brackets.
Options are listed one per line within a section. The line starts with the name of the option, which is separated from the value by a colon (:) or equal sign (=). Whitespace around the separator is ignored when the file is parsed.

An example:
[lib1]
sample_name = HEK293
reads_file = ./fasta/HEK293.fa
genome_index = ./build/hg19_dna
trans_index = ./build/hg19_trans
genome_seq = ./build/hg19_dna.fa


Usage:
1. Create an empty directory, this will be the working directory.
2. Copy configure files (with .cfg suffix) from the circRNAFinder package to the working directory.
3. Edit configure files to substitute the option values with the names and paths for your own data. See Tutorial.sh for details.
4. Execute "circRNAFind.py -s 0 circRNAFind.cfg" in your working directory. This will find circRNAs in each library. You can find results in 'LIBRARY_NAME/output' directory.
5. Execute "circRNAAnno.py -s 0 circRNAAnno.cfg" in your working directory. This will combine and annotate the circRNAs detected in multiple libraries. You can find results in 'GROUP_NAME/table' directory.
6. Execute "circRNAView.py -s 0 circRNAView.cfg" in your working directory. This will save an image for each circRNA to 'GROUP_NAME/IGV' directory.


Tutorial:
The tutorial.sh helps you get started with circRNAFinder by demonstrating how to detect, annotate and visualize circRNAs using public RNA-Seq data. It could also be used as a template to build a pipeline for your own data.

Contact:
If you have any questions or would like to report a bug, please contact Dr. Fu Xing (bio.xfu@gmail.com).


About

a tool for identifying circular RNAs using RNA-Seq data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 86.9%
  • Shell 10.1%
  • R 3.0%