These scripts were developed for a project at UCLA to compare structural variants found by various tools to known structural variants. They include a very primitive structural variant type categorizer.

samanthaleejensen/compare_svs

			   COMPARE SVS HELP GUIDE
-------------------------------------------------------------------------------

This script compares output from various structural variant finders with known
structural variants. It was designed for use with chromosome 19 of 8 common
mouse strains and thus may not generalize well.

-------------------------------------------------------------------------------
1 - Usage
-------------------------------------------------------------------------------
This script requires four arguments: 

CORRECT FILE: a file of the format described below in section 3 which contains 
your desired standard reference, ideally a verified and complete list of 
structural variants in your region of interest. 

SV TYPE: an abbreviation for the variant type you'd like to compare (see 
section 4 for instructions). 

TOOL NAME: the name of the structural variant finder used to generate your 
data that you wish to compare to the standard (see section 5 for supported 
tools). 

OUTPUT NAME: your desired name for the output files that will be 
generated by this script. Keep in mind that any existing file of the same
name will be replaced without warning. 

You should call the script as below:

python compare_svs.py [CORRECT FILE] [SV TYPE] [TOOL NAME] [OUTPUT NAME]
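
As an illustration of the four-argument call above, the following is a minimal
sketch of how the arguments could be checked before any comparison starts. It
is not taken from compare_svs.py: the function name read_arguments and the
upper-casing of SV TYPE and TOOL NAME are assumptions made for this example.

```python
USAGE = "python compare_svs.py [CORRECT FILE] [SV TYPE] [TOOL NAME] [OUTPUT NAME]"


def read_arguments(argv):
    """Return (correct_file, sv_type, tool_name, output_name) from argv.

    Hypothetical helper, not part of compare_svs.py.
    """
    # argv[0] is the script name, so four arguments means five entries.
    if len(argv) != 5:
        raise ValueError(
            "expected 4 arguments, got %d\nusage: %s" % (len(argv) - 1, USAGE)
        )
    correct_file, sv_type, tool_name, output_name = argv[1:]
    # ASSUMPTION: SV TYPE and TOOL NAME are treated as case-insensitive here.
    return correct_file, sv_type.upper(), tool_name.upper(), output_name
```

For example, a call such as
python compare_svs.py correct_svs.txt DEL LUMPY lumpy_del
would pass this check (the output name lumpy_del is only a placeholder).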

***NOTE*** 
To keep this script friendly to those without computing skills, it was
designed to require file locations as command line input. Unfortunately,
this means that it cannot be run as a supercomputing job. However, the
comparisons and memory requirements of the script do not demand great
computing power, so you should be able to run it on your personal
computer if necessary.

-------------------------------------------------------------------------------
2 - Dependencies
-------------------------------------------------------------------------------
For this script to work, the following files must be in the same directory
as the main script, compare_svs.py:
- parse_file_formats.py: contains scripts for parsing different file formats
- output_methods.py: contains methods for printing to output files
- reference.py: contains a class to hold all correct structural variants
- variant_results.py: compares tool output to our reference structural variants
- find_sv_type.py: used for SV tools that do not annotate structural variants
- highest_global_alignment.py: a dependency of find_sv_type.py
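
If you want to confirm that all of the files listed above are in place before
running a comparison, a small check like the one below may help. This is only
a sketch; the function name missing_modules is hypothetical and is not part of
the scripts themselves.

```python
import os

# File names copied from the dependency list in this section.
REQUIRED_MODULES = [
    "parse_file_formats.py",
    "output_methods.py",
    "reference.py",
    "variant_results.py",
    "find_sv_type.py",
    "highest_global_alignment.py",
]


def missing_modules(script_directory):
    """Return the required files that are absent from script_directory."""
    return [
        name
        for name in REQUIRED_MODULES
        if not os.path.isfile(os.path.join(script_directory, name))
    ]
```

Calling missing_modules on the directory that holds compare_svs.py should
return an empty list when everything is present.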

-------------------------------------------------------------------------------
3 - Correct SVs File Format
-------------------------------------------------------------------------------
To build the dictionary of correct structural variants that results are
compared against, you will need to create a reference file in the same format
as correct_svs.txt (which should be found in this same directory).

Each row in the file represents a different structural variant:
COLUMN 1: chromosome
COLUMN 2: start position (base pair)
COLUMN 3: end position (base pair)
COLUMN 4: variant length (base pairs)
COLUMN 5: variant type*
REMAINING COLUMNS: strains**

*  Although correct_svs.txt has very specific variant types (e.g. Q6_del, H2_del),
   this complexity is removed by the script that creates the reference dictionary
   and is not necessary or even desired. Structural variant types will be reduced
   to DEL, DUP, INV, INS, and CNV in most cases.

** Each of these remaining columns represents whether or not the given structural 
   variant is present in the strain represented by the column header. A 0 here 
   indicates that the mutation is not found in that strain, while 1 means it is. 
   If you are using this script for just one strain, simply include a single
   strain column, under whatever name you like, that contains only ones.
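
The layout described in this section (chromosome, start, end, length, and type,
followed by one 0/1 column per strain) could be read roughly as sketched below.
This is not the code in reference.py. It assumes a whitespace-delimited file
whose first row is the header, and it guesses that a specific type such as
Q6_del is simplified by keeping the text after the last underscore; the names
simplify_type and parse_reference are hypothetical.

```python
def simplify_type(raw_type):
    """Reduce a specific type such as 'Q6_del' to a simple one such as 'DEL'.

    ASSUMPTION: the simple type is the text after the last underscore.
    The real reduction rules in reference.py may differ.
    """
    return raw_type.rsplit("_", 1)[-1].upper()


def parse_reference(lines):
    """Parse reference rows into a list of dictionaries, one per variant."""
    rows = iter(lines)
    # ASSUMPTION: the first row is a header; strain names start at column 6.
    strains = next(rows).split()[5:]
    variants = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chromosome, start, end, length, raw_type = fields[:5]
        variants.append({
            "chromosome": chromosome,
            "start": int(start),
            "end": int(end),
            "length": int(length),
            "type": simplify_type(raw_type),
            # Keep only the strains flagged with 1 (variant present).
            "strains": [s for s, flag in zip(strains, fields[5:]) if flag == "1"],
        })
    return variants
```

With a real file, the lines could come from open("correct_svs.txt"), which can
be passed to parse_reference directly.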

-------------------------------------------------------------------------------
4 - Supported Variant Types
-------------------------------------------------------------------------------
For any of the variant types found in your reference file you can choose to 
compare your output just with that variant type. For example, with the 
correct_svs.txt file that this script was developed for, the fifth 
column contains complex variant type names that will be simplified to 
DEL, INV, and DUP. Those would be the options you could choose from. 
To compare all variant types, simply type ALL. 
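
The effect of the SV TYPE argument, including the special value ALL, can be
pictured with the short sketch below. It assumes each reference variant is a
dictionary with a "type" key holding the simplified type; both that
representation and the function name select_variants are assumptions made for
illustration, not the script's actual internals.

```python
def select_variants(variants, sv_type):
    """Return the variants matching sv_type, or every variant for ALL."""
    wanted = sv_type.upper()  # ASSUMPTION: matching is case-insensitive
    if wanted == "ALL":
        return list(variants)
    return [variant for variant in variants if variant["type"] == wanted]
```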

-------------------------------------------------------------------------------
5 - Supported Tools
-------------------------------------------------------------------------------
The following are structural variant finders whose output is currently 
interpretable by this script:
- BREAKDANCER
- LUMPY
- RDX
- CREST
- DELLY
- GENOME-STRIP
- VARSCAN

If you have a tool not on this list that outputs VCF format results, 
you may be able to choose another VCF output tool (like LUMPY) 
and get accurate comparisons. 
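
A quick way to check a TOOL NAME against the list above is sketched here. The
tool names are copied from this section, but the function is_supported_tool
and its case-insensitive matching are assumptions, not the script's own
behavior.

```python
# Tool names copied from the list in this section.
SUPPORTED_TOOLS = {
    "BREAKDANCER", "LUMPY", "RDX", "CREST", "DELLY", "GENOME-STRIP", "VARSCAN",
}


def is_supported_tool(tool_name):
    """Return True if tool_name appears in the supported tool list."""
    # ASSUMPTION: names are compared without regard to case.
    return tool_name.upper() in SUPPORTED_TOOLS
```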

To have another tool's output format included, email samleejensen@gmail.com.

-------------------------------------------------------------------------------
6 - Strains Tested
-------------------------------------------------------------------------------
The mouse strains in the dataset this script was developed for are as follows:
- A_J
- AKR_J
- BALB_cJ
- C3H_HeJ
- C57BL_6NJ
- CBA_J
- DBA_2J
- LP_J
