These scripts were developed for a project at UCLA to compare structural variants found by various tools to known structural variants. It includes a very primitive structural variant type categorizer.
samanthaleejensen/compare_svs
COMPARE SVS HELP GUIDE
-------------------------------------------------------------------------------
This script compares output from various structural variant finders with known
structural variants. It was designed for use with chromosome 19 of 8 common
mouse strains and thus may not generalize well.

-------------------------------------------------------------------------------
1 - Usage
-------------------------------------------------------------------------------
This script requires four arguments:

    CORRECT FILE: a file in the format described in section 3 that contains
    your desired standard reference, ideally a verified and complete list of
    structural variants in your region of interest.

    SV TYPE: an abbreviation for the variant type you'd like to compare (see
    section 4 for instructions).

    TOOL NAME: the name of the structural variant finder that generated the
    data you wish to compare to the standard (see section 5 for supported
    tools).

    OUTPUT NAME: your desired name for the output files that this script will
    generate. Keep in mind that any existing file of the same name will be
    replaced without warning.

You should call the script as below:

    python compare_svs.py [CORRECT FILE] [SV TYPE] [TOOL NAME] [OUTPUT NAME]

***NOTE*** To keep this script friendly to those without computing skills, it
was designed to require command line input of file locations. Unfortunately,
this means that it cannot be run as a supercomputing job. However, the
comparisons are not computationally demanding and the script's memory use is
modest, so you should be able to run it from your personal computer if
necessary.
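As an illustration of the call described above, here is a minimal sketch that checks a tool name against the list in section 5 and assembles the four arguments in the documented order. The helper build_command and the output name lumpy_del_results are hypothetical and not part of these scripts, and the case-insensitive tool check is an assumption made only for this sketch.

```python
# Tool names copied from section 5 of this guide.
SUPPORTED_TOOLS = ["BREAKDANCER", "LUMPY", "RDX", "CREST", "DELLY",
                   "GENOME-STRIP", "VARSCAN"]


def build_command(correct_file, sv_type, tool_name, output_name):
    """Return the argument list for one compare_svs.py call.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # ASSUMPTION: tool names are compared case-insensitively for this check.
    if tool_name.upper() not in SUPPORTED_TOOLS:
        raise ValueError("unsupported tool: " + tool_name)
    # Argument order follows the usage line:
    # [CORRECT FILE] [SV TYPE] [TOOL NAME] [OUTPUT NAME]
    return ["python", "compare_svs.py",
            correct_file, sv_type, tool_name, output_name]


command = build_command("correct_svs.txt", "DEL", "LUMPY", "lumpy_del_results")
print(" ".join(command))
# prints: python compare_svs.py correct_svs.txt DEL LUMPY lumpy_del_results
```

Catching an unsupported tool name before the script starts avoids a failed comparison partway through an interactive session.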
-------------------------------------------------------------------------------
2 - Dependencies
-------------------------------------------------------------------------------
For this script to work, the following modules must be in the same directory
as the main script, compare_svs.py:

    - parse_file_formats.py: contains scripts for parsing different file formats
    - output_methods.py: contains methods for printing to output files
    - reference.py: contains a class to hold all correct structural variants
    - variant_results.py: compares tool output to our reference structural
      variants
    - find_sv_type.py: used for SV tools that do not annotate structural
      variants
    - highest_global_alignment.py: a dependency of find_sv_type.py

-------------------------------------------------------------------------------
3 - Correct SVs File Format
-------------------------------------------------------------------------------
To build the dictionary of correct structural variants that results are
compared against, you will need to create a reference file in the same format
as correct_svs.txt (which should be found in this same directory). Each row in
the file represents a different structural variant:

    COLUMN 1: chromosome
    COLUMN 2: start position (base pair)
    COLUMN 3: end position (base pair)
    COLUMN 4: variant length (base pairs)
    COLUMN 5: variant type*
    REMAINING COLUMNS: strains**

* Although correct_svs.txt has very specific variant types (e.g. Q6_del,
H2_del), this complexity is removed by the script that creates the reference
dictionary and is not necessary or even desired. Structural variant types will
be reduced to DEL, DUP, INV, INS, and CNV in most cases.

** Each of these remaining columns indicates whether the given structural
variant is present in the strain named in the column header. A 0 here
indicates that the mutation is not found in that strain, while a 1 means it is.
If you are using this script for just one strain, simply include a single
strain column, under whatever name you like, that contains all ones.

-------------------------------------------------------------------------------
4 - Supported Variant Types
-------------------------------------------------------------------------------
You can restrict the comparison to any single variant type found in your
reference file. For example, in the correct_svs.txt file that this script was
developed for, the fifth column contains complex variant type names that will
be simplified to DEL, INV, and DUP. Those would be the options you could
choose from. To compare all variant types, simply type ALL.

-------------------------------------------------------------------------------
5 - Supported Tools
-------------------------------------------------------------------------------
The following are structural variant finders whose output is currently
interpretable by this script:

    - BREAKDANCER
    - LUMPY
    - RDX
    - CREST
    - DELLY
    - GENOME-STRIP
    - VARSCAN

If your tool is not on this list but outputs results in VCF format, you may be
able to choose another VCF output tool (like LUMPY) and still get accurate
comparisons. To request support for another tool's output format, email
samleejensen@gmail.com.

-------------------------------------------------------------------------------
6 - Strains Tested
-------------------------------------------------------------------------------
The mouse strains in the dataset this script was developed for are as follows:

    - A_J
    - AKR_J
    - BALB_cJ
    - C3H_HeJ
    - C57BL_6NJ
    - CBA_J
    - DBA_2J
    - LP_J
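To make the reference file layout from section 3 concrete, the sketch below reads a small in-memory example with two of the strain columns from section 6. It is not the code in reference.py. It assumes a tab-delimited file with a header row, and it assumes a specific type such as Q6_del can be simplified by taking the text after the last underscore and upper-casing it; the function names and the two data rows are made up for illustration.

```python
import csv
import io


def simplify_type(raw_type):
    # ASSUMPTION: the simple type is the text after the last underscore,
    # upper-cased (Q6_del -> DEL). The real script may use other rules.
    return raw_type.rsplit("_", 1)[-1].upper()


def read_reference(handle):
    """Parse rows laid out as in section 3 into a list of dictionaries."""
    # ASSUMPTION: tab-delimited text with one header row.
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    strains = header[5:]  # every column after the fifth is a strain
    variants = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        chrom, start, end, length, raw_type = row[:5]
        # A 1 marks the variant as present in that strain, a 0 as absent.
        present = [name for name, flag in zip(strains, row[5:]) if flag == "1"]
        variants.append({
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "length": int(length),
            "type": simplify_type(raw_type),
            "strains": present,
        })
    return variants


# Made-up rows for illustration only; not taken from correct_svs.txt.
example = io.StringIO(
    "chr\tstart\tend\tlength\ttype\tA_J\tAKR_J\n"
    "19\t3000100\t3000600\t500\tQ6_del\t1\t0\n"
    "19\t4100000\t4102000\t2000\tH2_del\t1\t1\n"
)
variants = read_reference(example)
for variant in variants:
    print(variant["type"], variant["start"], variant["end"], variant["strains"])
```

For a single-strain reference, the same reader works unchanged: the header would carry one strain column and every row would hold a 1 in it.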