Arrowhead
#Finding Contact Domains (Arrowhead)#
##Quick Description##
Most users will only need the basic form (more detailed usage is described below):
arrowhead <HiC file> <output_file>
After a successful run of Arrowhead, output_file will contain all the contact domains found along the diagonal, in the format described under Domain List Content.
###Examples###
arrowhead local/folder/HIC006.hic local/folder/contact_domains_list
This command will run Arrowhead on HIC006 at 5 kb or 10 kb resolution (depending on the map's resolution) and save all contact domains to the contact_domains_list file.
arrowhead https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined_30.hic contact_domains_list
This command will run Arrowhead at 5 kb resolution on the high-resolution GM12878 Hi-C map and save all contact domains to the contact_domains_list file. Note: these are the settings used to generate the official GM12878 contact domain list.
The default parameters for Arrowhead are described below.
##Detailed Usage##
arrowhead [-c chromosome(s)] [-m matrix size] [-r resolution]
[-k normalization (NONE/VC/VC_SQRT/KR)] <HiC file>
<output_file> [feature_list] [control_list]
The required arguments are:
- <HiC file>: Address of the HiC file, which should end with ".hic". This is the file you would load into Juicebox. URLs or local addresses may be used. Running Arrowhead on MAPQ>30 and MAPQ>0 files generally gives comparable results.
- <output_file>: Final list of all contact domains found by Arrowhead. Can be visualized directly in Juicebox as a 2D annotation.
-- NOTE -- The next two arguments are optional, but if you want to find scores for a feature and control list, both must be provided:
- [feature_list]: Feature list of loops/domains for which block scores are to be calculated
- [control_list]: Control list of loops/domains for which block scores are to be calculated
The optional arguments are:
- -c <String(s)>: Chromosome(s) on which Arrowhead will be run. The number/letter for the chromosome can be used with or without the "chr" prefix. Multiple chromosomes can be specified using commas (e.g. 1,chr2,X,chrY).
- -m <int>: Size of the sliding window along the diagonal in which contact domains will be found. Must be an even number, as (m/2) is used as the increment for the sliding window. (Default 2000)
- -r <int>: Resolution at which Arrowhead will be run. Generally, 5 kb (5000) or 10 kb (10000) resolution is used, depending on the depth of sequencing in the HiC file(s).
- -k <NONE/VC/VC_SQRT/KR>: Normalization to use (case sensitive). Generally, KR (Knight-Ruiz) balancing should be used when available.
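To make the -c and -m rules concrete, here is a minimal Python sketch. It is not Arrowhead's own code: the helper names `normalize_chromosomes` and `window_starts` are hypothetical, and the way the last window is handled at the end of a chromosome is an assumption made only for illustration.

```python
def normalize_chromosomes(spec):
    """Split a -c style value such as "1,chr2,X,chrY" into bare names."""
    names = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        # The "chr" prefix is optional, so both spellings map to one name.
        if token.lower().startswith("chr"):
            token = token[3:]
        names.append(token)
    return names


def window_starts(n_bins, m=2000):
    """Start bins of width-m windows that advance by m/2 along the diagonal.

    Assumption: a caller clamps the final window to n_bins; the real tool
    may treat chromosome ends differently.
    """
    if m <= 0 or m % 2 != 0:
        raise ValueError("-m must be a positive even number")
    step = m // 2
    return list(range(0, max(n_bins - step, 1), step))
```

With the default m of 2000, consecutive windows overlap by 1000 bins, which is why an odd value cannot be used as the window size.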
###Defaults###
Arrowhead uses the following parameters if optional flags are not provided.
Medium resolution maps:
-c (all chromosomes)
-m 2000
-r 10000
-k KR
High resolution maps:
-c (all chromosomes)
-m 2000
-r 5000
-k KR
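When Arrowhead is driven from a script, it can help to spell out only the flags that differ from these defaults. The sketch below assembles an argument list using just the flags documented above; `build_arrowhead_command` is a hypothetical helper, and it assumes the `arrowhead` command is available on the PATH exactly as in the examples.

```python
def build_arrowhead_command(hic_file, output_file, chromosomes=None,
                            matrix_size=None, resolution=None,
                            normalization=None):
    """Return an argument list; flags left as None fall back to the defaults."""
    allowed_norms = ("NONE", "VC", "VC_SQRT", "KR")  # case sensitive
    cmd = ["arrowhead"]
    if chromosomes:
        cmd += ["-c", ",".join(chromosomes)]
    if matrix_size is not None:
        if matrix_size % 2 != 0:
            raise ValueError("-m must be an even number")
        cmd += ["-m", str(matrix_size)]
    if resolution is not None:
        cmd += ["-r", str(resolution)]
    if normalization is not None:
        if normalization not in allowed_norms:
            raise ValueError("unknown normalization: " + normalization)
        cmd += ["-k", normalization]
    # Positional arguments come last: input map, then output list.
    cmd += [hic_file, output_file]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; calling the helper with only the two file arguments reproduces the basic usage shown in the Quick Description.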
###Domain List Content###
The contact domain list created by Arrowhead will start with a header line, followed by a line for every contact domain. By default, the file should contain 12 fields per line in the following format:
chromosome1 x1 x2 chromosome2 y1 y2 color corner_score Uvar Lvar Usign Lsign
Explanations of each field are as follows:
- chromosome = the chromosome that the domain is located on
- x1,x2/y1,y2 = the interval spanned by the domain (contact domains manifest as squares on the diagonal of a Hi-C matrix and as such: x1=y1, x2=y2)
- color = the color that the feature will be rendered as if loaded in Juicebox
- corner_score = the corner score, a score indicating the likelihood that a pixel is at the corner of a contact domain. Higher values indicate a greater likelihood of being at the corner of a domain
- Uvar = the variance of the upper triangle
- Lvar = the variance of the lower triangle
- Usign = -1*(sum of the sign of the entries in the upper triangle)
- Lsign = sum of the sign of the entries in the lower triangle
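As an illustration of how this list might be consumed downstream, the following Python sketch reads the 12 fields into dictionaries. It assumes the file is whitespace-delimited with a single header line; `read_domains` is a hypothetical helper, and the sample row is made-up data, not real Arrowhead output.

```python
import io

FIELDS = ["chromosome1", "x1", "x2", "chromosome2", "y1", "y2", "color",
          "corner_score", "Uvar", "Lvar", "Usign", "Lsign"]


def read_domains(handle):
    """Parse a domain list: skip the header line, then one domain per line."""
    domains = []
    lines = iter(handle)
    next(lines, None)  # header line
    for line in lines:
        parts = line.split()
        if len(parts) < len(FIELDS):
            continue  # skip blank or truncated lines
        record = dict(zip(FIELDS, parts))
        for key in ("x1", "x2", "y1", "y2"):
            record[key] = int(float(record[key]))
        for key in ("corner_score", "Uvar", "Lvar", "Usign", "Lsign"):
            record[key] = float(record[key])
        domains.append(record)
    return domains


# Made-up example row, used only to exercise the parser.
SAMPLE = (
    "chromosome1 x1 x2 chromosome2 y1 y2 color "
    "corner_score Uvar Lvar Usign Lsign\n"
    "1 1000000 1500000 1 1000000 1500000 0,255,255 "
    "0.9 0.3 0.2 0.5 0.4\n"
)
domains = read_domains(io.StringIO(SAMPLE))
domain_sizes = [d["x2"] - d["x1"] for d in domains]
```

Because contact domains are squares on the diagonal, a simple sanity check on parsed rows is that x1 equals y1 and x2 equals y2.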
See Section IV.a.3 of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014 for more details.