From 7b0622788cbfbf571c34fff55924991b6c688893 Mon Sep 17 00:00:00 2001
From: Chris Saunders
Date: Tue, 22 Aug 2023 17:47:46 -0700
Subject: [PATCH] Add doc and data updates for v0.1.7 release

---
 CHANGELOG.md | 13 +++++
 README.md | 4 +-
 .../cnv.excluded_regions.hg19.bed.gz | Bin 0 -> 16853 bytes
 .../cnv.excluded_regions.hg19.bed.gz.tbi | Bin 0 -> 11657 bytes
 .../cnv.excluded_regions.hs37d5.bed.gz | Bin 0 -> 14818 bytes
 .../cnv.excluded_regions.hs37d5.bed.gz.tbi | Bin 0 -> 11035 bytes
 .../convert_hg19_regions_to_hs37d5.bash | 31 ++++++++++++
 .../get_cnv_exclusion_regions.bash | 21 +++++---
 data/expected_cn/expected_cn.hg19.XX.bed | 6 +++
 data/expected_cn/expected_cn.hg19.XY.bed | 6 +++
 ...ed_cn.hg38.bed => expected_cn.hg38.XX.bed} | 0
 ...ed_cn.hg38.bed => expected_cn.hg38.XY.bed} | 0
 data/expected_cn/expected_cn.hs37d5.XX.bed | 6 +++
 data/expected_cn/expected_cn.hs37d5.XY.bed | 6 +++
 docs/aux_data.md | 47 +++++++++++++++---
 docs/outputs.md | 16 +++++-
 docs/quickstart.md | 35 ++++++++-----
 17 files changed, 161 insertions(+), 30 deletions(-)
 create mode 100644 data/excluded_regions/cnv.excluded_regions.hg19.bed.gz
 create mode 100644 data/excluded_regions/cnv.excluded_regions.hg19.bed.gz.tbi
 create mode 100644 data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz
 create mode 100644 data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz.tbi
 create mode 100755 data/excluded_regions/convert_hg19_regions_to_hs37d5.bash
 create mode 100644 data/expected_cn/expected_cn.hg19.XX.bed
 create mode 100644 data/expected_cn/expected_cn.hg19.XY.bed
 rename data/expected_cn/{female_expected_cn.hg38.bed => expected_cn.hg38.XX.bed} (100%)
 rename data/expected_cn/{male_expected_cn.hg38.bed => expected_cn.hg38.XY.bed} (100%)
 create mode 100644 data/expected_cn/expected_cn.hs37d5.XX.bed
 create mode 100644 data/expected_cn/expected_cn.hs37d5.XY.bed

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d225f98..2fc2f9e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,18 @@
 # Change Log
 
+## v0.1.7 - 2023-08-22
+
+### Added
+- Add new data files to partially support hg19 and hs37d5
+
+### Changed
+- Rename expected copy number example files from male/female to XY/XX
+
+### Fixed
+* New FAQ section to explain common errors
+* Improved error message for cov-regex mismatch
+* Improved error messaging for mismatches between aligned BAM contigs and provided reference file contigs
+
 ## v0.1.6 - 2023-03-29
 ### Additions
 * Added support for minimap2-aligned BAM files
diff --git a/README.md b/README.md
index 6c76915..d98415a 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,8 @@ HiFiCNV is a copy number variant (CNV) caller optimized for HiFi reads. Key feat
 - CNV output in bedgraph and VCF formats
 - Efficient multi-threaded analysis
 
+Authors: [Chris Saunders](https://github.com/ctsa), [Matt Holt](https://github.com/holtjma)
+
 ## Early release warning
 Please note that HiFiCNV is still in early development.
 We are still tweaking the input / output file formats and making changes that can affect the behavior of the program.
@@ -26,7 +28,7 @@ We are still tweaking the input / output file formats and making changes that ca
 
 ## Need help?
 If you notice any missing features, bugs, or need assistance with analyzing the output of HiFiCNV,
-please don't hesitate to [reach out by email](mailto:mholt@pacificbiosciences.com) or open a GitHub issue.
+please open a GitHub issue.
 
 ## Support information
 HiFiCNV is a pre-release software intended for research use only and not for use in diagnostic procedures.
diff --git a/data/excluded_regions/cnv.excluded_regions.hg19.bed.gz b/data/excluded_regions/cnv.excluded_regions.hg19.bed.gz new file mode 100644 index 0000000000000000000000000000000000000000..c8e2d76d51403683cdaf2f23a16b942b7f422913 GIT binary patch literal 16853 zcmV)qK$^cFiwFb&00000{{{d;LjnM}L4}=5vn9KYoyW(2#>qE1^xPvyIHGEFL#KhkL7BB#;a6Ep zzu$sXu9nxD2GS*P6Xmp4$_B~?H&JUNbwvFVc#B5Lt+xpn0`I3=bIfbzuX0^k;9E3v zUn|!GYU^MBlW-sgNpp}H)_q=MotcuD;>uSmWy#y-@rEz6{ zi+uGplHJ?Ts6y{sx2Fy%ebw{0>sMDSl31)r|_jskX31AUe@*^iV1qzD9`La z-4>KxFSFWng_Jt3gRJp~&tihIpT%l9zRJ4Fxys6^wUCli-=A^SD}G!PoX1^lTC<2K zZPM#NV+8GW3;Mar>M6d;>RDH*SE@5=F9=rMsMzNUDa9JzMlJq~2=-^J*QEVLl)2ib zSO-Z_?IOqr*}xVxOa!|h)l^%J+6#hKHkx^aFKvRkUpik_OA*BcZNIdQlvnhn5v0CR znIrl}2xi`>jS_t$1a*HS>;6~xxF$&ZaZ9^;NHM`UlsQ&dFZL+cu9tc-@CFnU^tDlL zVJAVbJ4vmzgq_3$^*X5MV65JPzCUA~DTj}1f^r_$=AHO)m;DoJX}0ijP0;t_wlZ?~ zxF#qEwKl`YHNo1CYac5dr4Z~`*C|pac?xi>5aWyac?xi>5aXvxHp>M z^u|7W4d2KFZNJg5=`d`&gHF9sE^%)(!Q5{&D${dBp(Sn9ZDb576DMOA0*6Lg<`Pf^PCpx^)DFMt2rUA_92BsFl*pZ@UoKm6^lCjRTc|L*Vq`j8xzJ5GO>r_c50r+`U0Av)SrS@vF6~oj`ffs>snb*v&|SE! 
z=IK$Ut6V}iq3N2dSH8+>m(W{Xz1lK${4%B7I^+<#3419yXQ4WU-ojjpje4UURQwoe z0#h+fnGsq-@4q)~(r%rEaus?D?S+^|P0P0zYC<=mY0+2D8#SSib;cDhKVF4h|9@TX z`WiNN+GJxtb3Na|{$`jOoM{MvGMNM3=5+*h15;O@)5#X}bgd1Fn<~WMWhQ_L zG^`I3*ee+5HVylxg21|o?lN;pVSSh&*9{mqLPqbp%d}x2tdAw#)5iEqP1!S!K)^55 zUV?1pcLJZRj*9QG^D~&A}tS=1I-e<@_pyyZu_f~FW zXi^WbUS^9%|JS|RTH7kXz!13owN2_PnWk7r+u$y<5yAe#z}(;_oAxG*lS~@zK46;6 zR^O(#+uyCU!CS_NKI#Tj;4ZUXGHqaP%tf=_`?8HK%RYG7zy?M?0A*{iw$w#KX?A1K z`|nepHeFp(iRb7-H)&z;u2%(FM+1FGO%KbMIj z^4q`s>;L2%|(Pqvt>@ow~zJQKv*u`@xHISr6t~cWVM=LD>wxj>V~5dFK43l9O~#mgFa-$M5Ek6UCXMSF z8`$Xh>>o!kuP!i6@eUai^Ob>RV_?uZ1n%C%a$@N+>M#fGF=zh7o8TS48F2io-*34qqS6Koq*uomKs@64VNj<6$ z1TzbmVj0ybg0dM#h}s7NW>0Hsrj9@7*K`E?E~YYv?ZYO%r23Gm6h6mb$p@G$P$LFv zGJ~|~!D~e|W`CDeW6QKhH#bUP!(8FpT9q$n!%SZ`ciNY;4+iKqx`hlIX@`JKp8ngm ztE6>Vk1~+$KmWE{VBHnaOYr3+Xf`<+C^`E25-`sjke(GWYO(d6KpW2fuo*U~=t)~o zL7hrqD+vLc0mEzbD;;1skUm>MvU`i-3(nF~cTg0nEZ7 zqt&4=SDQ7I@HzNk0_e0M!vy*+wzTP2SRaFT5LhJ`=_IgaI$&ODg$!FeC7v|YY12#4 z1zKF|jMYWgNTxwVcWErDH$%-JVwff~1?*j-wr+oS%^^^2Wg0TDk_q${jFwxlbPd$6 zs1+hudX8r5M_3;w*nlxF`ua8tTLUrBcZ2l|`<}L%0HhLsm#%#Yf!2c=NP?yMFomgw zqZ9&-LETng;V6Y*={Yh(f5_-IE(5IR1as6D6uoULk^X_e7R9;?_X`;|y`%~-^&)J^ z9mAOzHjTygF|p%(v2SJX9etO<_8ix+;nsAfh|wxWTm!>aNYU4~Sp^fe0h=jX^c+WE zouUZ5wcqceW)E@!A}e8fiyLa=;~KlD3v^p|_$NqcpO%0= zZR?z#oOK`_L9QD#XBIyU)As?kD37!T2^ec_42<;(3F-kWngb;$OQ1#T0R;Wh!Zl+8 zis~cVs&w!!*%XT$y)6Q@5A1}IUV(sV+yJ(ykRfx3K=by4&6wvdZ8f(Wk&dA28oflT zFa-6~n05PTG)}lQFbV={q(#}{o9)95mNO%6MqiAzHWs(;>`O2T+pkYF%J-e-%-41Aa3z~Iof%_PDidsG3VT4U@iGlU}^H&jIhSNNP z&O&0_c!H{y0A->M8>Oz`gGzwHYQUzI=o%xN^40}%jy@NAVEAnSR*KX>35poyvQaW( z6q}WXfHj}!ZATgOQ3Ea50v5xjy`r}@tuwxh35FPxozsXh**pMPw~iT@qOD=onNrv& z5Ew@Qn{vp=rgjThOO9&Y1Y>Q+7;Z;!Gi+gf0In(_gH@XW%pJkNBSG6h`#uqaoh|`e zUx#mN0)06HHT-heq_6!PR;giqVBhE4&SJWA_#8!hv49>fi6u~Pn5OXkMerxo>=#U@@82s|^=5vVs{vkv=<3DzZ$?E^&Z1A%(w zvUNn*J`iYxS%qEMh*3?K(>1KUMg{;0)@Fh>(93VDh4VF#TXH7u~zucI+ zC9{PS7(=&!MPgmEwMb!*VCgyd7hGd6*)IVb(So*a0;`7su*GvU;!1#G+i)NZlrTK? 
z*1Q00mR#v>&R;gkyU-H)l&&M?^+BJz7Qg-R&&@{7u{TFdZ@W^qAJhC?Yj{$7vDTl*2<#8P z{o9`=#_PS4!Z*9rLil+-U5@ura=4F@ziePHxHIVIUEFMD%=^cwZ~1H3%w;wEHP`^Y zZJ^~Z>;ENRHsAq@*LpSs6?22(+T3PTER97DTyBhpVLCl~u$VPxbuCH^jOGI{FYlct z#>BcIz?OW-z@WTAcjL)RFm#P&Yk6N&TaY%xxZz#wJ67tWU#3At^?{AYI+lT?9p->x zg0dO)awE=WFt7=9Z}=7?V?7tZa2PW1C=poMX-*Q9W+S~oAI>P5kH7{^1MFOecj*Z) z%c4zVr}g9`7_^UWYXs+3khfs$=dcC6)0@SRrUJ0<;ylCh5ws1KcTIQ+D!@kj>|0miP7VM+__W1u31cj-Lp z2H7p?H)CKNH866E>jO8u1G>`!C73y1eS_*YiZ+8Ce}S39mTZD@>22BIk9yRwwb{kH zu;h1hR+hl@M4rQD4R6THN1!ptS}AMUfYESM|FUUkgU1rnC}SlCe}UPYGxlb%_bagS z6}*fDxoj|;HRdoF-|>YiJ*rd*Y;*$DUop_W1=@6~lNh#ow6Pda{PIl<<_#RCu4pG9 zzXTOf1D7o^s=U?>G|WN~1Ai3!vvjm0s8b2#xP>*=Eoj8B$-(C8Vk>~FxaC_u0$U~k zdW$*?f@zA8fi%O34#C{OCVXe-e+iaAF>X+&Q`R6oN4uiQS^}#w0kpM<(Q#ZCFl|44 zQWG#);+1o`MUfcRO^Rx4%;BuLo56wwVD%nfV|Lu40JcU9x}^jY23m&ODasNQ4n6=z zb9S`NSnQn`NLaZdh6&VyGRJ|UI?dMS_H)!=w>lqz+DC4-2n(-)slN)y;UHp^SvSDs z9x@uXYryi}?qV=mt3LYh?%4SVHe=LK5t5+jE~ZaM>Xr#c+KgHK4sYIq+Ert-XSaVV zh7CgcayE`Ex8%u3pf88(s@w72g2L`BfYn*Z=2%6- zDyom7t!9AfLy;OR!P1lNvj!2P5A`_=u8|@nfyPr?zw}5oK!6*Y0n=sOfh;i=Tl}1` zOQx(uZ;PPn3|^8QC}J$^4>Z=)!+vXmrBRAQDUm|PnWr^3XgQi+OwqR{c0k;c5oEnf zY2K@1Gg{ZzH?SfV)&xDXfSH^}33~&AvKfJw$!1`l++exgSurX{2n9@;itfVFyiS3B zxzkfP+nCbF?#Bkf?2)deqi-w-twjsHmuhU{l*^QUp(%8emW*z_gyu$fDc_t;TTm1F zSZPG>1U9!B3tYdBP(3Z+uVjv4ddzr-kA{)RVJ($Oo>hm^fvj*qd&D|DB#!92$l zYC@mV_ATu~AC0NTZLI#Pjn#kL#_CJb05$j`VlbgE0n>;hM(3DK2UrhtMno7(CsYL5 zaJ{>JwZf^3dVu$x=PhluxA0FiWYZ1k_Nw1ypepoHO)ajy_Nu+{sJ-?s%2!-_?JY5o zv-cbjBiCzghIU8-hE-Y-qu`bwG8$jNKt)v#7`CQovx!sBxLn*`Bg-E<*t1e z7Ais?&8f$&tcTXfZo?0|(#d|Oe%}%slWcieHdxE`2L#8x43Pcq_hw|x+}VbfAHmXd*h%7B=b)PVO-eCv-?{{f->B$EU>y{&mNN^9 zF*%M4z~&NOV~ug?F4S01AN69hWrO53?jx8owQ<(RZKiKVQREE+t+>@xKZ2%fB$kE| zgPrLtkS}Ky>PJ9-F;J4z{QD8GmaaB%7el^IT( zzac{#%Y{AH<<|Qz!O&gS@LOs92xLAtb=;o^hq3U(SOUh3>RX$!JgKeNee?h^%5~{3 z2D-DM9LA!Bvgc}ZRme!}gaLAj+8^f$?VAXUE3C1_vUUova0nTAPy<-dg-^HWUIKv{ zwNnXb$!wE$XD{3gY)Jys+Y9>j-Qh6j<45!2JlO?qa<^tdGP&UV&yy-RLfI=}ZE{ds1)4)Ld-ZzfHQ)yCCSBG1%{k 
z7z2|yfFrpf!@4$gn#R1M`5a#pf)NS$6GMlAYcrsgxIRpv_F?1taN9g$tkRjC42-ya z*kVC8$7Xu(aAz|65inaDNZ!YJGZOohfWZ5>9|1Nc12s9TLO+5ffOZho2ZE$)u$LD; zhs~mw-o-}3NE4HQ`J8x9eiL{zW-7XdwUbDTGC}Ee10zJWo|!agL8Y|2<3WEHVayYF zEO+`_!dgQhGD!5dn5~t^ZKf=9)_4f+Bk-0HSY`O_8C(;lXp5ATYxW`fDcziG&4>&G{POE9@yB%7BB+=>??!} z(|%3=1?1$;H;f}NZ3dB1Ht=4phcOOg z`K?tL(ovGY%+7mqGZuSU0zGP|66kGB*}9y6r5L7NFm!M@+ok=d5rz`HOFxn#6i z^ z{W5g1?MXL(S$n3T6Q+)%(r)$TsbkyDt}%aoz07G+`$^6c;HX7GEv*N*MVq5?LfU>> zxzew1t%iValT)srRO`#yN*aOI@T3mKC@jqf;(JordbD>pmD3Nc&xV%KR;@{ne|XiV zBLW=Z_k;eEIu#45E$^`Kq;eFWk%I%dzpT-o@#$u%;XH<(RMQ#Q{2+8>{pIycyW@-F z_mh7`%A9yC`YGj?B-5dp9cH=*%X*MX)@C+GmHrD-nS=`{MOM%kX~ve|$rrQ1jlgED z5DVowsfe*;Wdd~XoVFQq7)iMEOlKXQZ-Id>@P4rGBGZ00cxV$hiRp{!`R7vyCKDiY z7T?{x5sFJEY%;dVTj-eiWx6t}h=Mb15xB`3M^!UntW4*dw}7}Nvt@*RDSj*{> z=ss4L7>yWz0&?Vv7@e67Fp^vOy##Y^Fn+}#D6GyiC1;HjgAkJ&gHB{>jV-2%V)PfMG);SI6B57{M z=o{Gj!8w1w1X<5vS96{lo6)j*r73oCZ)oGvxL}}9r%sB&&Yc)+@rQR=WvI{DVxu`f zq0Jcb+^8KVvFI8{U|HD-*Vm^}ewBpj~=s#`v6#^96+ zzOya4Gah1C$6R{SmAy$_jKv8Ifi~UB`UqH9Wt&Xb6%A?wKVA$P10BklD#gGV;sTUW z+~$xNHkUA{+2_ON5HSe9MU71xpq;sMZ2>3g0W5?=#u}T^iA)qRh*G~9&Bxu{3_c9N zw2QFDChpT&ATPIQ*^I)$Awc(zoa+e2zQ!1CHM)*~ZjOO)xB!GO@MU6(A4~4bA;bDP zZQaKD5w;)%OQ7Vqoi2P%i$Lkl+_{cGZ_Cb`GiR+MzF98G==VBmvGxQVwfV` zr-11a;a&R0!CC_Y&*F%|!Hpzfmy7xwf~x0`+hN2Yx~|?9Z~v&jtaWj229Rdhqc~VI zS&f8Sa_cPsWPkc%9l_e5WVcQ667Zx1N%NrL~E1&ZJ9Uq?VIL>*62jY(iSapq*Q zcYR$)z?U<9!WL0MTbH2d8Wjs%#4tfujWGy>?_x9Reixmd3mF~bnLu$tmR^FQTJOyt z6j)oJ-eACbWz3lP5X!5x#kFoz*HoPj9dC~q^|X&Uyun-ywyt#-k`Dy!R06#VzDv;v zBLRmoZF~<-O|31!Uj;>UIRuiN3zit2)vBH@=a!LG5r=@K!rVh#a<;ELSU;ZfkiP2lF+5``hpS^3Q+$cU!m*-*p`e<&YyP|ByIY zKiM+c4@-`kSyC&ix2sOi9hXhWdHF;eo=f0Cd;h?j%5ys_;jgYFLI1E0*a`SrL>~*d zs}lNK77|lsiA`XwaopTKAEd!S@ccFRk0T9{(qd%&!tbS|Uy;h*Cn`(v79DBcuQUkc z9$aaYOh3otXeRyqf<$U18!`x4dOHLDOP1Q)RiN4GSEMm*_@7@K=ezBle3Pp9sFfqxnnn{UeOL>cm%vtz`0NhC+JS#6EW_N+*PMPN~0ya%s;%_jX3U?+N12MKb zZ7m6W-<^1@gvI<~h!gl0EwnV|;e|)zE%5}RhHayKVOeeT8hRjE+AA zLCBmJ^d?|mGf?I*LesB1qK>VuE!k`Z@Yf9)OJ+`hb31!68a5?5 
zXn&gZHbbs@0^Mm!o6*UQCs3UMQfUPhw@_eQPMz3{L3$zpJ7Le3C&t9kNv^`=<7J5< z!2)!FcU^uL*_gh;qpM$U3CB~Pe{Qv8SFZvras15@XenS|Im_i?toZ`YJR`x&<_vJL z2wx0NNdmgl4L2jD#Gb^$7xZ~@K#Z;Xq6A(7@As6y^)lEpB9*Edbr-ZOqpAv;h_e@=nBEVDVSw31!mF6-pDKxQ2W@FAmUa6AmH_}Q(~5fmLI zW8#e&RcWs-c2k}PrNgkrJ_m(H^7#liBkg>I7oi2+gprOw9u>B7?N5`_W>hRF0K7e( z9X}mG*4y^(GM;S4z$75R$+?hW!~ED_M0eTY9F~qu3f$`ZRb*h3B26W|#WvV(@E9P{ zTSA(y=WXiF*Ce0-NbVseD6foO&lO;P36yZkdQg2@0+i8Z=>*eP)p><>e**hUQ({QV zQu`FR$q2oO-av%K^*l73f1v@tLRY1iO7Xg2eiwb8?0nQ#zMc8(JIwg~&wD0rQWYpJ z2BR2s6PkFRRp9;C`9hB`%AmEuO_rGn=LXBg209FkoPfRe{$#MhA|=>+Ie{!` zJ67Rym%xW?9a*^qKCAN;MQjHgQ^o~1-#$qf{7VX8*;YWN_0Ms9 z<7|cZoPOgo<0aBc%2)5?z%}KONdE%gA`E+tFd~5E5=F2aK1~9U%kzZ8CJ-e;V9uw$ zQck&524J~!C*_nc5tQ&ZAaK;I3wZP6CSW@jsP(ypt|7p2kp5h+f_Tz;vD}A`ru13CtB#1c&pRb#H;? zzc&t#g!*n9wft0V%1fY}pJWbnL6Hw-Duotk1$}r6nsN$*+){T_=9Skt$jwqOO5Zuc z6m}}S(NrENp*)^S{h_eK^Fr|(O+~)hs63Y!FN!i!3eD$+#)~q?d6t@=%ZEd$atai3 z0fS!@Hs*zzQ$JsV)(`6Gd0ycq$mLYsbSmnh%&J^tAh#|0qO|t9%6OU+3W+4Ma+n;n z{8&lwD;rzzOcg3mV-yiks?J`ejYe+iJ@2J87RfJ&Dq$0c&`tO%0%f85G#dxgP`24Z zfLfB^_IHlY`ekpSkO2IaXT{d9S(Fo&-|9}pRSdg}ftD1S`E8HGh=%*RO$R0G_glcB z*1j_s<1qR3DXF`h!$g4Eo_iphQ0F1^@-(8=Q=T_iF1pyOe9#ri81dZ6ucsV0NpeWJ zEbjHI@VW}vm1fb-<@XB~{CvBj6HHSK-q~YKmA=RjX6rY=xYq46kT0^aDF@(m+4GJn zC*%=exK6zZgUCxHI85WIh}KhHb&HkVC{|B7nd}H>@4Or#lp=#1Zg2v(=kH{XG9@=3 z@HyVsmtYF?>*=jkPdRb0bDzcSONv0W$1Xqsc}KAINKJ}OKef7Ygr$9!vB7+Lu8B|- zU#qdIF6Gvutw|~jlR$H^(O%^LC!G31BttT2&_%0l~WsBo4O(grp;Ifa->A z9CLHmo~qVt)3lo<95NQ@0@NnVT^4AMlq}b!PJzw2}(;J5UT8rcpAWLRq@Rm65z0vV7Nl@}PEhTXH%`Kn&*r%m zCPGt+NI35=(ndPNF4c2o*`w;ePn%Xp<$L2R-5aFmY#zn)R${7X?pnXOJMi78$ zoWTe~dDFMZr8}jmo$@BohqK|G@NVw77Z9wx=@}SMqt_%ehX@?} zkDh^DKbBNG?TM z+vWI*3-ssS$ss5WaaTOz=ozfMj2$4OJgtBt;2lRjn1b(8y|rDAnE=pS1oL*v_Znn_ z&xJ;N3o6EZm)EDA@>v9WG^41Tnp%nx>-)dTD>>wG(S927M5u!!6^eM|woOIg@Re>s zX84exd=kVGsQtMF6=7gE6KF11;U+XJ>a5nZ+w2Lpq)e))z`zdSL?hsI6PZi`0)c3< zKy#rNHeq3TCUPaFf`CBM&n}Q%^r%gk1Wp9t81N+6gkL&tPGlq&_*UyvA@?>JoD-i> z=q;?QB1t1Gw9?a{%qi5yCD?Cl983Yd|DL=87^Dw!F;rfZrj043=JhQ}v*Cs3=teGg 
zvr#y&(n!t;NN^NBfNSh)>G3FVV~~+ zSlftR)5>=}OrPno4~;Q{Nq!C?*{9^=SE6DdoUFx=*EKjpSw04(Q%V}zUtU9|v1;9bcoS$MqIQq&!@?57-i5{c#A4$3A- zZ39eqDsB@9OAT0;+sW;_9B*&{Q+GtL#dZ}KC1?pMuVn+5q300T-t0t_cll8JDc3`q z7TTYh@FsAm4`8kCxs>jwd5!};d=%cnY9wSqiDv}Cl!e|xBIIPvfOhqb-aluSA1XLEob3P@(k11 z+6?5iDo0c!eSHq(oc!&im(w8+FSjw8)ub)rT6q0sD`rL4mf{`|h`&gN0bE{MjxLe# zqb96H!TJ_n!&G*nYuc}UsueM+!X%<-{NhDPFoeQPF!eRVYmA~eSc$`*HDr)foWysF zEU(dBm~P1phRKS9DQ!V)v19sW8(-j6n`VQZ98S#*nde2A24?*yw|fvWHLmWj`u<7o7{pYug3qtv>$w zpZ*Dk)MVn;>a_e7$h8({d*UiHlH;#Fd)n=C*kIXN>RY`k$7jGGNuDi&@z)sl@A~h* z`^#UTuM^C>2g)py*rC_Vtlv7Bt;Z>o5vT2gJhZOY}I%lq& zJdG%}$h$cCWxCGl@v{uRnvVTlALH$fvIs5M`CYAr&w>#)5ht8u4DnTLD7A*n^gzlo~Ay-l%TuO}?U7E`mQ~zx{_v5qMjIMKtdCeLzYO0IF&HVNp)hSDO zVp`ckm~&JSqbmHV_SW*fPygS4`ZEl6?Mcc)?5KPnf4=3!2PPd9t@pYQxZez{0!(c- zq%Zc@0|qX2L<(jv>aJ+L8H@ZA2^Wm(y0@SI{LjDt70URUT^W+CR&&h}!QNBGJsYEH zy=F*Zp2HoRF1UkkKTKuu&+=@9s51d)v_b zOJtv^pKmg;!)Pa|1!e+V$L0H)Nuc4bxy6DfUm~}Zh!O;*dH}ubM4c_ zy35%!6u^tm6UfLwnj4#<6zts-teo^W1Enq`SUHv|Qo5F>EoKuqjS2`7$6!U%MsgNl zpEDv@>&75Qwu`4ZO$1IGr4828=647naZs62NsA>g4{L_nu;RjG2^;;@#ZF$RQ5tRsOoo^C$ll#eQa z&visVJO9#7bAP(#i@^A*qh6e+j-G*hZPb*t$n=bbF;q^7D}Ytz^IX$-l~WJoS4`od zV$t^LdO%K3OZ6t$I74j|Hd>+^?KJP5f!1Y)8oOMUykwfSJ0vhpdBV`hr1RW)8@s$+ z8;}nyB9IS9V7N>TB4nvFfa=Vp6<gOO9R6VJEU>aICeSd-vPsUI*wDW2H$Y@3+j+^PW2FY*ZB>Jut=#c&`0&OuT3!RGkQo4>lOjdP(%ds;Q>tPer^PfQ{Gh%V}&yX zyiuVHcmMcA`0pC++P-#&MvP2t`Y$O zWvpz_gASV1F2J^Z_(G@;=6312u51U*E3s(fr>@>P*syUJ8HBH6J=k4^OT;D`4-yA+!3oi!OHvI3mq<`+BoHCapbG) zPw&Wa%GGf)0SBU6C<2i769_UC5a`s}HMKcDnA|hQ-=bTXsv3-zD(~8E#nc~E6y^IZ z28SRcf)az0MqHbho?u_IZ33IH5uu^{0PL$q4>@Vj0NmoD*Cfn5gp%a4<_MH)XPe+e zfkv9~u8@xaIo18rokMf2XoM+cxQQBgeiAp(^kl37d)hubC3;CU1-H|%A0c5T)jSBImRh38>~o4 z;7rVtrvp)%qMB6nrYb=g4{Rv^1Wf9IR!KR>QbK1pVrE?gA>;ffA!dYb zLZXe6A21R7kCT%(p{Y!DrX|nAYYE+j?3OPfHn!HsU)o5wgS?=p2;B^aikYIk8?|h7 z6A}Q9AsSSKKFpE7{o9{9tEjoNkc>;QG^Mq@t`aBwm{eTlEYWCLzu)x41`NH~^yDh= zEuzut^qBz;7(BYjJmt9mmWrXT1%x)SQUKO?`UlTbjypIJBTdOb(wdCG@wF%^*C6DH~kAUN~_;pJGX9GuTiwh0xJm4I{2-GoMp4Fd|)9T0Gj 
z;5A#xE|rQ1xIc6OvWrB7G1$~BoWFi*Oq(z{fv$~k7j|2ON~067uN@LP3bKF;GJObB z-+~-&Y*eaw@h6U5+e(;etcYQ=T>;>Qh8mZNKJdZUPix$g2my z8PSM9^i2REd81oQ1F z^v`j9p7Oc@Cnh37O*nfS=T2}eX2mkTE^0?EO3 z3!I0eXTW*537D=Cd<)9e9&C}Do~zqaUe&YuUO5E>;tL2&cf9;nUUUVPT+u@=d{ea( zh7YPIP_8zK$Hr5O7Xe=d5eU>75z2*sEr0~V5rOap0>d4G5`i->x<&QD|3$DFv~JK{ zEQ?)kO|ESho9k&-nJ?EH5s14Mv-5heM24>^mFrB~O8yL!2s*t0AQrd(-VC%^ z$L1Dc@_bFn*G}ijtl_v^u=AA9DzNf1Rp2I|GXlnR!OvgiI&fy!aXCKbE?+wA#hzs_ zP*P6HOu(GaMcFQ2Nv9;&Qf>oJuyQ)*h>UpiRbDo5&f}YqG@6q;KKk5NKJf7ea6}~_ zkT!D(On1_Mp7N%L?8_aiou{0dv~HE%Dt?~wu3HSB5OAJyW|VQ+LLnGxN8->t1k-BW zUgtdJ^nU}^9DG$%K8f*;q0+@8pQjv^bl~j#G!)EJK6DFWANbF$a=is^98YWKJms8N z8#qn{Q_dg>Z_zCZi|BxWtws+?MMhUJMXBi)ZO*5s|2*Z)i-WY_VNK#ByRSe9bxc{&L%T=Ibk~hHJTSy-VebYNP|!m15LoDt)L#X@61CM zY>`2OIaUsU9H0XNsW1hK3tzhl3V;G2tx`moNv+yGW{J8UQ4u0zs8 z>F$Epv~q$NasbO`vcPKx_X0**+)~=Qtz!vOL^AFTrXR*_Y&Vg$vDrD}^Nv7*qd$3( zrb1hS*7IoeWT8K-Qj+WeR@Lq8--WP|iE|3ojc!6>tP?>8)Pz2S?OWJ|?hE%gBji$k zc_kF#Ah^J}BqCB8C~OhIs@F|qmoG&=&REu&lsURDUuVSM1XdOT%Wc3z;4AX(wan$2~epyvcmSnHzz(h(jFOz3jOP+(Z?hk`8DBHxa%&i@)j$6?GTcy;&D@ z#dUfbry;O*sUno_Y|oppn9k}Yg0(mia5Jyv{&-8PP{6WIA6y(I zOm8iFSapQ0t~P<@+&v(q3vJv$-r#-rq0spmkS`YqU4%~5a04y%d>_A93b6dcD_LhW2W#y$q-Ii(o4(_OxvX zW*y`S_KAt2ew2P!&!?PRDi=w=-ACNzH7m=F!1v=(f}IM$eC+N{IX++;xMeH^LPBLg z+2D5Z@-52E`Cjhw&cYTj+A}2?1mXk~-kmskrQI(+WF{nD@dn=_Ti}@*q_iM)gPVxV zXG@8L#{8Jo$X79>UK-MI&Fd~CS~^bBpmlebMz{zyy|e?sNx8SFo;U+!ZK;acoyag< z(ueRAfs)Y2487AHrS(p|WX$KP%(D`1noSWVW9E7>7rI(*&UWaToaiPwQ+gh?_0-<9 zse$jpV@&$re>sVqNMHYzWX%2`{0KFbA5p}DA7A9-T*rYj^UeVzd*uU5uv2xt2@ep!@T|X^oT6!-Rhk`2O2U5KG#kft^70 zCR)ldOUUmm@GT;p6oKNtzv*&W~Nbw!8L9<7rjj zggg}e0N>c>g!JQ-D}sc5D(49Oi zEh1Eot_nCm#!YB=TM4)w+e4^xKSOh|nm0iket_k+qkf$7LgWg9ON3WgQsf2P#`GbW zS}IUoID}1T_+tQkB8(rWe5k6)y%JtwW<}({Y7{klD<}CDp!KH}<;N*kd&n43gIiGE zHh~EF5kZ@dZ6`;Jo`C>JeS_i0-Ca&FS5gZWKSokc&;bEbGe-nMY6=jT))P1hq>e+= zjB1i{wMmx0QG1|VQDO?}Sx^s@7rsSl$r%qtV5PhX?RqLSKTdhygz1u$|2X9u&?>|mCN$zGPU1Kl=VdGSz)p|K5unhy?m`y~Od_=%< z87@TNREz3i;shTN@Ye(^A9-2?JSg=Hriwqeqkf!nCezsh1^vgXd=s*B&--!8)o~g# 
zQqbhu&b9`aE=KT=Q?9YPVOI*iCgqwIS=~lsRI8jv0nF@twtu|JsfS(!Zi>MvVLwaY zbLIWxEx={=^N&vnKG68)J#DWLY-OTVY$6le)2#X9lxIA-iFN1aV_G56F9gth0>a`f z4UXCuo&%mh`IJAptv8-qj+?*{Z2-Ew>u)OGnXhPX&(*vLJ&)t^Uc-8zTxXv!E5qQJ2=I9EQ_(NX`(A1*0SX}|F@yROA9Y6%q z&x^2{+j0DH%J(hSbVquA9EbaLJnx4eH7GA6Jdxjgd2Yr3*ySX_26X3F_2ZPw_3*OB z6*PO5H@S&tzen}slsBCUATgz{YL(aR)8Ab{;~%G7zN);9pJr5^_G|7HwkL3qb#CBN zQn0SVV7IieQWlwy)te~+co2cQ&`n7C$V^<4!~hSvE$^(E7j}w4UFa>u5o;*ffIN*# zu6LnzkzOU7c;9>qN!Noi4H@3gxi~_mc^c^xl|2NDRP+d4iaULLW=ylgnvc z+pCW#a7tK_^n0%JGhAN({Mt+CbzeyTBc|Eu@oQNdNp=>#U>eeI0Gr zy5?C&m-{;Ur~C!Ajy~(@Zvkyn>*%wd{uWHanA>D&15LeI{eNQa(HXdf|MYu%gom=b zlLgiW6c?ra=Ppi7zwV^;78B(XE2nd33r-O@=b+e;rxCI&F@2rBFD!0wS$v=1oSutBmTVb_D=u-hz3rkpdPWa1hZY+%}JxBU^YJvgzv|ZSzv|7vX=}D Q@@NM)NHZ{l&v5_|0CaaUq5uE@ literal 0 HcmV?d00001 diff --git a/data/excluded_regions/cnv.excluded_regions.hg19.bed.gz.tbi b/data/excluded_regions/cnv.excluded_regions.hg19.bed.gz.tbi new file mode 100644 index 0000000000000000000000000000000000000000..7d4c318494229cf852ba435f216c61ac91f626d0 GIT binary patch literal 11657 zcma)i2|Sc*`+uwUv?vuBSt_!elaOsDNs^ExyE#Q9JION4oKx9D_F~MCq^u#!SZ0dE zSZ0*nV8Spm#+Vt5S&aX*oRjxG-rwi{e8%JY-0O9JujRh4>wW++TQ{xy_N)l4JFmB4 zo$^L1q1IHyOr)7^y=N03_IIm(*IO>9@tKDeuRM<0>miePCvrfc%_wa1&EIdReT?q3 za&^qvToe`-78M2i>!X#8l?|a&1A^Bmce}AaHd>N+Y)F_~6`<^g&&(Zf9oUg#?!e6# z0VDb>Ls+83W*rnzRM*i{!~3DKqsM(<>vNg7yDI9lcklTrVINBFXuUMDSHv$#p}h+? 
zl=U=&-SfiDbLwMDY^m6O^&7)aTdmyQIwuX%swBqKCXDfL=P0#RXo*uPb4e~ zkZN{32d%T~?gBeC#`kcHXb4ewxvk`n57BJpkQ!3iH1_1Vbi zmkP!y2IfV^#wL{om-54c%LkMk6~$Dj@(I(6q>$Rrw{P6AiJ^WzirU%aVVRSUbh=-xbrnam$B?2e9gAs`}!mG~@5t+tTP+BvY2*vQ5#nrpvuy$h1)=vD?3|+cMqO z1GCa_3`;#?X26xH;=Bkp^DDr=PCHW-+~7NGX+-h2b+KYN1`{jeKSeP5J}MC`8@Y?Y z1r!^Muq$}P)w5!OT0hx~k*-7b&4`G7N3LA`jlbV8l+Ld`63(Sutge905v_EfWo z($A%$(tFuzctnJK{i&YJIl2@a-AWg>H6MK4rRYW!*Z!f7wgndg`ZA zqwbrBw#*UbWNrS+%-X$cZ}H&Rtlisc^{5!h^AAzm6nCRG7wxXx%HO8=*Vci38Wq)% z8GgG>9P03X6O!WM7%fwfGH+=vgcbzDutJOT`Na#LqGb#dzx}X-Ltm4?TRsqV70I4R=W7EQuNdaWv0L%>lyU;ATjzV{A4B|~Zx3Nw zWZuuYzBOcwaKF%p=QjWBhXIgbvLw+P`Ri=I0{A_GDFkuIZ#uhSYHwt#?& z3l2*5vg^c=*!nBIhJl=Zp@6h-r|Tj;ISpWwO_P$-r&UbG$Y zglha$V`9lg@Q7RROA-nM1IPH0`EYz~i=Ro`xDW9x{ziB0D0d3aks7*_s!dK_X}dV@ z0>-&8t+7l9JF|(r;=I5XVp+!+{IiBa>?vigYQtK>!mp=x>$qLl6%qQ9A?Z18F!bla z=uJE2L@sC*xsM-wkn&nKcX04hzT~Svj`a)e-SPe-=u7Hl{sXH=Cm-|+xs}8HyvyG# znH6&BJMAQJ5HjMKR?`_{FL!|Jr9}8yOgyS6=N~S<6TaPujH@LHGABM0U|&D(0lzSf z5A*7(pKqnl>DFWvD?BRDvjdb08z#b!fqh?6Oqn z-$nn`;>WBXL#`x>N0kWVme^s z2Vw<#$|BTjSE1HZ;o{m)>Jg;*+RhI;oseEr{rk+2e$D-C6P&%nPK-n}<yj0>lCX z%Av_Xa8%ti$5V?8!v(X?FUd2fZD|yV0&c-1Mlf{65O~>uN}e8Q0cAc8mRku>W#&`F zi^-uj6pxpgL*O3UuT@Pg1+y<9dD0~{I3{%QwO>)Fy#uFWvi_K7;>wqU0`@gN-Zx}a z@i&cO&Ft5br~3wKmvQ6`3xV-zNh?wv+glHJxE`qxK9%WhcA@im)B|AhxuHvsE-0S& zJ~ny2J?sEg=F+J<`)^+$L_WWfxaV|6mj0`)(p!Z-m4(GzM66^#OAC^=X}ET|i8a&kgU$j4Npdjkj_w_C~aY$IL3$Qnu9fRJ_imUNey`Usg_4U(&n&V)UT( zrc^+vtESr1&cUm14KoTNzkHE8_^H$8g5w$0N03d%`^>6UFKzblq}KbwL>aq$L z%=*gYo@wyGpHgs7sgl9o2m+`HRyFk6)ROeF^;p;#{8#BuRDGKz zSca8NF*^QB26a2)Q~jBCt)ritTSiI>GRx4rpQoQy-vNTCB0tne%u(J>l1l-Pa+1=? 
zbu)b$1wn{5==jqNW*PAe1D-~Me)U{D{!3U4b##J^?Sy+w%x^1_KEv-3n#T+o&0ebV zhmNiVmX!dkQv$F=ggUeTeE;N3q)^yKy?}G?ALKi3Y|#|i9d_Z(rU$q7rey8u-@Z>g z{n4#G=O!H*^!|9}%XjiUO@xtO=Owdfe-u3V*AfMb$2~f+R`I`K$#szk|aFrWO1b88|H@0!uv!le3Z|6cQOvdGf_~q4HLqpZY{CO6I<(7NLq`ZV zY2?h+Z1X2o49n!0yBvafRAUK(cl$Hn&#$^ZcBogncL4{o^b%#*UPu4K?UUL$+xxu| z-mV2~^m;1cl+kU0bC}C9nM+W7cu3*ku7i}NYE8=~$wMUQI^ExQ9WNjGZKF^rVdt&X zZO(g32WGdt>2H^khj=(E1#bga+KEJc#1)T-Ur}ya&sSq83b(~a&9hnUa zUg;=Iw)!zmh;-jC75vGR@ud1p?A-AgfT?GA-c6{MDLSHd)GR7AsRQvyFQu=PgSB7& z;t-Kzj*XpT&5uijH-nNk_N1Z25fL#eSi*@y{Biu{IvGQ#eC_1-WjPhZC7(6@wdB>h zoa*qZBidb#pWw)mK~J--*hSgYR>9l^5&k{#y5e(bd2tfOg;XK@tIk;Xc& zAlS&21Kj*xH3^m?^P&{({kbUeS+cO&gv5PQQ!2FuT*C!^9IIn$gTmS9!SRaxiUswF zdxO%kz$2=+Lra{a%jpm-J;GaQOoM-LBsMT^i>D}A8ZjPo6JZKeA9u;`y`L{fp1DWA zyL2){0`0}5bc}`Bni-uHI}({U+5!`&i<6n)3p%LT+TZY?KKxQteuc*enidkdfDvD+SNnwyX{7j( z7t1F}X24>0#t|0sQ02<6N`bmX^de7p^gLNh$xs`q!-khE%*31N^K}-wN}KC+YaUuQ}e18&-$EKApDFk^g%7!zu)2hrd`~ z_~1}x=$wZe>qZtPdqn%U^rX!MAHrY$5MOr*N%Rk#)-DQalUKN5Nq6MUqbffSM%;eL zfnfYTGg-WgmLHg`r7)dIy5!2;4%Q_WyG}Jp0ez9x)0U1s$f}I5)f*2)L%|UYW{6iy zfTO}Jo1+;b?&3ou2TjB;(G`%jpPJ8qkC(+2Qg5d99&>&|-t&uj?mIC7ldvz+JZSrn zQ))|0nYHJzdnNX(ZMzoZnRsoFwrok8<>{)^p1SJA`WHT|pBVM5lgG3kAhg>u>JYX+o?j8Excjp+?gkQ%&mU` zwefxMcLsmF#P40iXniG`hSc}X)6~N0*-=XgqFPWjlul}32CwaVbMb#ec;lDx&VzE35PO!B7XMoqE-XFhFWrDY1uB8Q3<8XUqK zeu#g52{3cCWTCSXe5k0lufiMo)n9mgl=g?&3t{K*M%F_66mU)xxpPtXkT6U4TVbkL z&F?k=C2o+btluPhL!mE(7vo}e&e-Q{uV)1gYd)+Ar_a?=bEHB!ZSz{;G*i4^7k|m` z5q+-y#bwHG*ZM?T9<@Ylpzs9aMN?91?KuvAQF|HnQg*C4`SpDm8iDV_ zm{X=GWQw*CWtKu^4LjWN!Bb+jvv|>}E1nvDu}wN)3{cq!UsBnZYfYK(uRUd(YBNM; z`k#Fn6bb~;%!`|ZU%PW$ePF5Z8hz?{lqj-kyRRzWU;jpjv!m+%{SOv+mlLR2 z4ch>`2jO9`8)30=Z-uoD7H652y*QHU9W_OYq64dqtBrKiv454e5-5d zj?yaQn~G0=BW8swMCxB^IBd0>Ri;v2JJIms%psaf)aw2A7i~~;&*VRx34-Sg=W-Tr;m)n}?VFWbPyj%>`vmck^d+QP%!^!c{wyGT(7Ae^Q{*1$vh zAR1m4MGjTU7JoC~o5WpGx0ghX&RU_6?gjak2O!0%?vm~?kiLo2I6Rn@tWDk(p$ECl zm%B@@Q4!A1uRn%oZ^@r@%dRg(ipMWIj>HbTxd|5J>lQ3H5by(b^Kavqk2Uj4yA;Rl 
zVz_GBklX?8DARjosduqoGwz9udTtVBNs6~;(#z{_F8;woK=31VWMG5q6J>1Q;Zn5l zqL@PiIJYKspWRC#q?pn2P+3}RP|oORy(~VcZS+DnK1jJyJ5L*b48P2TGur*2mOUY{ z7lvolxrb+LZ2vjNmi5#bBo1B2m5h%G)wY9AuV|f1l<6BZDw`0OT_wDq3}F>#k9ux* zY3yO??QG)8F_!lX_r41jXSa0&;NIyP-!WSxL)RdN>`t&~6Pg zN)_df^@=>F1^Uu7jptmLv5lB1AGDZ)1ZiJmgrE!6{a%pfj?5*r)#!OsIzv#ll)^8IMp5}mzP{Mp5MR9Ev#}Hs~i%)3-YT1wDL5f;zuQh?FOCnG{=y! zmB6-`kXACUrOQ6lmEg&U5j3DWORO5>=Okk6$`ZWrzh~t0c38o7%CBW6gX^io#zxm~ z3Id@FeYb51CtpN9Iv9Pn^KzP$O!d|c(sL@Cl+JkHRNm+DVUH+uL)z^0!w|s4Ef9K;leuyOJE5f+c~K_~U3Zvm1_UMIF*PD=pWeod(fU&fm^K9RbJ&dS z!h|^NW}a&}G?4ER=a4+V_-9a&Idz(uukpL2|SJUo~UPwIaXOVpFV>9{v&fJ<4Zz9R>VGwi)v^ zvX?bnd_n8tm>~ZH2*UI}+tKacO$O#2^S!F9S%+&oyF4ujRZjF#=AX!PNq@6ra|-Aj zj4UF^rf8s&?i(gL5d*&;cV}E9)D08sr_OvEq%0S@B~khP!djwr7bFkrl`iNW68ch3 zDS5J$Ab0YltIYPy-~;lpJG^&!zB7Duq2GH~!Ka$dQ5Ui;@{`Nd-^8`EWK*y49=Rl9 zlY(4u?ds;p{9q5aEeHN>am{wc%8hpg|n zKSo*$>>cX`ZsFSbn+5__MormjLmqykyoIht8~#{cr}H?E9RSsi)?!x&+K2dxf=MyFNua~z zf*vm&&n)51|2|8mH6feG<4c+=Z#>ke^?g`!P;46elZ@wtWs~18ukvq?P&vG0xkl+~ zPCN)BluQo-aH+`U?~#^?pmISISPz7M{|#)N`9WFXwXlc~_!OPXdb)eoeM#FRa;j+8 z5;$<3R?|q9`fstUtM1pfjKjQ+p`sqRyy^USQTxwL(poVW_mxI|d{^}*q`e(I5IUMW z@;ngGXYlu}mO#5;8ft#U{V!44^R4ebXXTSy{ZpBLzi&Wf-vlB4fiG<9@=Js&cHkx;rwtTP=y z69gdNLFO&}ZWp%@WUs=oECn{RQT5nlWbKp=(UWAfm&`Gr$8sk)esB;{YXuij0H6Us zO?##MFNN3hdH5ey*u7RKuzMlzO6L=aoFkqeha)3K8`o*cU`u^RUtlO3hlJC{aDyKK zwZ5r!+XBWs39HhEKLPkL9*NGhd$C(%8LqOuF6G* z9Xj^;wCHH8rf%W1)s0F70l<&vmZxG}eB{%(Q>C&%8Ld{9+Y}csZnT^=YL3Xb=N9YrV&;=3!gK(<&zHVs zfb&Hm^cJE<wASP^M-BQWTx-nJ*BJ5@Ohjb{W#4_X>+B)dKq)6LiLJqC zx_E_cFwS?DU;BKz%`d&rxTT}j9!U?$8}d)X`PnlqF?W$n>xJ4986*>)J4i_tk#{j^ z8;+FI{Hs(uDZz4jY9*trtos%>kY(h)bYe9Ucf64riB>|(5k9I~! 
z+{06Y3**R+!4-z05?6W=$Eh7Cg728E$Fz$vhbm6*6H?N-=hlZaN1C+QAJ#&=G1u^~ zY2wraPg%;6ay0yO*%IA$s;_4r%!f>J(#5aMa&Ykv#d4lNJO@wdSnleX54hOVcaTc) z2jT%M)-l*>(Y=HbEF5wy&!LgvJ6S)B#4IgC=Pq?J&v`%tu7~Pa?4kaw~z@vv(DNq4L#a{u2n%sI1@= z2~<$RZ2R(i2P7wW*2mgr52Iolb`6@lTwj7D5)S(!8!2jcVfZ}h&qzO%MA0+1VVd7v z89s|sBVEV}T83~FvS`5at|V%mL;lLf=g90!f#e5_w)6I z6r*q62PG5weT#I;z2RUu`z$YFJ3^S-XnM1&$H^`5kqqiZ-F5H4zm{*zEme?m=BNs4 z^D@E*d56>-Om~MZ)Te+!=e6k(;5+wA8k=t62}d3Qz-q) z2>7JR&!#+s5CcJ+46Z2jt+?)~dcbnXbYSvLu8NNGq-yT8jjBicEo)1Z$wzdCwA&6- zk1On&(r(%zZaw~Woeus0``KVJG7_yd6HR9?hu+la?VwDsIca$zJj3aTi%A<3$WU$< z=?@YL2JrN_9OR%(D-h?3M?p`E+l!G!L;17A1?C8R5z4z*1JAsdoKLwO0^b;_n^G3C_%^5HD zxxTn5qW3BCgH-7rJ=v$-4?bMmw>QQ}&f;>9?0cyzMIYYX`ePr7XL>$&htw3!7|lbI zR}d_c2Hy1y8klK$#Mq56I61Kl6vKUR-jEY(6k zMvhgY$DW-%(l@N&86V*&Y{)?R20o_UYAYK180aeQvLISJ&LcEgj0Z}_@n0Bk{0c(H zb_7HljCY-7dipTWA|jbWF}BV90E4k^2=A!UudiQz@xkvY{<)X3n}^S@CHh}&zlqov z^aIjObV0V^sxlL~7z8)JbA`iz(6e}e5G(DcA$0H8gt2T=#Yn<}z4>L}Jxq7Wcmk=y z;q-MlB6NhrO=xZ!z>ypq1G0!Q02p_U=Zb-ngJ=A2iotK_bc>kQ5K8nD5lbvk;n zDHH2vPrDt!2o|Q6sst5Lw28{6f~X}3KhxemRBzC1M+gOe7p$`b1{Yj0!k8)^mhTeQ z(g>YzCs*Y#vvupfh`FZ&B(HAT*Ku-#VY%5R=sGd|FDEW-(ocK*3@kb30N!s=zAgQ3 z`|N8a+9JAHLkn+?xIrPj^Kbi~M@joe$0#5RU!uk{$p&2DD17$NYEV`kCe8wFTFQ$T z4ybePcL2g8H0zPA^GyvzK`!_Mkby);&L??j6@Y%F=W0g{Jo9zTr`n?G4t%W-2NknT zmK3$21TGuwumgF4Q#Mwzix@zb7Z&B;8&kf`aCW9e;P-QX=kXi&vxY^bFcQF6o`n?X z@00A$gRUsUg|tPGtNWk)duY*4co1+LX|HF5CH5ff`}C>#r}pPxjnn2LbOv z>75GWO5iO4$_SPYk;V)GRe@j6GkZYxcTbuoIR*aZ)J3of6}|zo$sdU+oR{*cgQJxv zi*hSU>{WG2rmHpck|#;r#|I`wn3WB1Tsp!X{61T;rUo1^sEo&0_m@1a5?L&$uRJxK zn$CaLMCv(9`}FL3e8j{QKD2E#b1|7*Q=S+i9#jnRlVD*N&Yc}dV2r*hc@!^&Ux=M^ zwQbdMHXq)Pm@#EL%`0B-MVLcJ;PD4>u9}N&kcm(eeFWZ73LdXbOCckoh;Fiz)F^(u z51P$+?Ja{i(Pwm%%Li+L3dfnooUh8LGDD_{A4Gd8V>ds$(S56dz}T=t-f$U|H2jD4 zI+<(eud-5{ALi7?Ck(upLc%;{^uaQq4zRGLbAJ8&V?mPI2N65GBFLgiD!<%X#>nkeY_0Ygf*>7s( zj^w2-zzw@QgqSNxvs8{+COUyNx}O@wTD!H6x7;O08zEa+0Zf}7%@N)wy)r+80$+4G zQyxbk0GcS-iKhBG0HuovvY#g2i)4)t6Am{*U~2b6Bl$T~TH8Pu5&ry=3Bd*R$abUw 
zV!^X|Ur=wvXiQ5vj)Hj{`t)^L@Y3^YH5-PVsRK-rQ|=|xyCBaf^jlGDHbPLmh7KTA zK^ScX_H>Y8Rbw4s66ms$l<$8fZfNmw#n*fBaJMdhzIoN`SI;zwGG*=Wx44yopUDn% zhhn(@WbK^gS;)w=nxSvj5XRABr8d4l^}C=TlRirv2wk$>rB$e-S!dC8NdsQ3z~9L{ wtd;nXCc*~%uN!I^!y8pJrh{q6Pt09vB}zyJUM literal 0 HcmV?d00001 diff --git a/data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz b/data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz new file mode 100644 index 0000000000000000000000000000000000000000..28d416b74a27292355986f4d77163f870c31562c GIT binary patch literal 14818 zcmV<8IUU9yiwFb&00000{{{d;LjnNBIfb3e&MZ5QoyX~iv30);8y!OtsYk=mNVebMz5LdgTz3z><&D=1gUSQrqjF z{`tTCr+@m_fB*Au+%4qM@(c|$qtJd%C?!}8AG_YZT4u-Gt`UF zzHGPx7>3Zp*jvXsx=^!Qmg}C9_O{U&;R`=QS}4t&J)dSN3)RrSTwlZ3s97;R z-)7n6n_sV=&-Fc*J{@n;85Zg(`u8(wQqANMto=CNq%AL$Mv9(KBd_-##meIO(!)m9 zX#Mj|n%DEK9=}{aL*CcV6l?W-ifw(LS;Cj{nRJ(L)?r`T@paHiK)v5>Zb{p|ezy4K z`q`4MFCmuUzCY4g7RvT1JfE!zEmXgJAEfL1lrU&Xjvuow-r9C_^@lWVWW$VWUr1N` zFR|3(zCY5M7aFPi_cQ5O@0WtrhT~0I*ZbLs72Nkj8rwp7bXWgKM_Q=%?CKwBS@m!I zH2CtFG_LaH8k*-#+BZM?^zUzv59|F)VD!6rleW1~s=j?AUG2M+l70I&X$c#(bdQ!x z+Sd0eBPL%yljdDM8|1%~Z`yxCHIC}bXVSLHS94Fkd?w8s)jE9nOgdNj>_hnR1L+z+ zN}W0S@|iTP^0gSh^dFP%{?l^!(tk|4`%f$LOaC$H?mum`;>&N+y2{^e%JJ>5(*YMs z`Ah#X=~(6Og=x?6G>e6r4ON~W%gA{AvH_Z-4!-f8$?S_d$I~LPKajz{+A;O9SN?g&xK} zXc~0tVk4f=o2kXxNCmZp-i)a}fr*rEDD*Hk>sPBJ5Ut73n@Nf0FVuzJjPpe zy_segMcu7+p*LfXHOWg|O6XzgFjdK#E@%inOpDf}_&j#oDD-BmQWG7I1}gM0!_*AZ zf2;%g4OAvImCFHl??@ai^n*??|;#ypB^|GJwJnE0fC+{aB%8@23h zqkSxaeM_4RqBVgEXaWy0V=WWH3M}v@_HTRj1%7tJ@DbZYcU)+{QAhwbhxd9FaY?&? 
z_4N`QZQr=g1!neFo783lfITOBI?quBtoKDvo7yDWH1_FlZUcQ>w0-pH=}dtErm_`J zw;1LEIr?W)^Z{T#^6za@E3koP^I3B=4O=%KDM$a_CQW4ly0fQS)6IwVQUBgkDwqwZ zK@IVuw^ZOVH6Cdn0`ux%VM|RD5a{C@JydJeKHWJFi#9!JF3`fMMlWecc{Wg58=gk0 zXHS858?_-Us_|OCFjMEct`)NZ?L`fAf#*7#6lXNGsr0(Qo3I{h|JJ}*;30+;nvsgJ zxxR-iBg`Z;1=X&Yae+-Rf|`jfFa+L&HElDQ4cJIO%Dcx|m!I|l^8z+HU4ufh=4Q4S zlX^ror}mYosbP!{@(cJa#y z82t!nA~WL|1WX&y-z{!7iJ`Recbr3mH36&W#?UB`Pm2QDw5Ss22EBZtYfKx=e$k_K z21)@N2JV~Npc+D7zy_A;X&aH7z%aeg(}RTz1JgzjfAbk5og2XT6)zQ(u`V=tD$UPfr*VX7-U#>2w`3B6SEcYidseO$_&zN1Mn$)ZWfK8U={v?vx zU?nip7dmT5G*GXJM0AT9>uK0=xH@P3py}qlUG~$W?UPdH-JWrpU!twXrvS!*uNJFG zT}`uS?!A$YBw$*Rt2Lx~(Z;&(jg*PR8jWd-(a|PdH#gPJe{Y)!qTXAIfPZ$!umB9~ z9!I0mF|3)^Hm9o=yCSJtrYS(peK8x~t=a(1#4_a~9G+OTaPC-*@mWsWl-6~fVGBDXG3do7-}aaTpFE#9>z zsVTPtHY@n6TZkK&Q8yhu64M7@@?{-8ZA1c?mD#JKC%U*mweIHWiWv!LHYR)82A_4c zO$4rJ#m2X%AosLZ-L zrf^R+NXcp~$K&$t_wWL#5qg=SggmmiJ)uwO% z-X>juLCAejo7(Jy8)&Q1U9^2AQ*!|^f2s4P1z~~O?%`h2srNPwqx!)EsYZupy1ySh zkWM{2G3I)@*~kd67JttMOPcj!HM{v6shL~4IgjzQ4d<}{Y=-ttv9U}d+GdL{?>#W? zUu*!_E4n)GY-w<{jd`YIIJy>A-VAh8)ZGWQQJf`412zv;_v|gT_M8Q_68QdI?co97 zmVe@htLRfduU@~KKr@B-dTK-)rBD&w^HztJlo8E9%dQ!jblQjrup!`9hmg)pbw<-A zuQf5zm4Qc~U0Wt`ONXA#B;x62Q*ReA_14p7eApNZ$jx;=N$GrMbhH@nk0EI)3)Faz zhe%Ds1(=EF=3Yq|@N7i-GBrzDHkfXbku*<%>eOGPHb3g-EDgB#PGU*;ifM?RNZ(e6D?t3F`y1Hd&Qy?w7TC8)r@i4Lx%a{Zx zPUkR#jy;wIX7_b&)&c{bjfSVuly?gv{*r;27%v;Sg9hpz56@cNr~hFJldxLu`I)?Vp? 
zb(R!dwQOm%9w_i0boTFBHu$-g7C&MG^SJQ&4H$mPMB31p*9yR0laewYmu`bcxIdGm zXm18=7>5NXdN%HW1ih2F&PGBuTUA9BjaHbhh!o3s_rlFV=IXKN+a^?w4qr zSF&Er6tOQiH)$6=*hX^nU0}zGfHg%|sV41n z(YQZ?jZDuWx$Bvn%!)*{_NinM1v+w`Y<9RaCn)Os;geb<=m z&SotDdS&*$Y182=y<}ioyH14EbQ^$8zy7@kYc>G8i<|l+&ARzC_1AwA>-FkHHXz37 z+WBPTqYB)6$Z<>6kXd5%?cAg@2qddt_q~y-6PZ!(tusm0{by!#%F)HN9bIj0&RM0B z&PDg?v@l6AL)Zce_i~z~G3eQBBKe}3w6CjYy%)rmwDQI5t%_5}k;-rcG&jE>?W@+z z#Bqv}Nwr`MFw6SO)w5#?SpTxw-G7okR#Rl9{9Z3xQf!O=|Dk0cPX5VNX(2yoz7Zt*qe8GtHPaz$$f!?Dle{0w%DX_1@YsE(YVgL(&l=qr(wpL@hX%m`4Kd8AWUK8|LflgmmrITNk zPA?nZG8?1%vt?%S=CuBtx<-0+8>{U#UY#weXGb8ttheM8Y$O2WYtmjc3eW=m{ECY% ztPy2-Wt^iGv?-Kq53vq4%K4c@vtHQ&BV1k=q6dbq02_dwW`S>kcJ{kR+I96bb1yr3 zZCwKFANreHn=QO|Gqbb0_r|QZR%sw=tLvssnipL{bkD|suZwQi%A@GS!leKUKb~gm zSzxB@pPj3Xb^@*UFLYzy%K)1#{^l*>>6EpB)qK&Y48C_W)pWWqX41Z%Z8l$Vjj3t6 z1z4S*ULI*n>jLxI@O-2NU?X0tu2PXUiq^Ha<>-jLCQvUA!L+6N-YAmOmE5E`=;pZZ z`DgRNf&jhS(>C++y*typzj>o<3#939-a6}8270Pr-W!jZ4O(^I+bm$SNf#*DeKDL2 zN*6Fw+1HwldqFS8MxyHm-8B9{$uS9o>qMrcnbHoh@hG_WMyk=lS{i21O;@}&UpCw> z@Y75&zRy_MaRH2)?$(5K&=w8M6z-x+Y6iGKPj{_KnsxJ(o2xcSZO{OeuQWQKKppvh zaU1n)PzVBfy5?e)YC|;M5y36Frbk);<@U)CUF5-Opt20$Xg03#!H|3MU2BrcGlTWA z=$~yvvKc_xbkC5>U8$0qb_y_~`hIVuW}pM&@RJGBu|Vp5xS}DQ0@?oIU28+uzn)!# zoBkxVdJ#aoaCF0oQ@~VBUo?|46|qwvf~(YM%{!}F`sn2gjcyFEYW;HcXt+TD%-Xnn zk)#8fiMqP)ZF3tQy?|b$yE=u*O0B-YPocJGZ;p9D9t>TaG>W+MuK&hP2Q5<9@=Q1@a;6D2pr zEp~r3@d`lzr!`MwDF94|aN3}xQ$<@2wT^CGy@XXUr#wiiam`fa=H^(WxTF{`#V)(X zgb#b=*)}Z2F?-?O-h4{&G#{rP7NM-Qai;3m-{&usL74zrrKI#+2Bi4 zM4D7`Gp~KEnY8JH8kkO>GiggG=mRHe-;j~!McKvn2t zB;_xLQ{Ix;_Drff<H3Jd(yE%HZfAPKl#o%~7!Em%PosAqfLG#&y1*WkEmS_Poj zb68(=N5c>p=g7NgT(}m<=P0mgQwtXu{Y%jX>+OIUT+!s{%GH22ucCF_I_dlZ`KAYZ z(3nJWAIgsduvMx}FV!AcKb zLR>buJV<#VGgcQa|-0HjvV_#VIbN z(ypru`kOr6t0NdVQJ$nzjIn{E%cL^+1 zFAy&Ub4!O-Y4AU9ccq#%D>&Ie&G~LEl2%VZ_Dec@qj>Q+vq}X`H zMw3|IeNxM__Ra!kn%`~IrZW~g1xj^oRMMj58*OQMTDxSKU8R{_p^$dneD-T|p>3&_ zNKN&;0`{OMo554S{+X*z?KM#}Ga#>UP0=|7-P}Mct~yORX=Usx*?i#X$PNJ@Mc0go 
zKC${1=)qUZ#O@jbXvLn!cM*Uk;)}-aL|&j)|7^UX7btzWIydRisLhYQU^Lh9kxrWMv$J#3k5r(zwk9b}k?o*X?&g5BW04pb)r||1VnVSKckMNa zjuEFEfy-%a(=D#6S;+CUnS?4D;d*zaBDG;jV5aVi#tMpVZhh3hcO*(npqee_ZZ6lX zDA23>phVqkf-@c0CNo!Zq;1iib$Lg3Rz3j)UU}O3Ahj0z2yS|0Z=3810+&DhrtvNW zn9=Ls+oby5Hm3IM>1kGP73g2Sx9P)Lma3-s_3CJ(%zp;r{zPz2Wnk;bg6?Yldj!=$;D1&no- zni!{P4msENu4w!GX8A|prwZRbVsvQ%XAQRx$8z}3yALhrSW6&9fxsq6RKUi%0rb`S zgV?`W@DcbZmu61m`fY_;e{_in@snjI0!J}{H-U3bg}`6`>|lLK*#$FTde}Mc^etUi z_j^`fbS9z%U|;hK-SJ8QD8GcGry1A)GaRd@34g#?>C9ro(NmG=0$0im5~JV)T!|?) z3%0svBW0`ySPgsHG(gh<0V&4&*^-(<0GK7-wT866LH}}dh2AAxfG=l_zNB3-vT@#y z_+1tnxXFoEY z#@w)V@#-q);;U$Dsk8^cOn0SAiou8z&^72Sdg4h-AZ+UWF5B{;(X(f{!a5dBVE6); z7uu%#rswpuX7?yCQu#tp_7()1YiyG0#Y`y)#TCsaB0QZE!F-pOis-~X*8*1*K-W%m z9Yvd~Nnj5QO_9Ggy%}f%o3N@anD~Gx@DQf1PI4z0r~)F!7ZZ5AVr9>WyZ|*54SAGW z^pwgDXxEnTqO}tg*@GLWSv)>c15IF;Z9x11r() z@}EptZBKaUc-iR4 zml=q~j9Met-?Jr&l$yieD-Om!DyELj=4Ke`@4n%%E=73T=vjZa*y*yNZ@8xp8zC*E`)3(PAZMpBzJTNg<6%gs&7d}Cc1XYiXY`1B8fQckl+ zG!Z&Qpq6u+LNt3mJR3dr?8I|RdlqO{th?w+-U| zeW9%xtIlV+7Li1=M9BB{v03vY1>8XviymwwCTzhg{^OA{Di@59Ry=Jpj&5#hT=UOH zxuaStI<#vJNL8sgS4a0^I885LV}9^OBL{|_ZTFe3ozt<@iKMb=e1e_*4KO8&L0zU`w`4QVJpZ}jXSzG~V z+2<+1{(V=sB(@bD4uFD(>Qd0b508l_YJ^P{?5yy@uU*Mu^b^WoE zQ6Q`2Hv9T&894qVfi$~oE!cR>I|f>E$iO>X*&k zggInD5_sE-U*amX%(_Ad6o{}Kjficx5{?yp%-KF$8`3-n{sOGKj(g4>8ji3011S6a*w-eT(Ho1k1R|eK1&hUAG?1e~X-kvGmXttBIb}4!d;G+Hn(T@ypE{8M_6_X)-I9xe^Nq z^l&;iEV`4#6<`z3q1h~YV3Cs=Jzn9oq9vVxCa`Pk{o97o1%CF%@Dbywd|YT%VK*%T z=nvNek!t2OFsrKzq~ignJtFC|@BujU1w1q!wSYEUUEqHeS?HQ<&K}+~qSq8e=Jp`4 zUr8AOHVpG zYsvBPStFQq|0F8*hggxQWf04#zmFJe-%4#YUf-qHo$XCmFB$@G!m6BYFYGpfH_=E) z2x09U@FDo!aYhW`Og}CfX_nU_GuRipD7$Z9Dekb8Hf_2uo@_l{xz86JSMKvb?Eg6~ z$1N>WptazyE|?4eYu;zCj9Z%YVy&I&CKugNj|K3@^t6qseSz+7zQSdu9BT!h>U@41 zSpUM!>+!?cK)1lpt{XpM(hnbZFDrGefWkM=Qc0ShG{xn9CrOoQ*BX2|v|?q>oN}vs z6?y2?_fzc}o9{`2S=tbI2ofML-ID5H9xmbOo6h&d(#B^R)_v0F_uJw3XMLm}e!y+z zyn0{Q62YEDAfIUvMANouon{FO?CX-<##DD@30~kK5_8K~V4i`Fo5uVG;0;tynx)>N zTo!>0$GtzdCI3sofT}3%fDbV@k%Pdyw!Zk(XE$r+S8E(3VJZtr*#wpx%dZmI1ej-n 
zK4asqQ9#Ov=geXWbP*Wim`w8)npyyur>|1p!YNh} zzuBtHDT=N+mqf(2g4O|;XUZY5HufpTkDU;IA#4CF04|Hi z*K%HknFz{AneI0#NoN2|GWnfJ2J;q<#oD(Fj_ob*zL-7ru1S)zR!YSvl1w3C*0!8~hl#@<$37=hmB8f;6VUnD+}4 zn^3J23e4*h5`;~%r@By0d3auKX+q;nth4!{^t_;XceW+1I(Gwf8a<02AeCqE%l*?8q^;kX6!rOzslz%9iZlRpZ7hcKZg+}>O{yh4Aeo5&IU*tp}bY`M? zc(yBqYWY(CgY3f#wV0ddTgtnB#;F@^eiUUs2A!+S4`1tZeIHl(2*4!&<3{ruwg`K* zj!!s>P_E}75NDCAU0kS}#l9yNvlS=#1qqC?v0CV1p5LG(^q#xE#Qo^thD>S{&X8z3 z;eL5pffWnE-+a_td70tIISc$Pr)Q&a)=U}d$Z((bY6%M+oh;tg$&&sqDPF{F9qWM? z#j#ITmJpvZK&i)q@M6sxUu?_ALPy!cw7@)Lh96<4+mNF#IM#%?SCIbpJZq@DGOwkm z+&J?4>@AZge8m-tF>G$#$1#|?7RXm@sy?8H`P45t-Yd(W06Yedr$|UlB%q)DZn2$k zOtO}nG3&0KEUgNpb4)2)NN#L$k)1vgusLh3gza{Lr?ic%x29C34So!<<&h477Segs zB_uIYY9pU3y<$nB&;-Z?cCx0`mZ;ofLmK(X&Ym>?3XCvsHY<*{Vq5{%%UGLIV5UGl zgPR}WhzqpPJxjDEz$&s_Y+^~V18lVI^Jb&`7y;!REOE+7O>5Fp@5j|BnL0Z`HYD*( z?5*r>O4zT}_yRqiwa3^_dbGAsitXxRnPqNr*Do~l`L=Q50-=M;rUDcD(|kX_K~?BI zrhEyXBTkNpSC-K_0k3s+9OB2u8lwUI+y-2%PP5~9oOJqVi?wNcELH~~-E7gK4!%TZ z-tW4FeO)`x%yD%K*^SE6G~T0C)+JQfoq>7|AlC0Hy_f9t6<5m^ZkE$t(gNo`_GWt* zo0LcXVr`PXRzPg23rQ+11a3Fbo_kDy_ z^_+4`u3aIl0M&IRvF z9QwT9D65MC;uQ-k7T+1Z5s{bf>vN6`X9<}C7mKS&n}t}xLZ^=D9=#Fb#3j(?T^o7% z2Le*_&oyE0D^4dD$Fi%=`~h!qN`&5bGum-4*84<)&Nq-E-;yX%!CGy@;d z-0&?S)iIUsBs+F+R+?fM&LthOrQ@ThpvQSPO6653x-$n8c~)rv35u|`P;R!fq9Rbw zH06usJSf=`+UvSlvv8tn<6att-2oLi`1XD`tf>gLBv9u)U6DF3=u9VqHC%0w!INztAS5y zp!jK4^v448P-c`JnDx}KKs=^zW69i&hljv8x}mW+S~b%AlAyWmZ4l(7(ZiH9KqC6< zDuiVJ1;pMx%YrN5I%9JQEB+6FaUNwSR(hSlJQI07LUO}%2-2y>v@I;Uv%T;!bhj-e zfiU+_;P?8&(z3bGt;F&Im6N07PfV5JI{b zi{*faDo{eNj;%zz0Q$Mj)V9!+UBEoI*o{q*P}2$M0=o|NbF#s07E!sH?-H`Mz>!Di zLiJ*$tN_O8xxH9Y8UZH#{bpHiS}YTsW6iuP0zKbfR>G1OnB@!0j!z?P*m?Fq+d`Gf zG_4ksV{tPFSmPPxX2}F0z}X*n4I}KjHko>Vzb}!Bj8*{G8YN6!V4RCocC`~6nc-|-4E#G<>LHF91Y9P+pKA5(2QceTeXhUVnT}A zSru50`9CljMTEjlK^(4V@-)9vib+b}Fkp zt&f$`zJ=?~*78!jSrcZ)x($8|>x|mCStf?FFs_+IEOQCNs5ORM>%}G=^CQrEDUKyW zEr7I<;bsSMAp*6xSMC( zTNs(N3QnZ=mnwX#1;}saSn>)9^fTFp*c9;30F-m0?OVw76HWa&cQVCh_NoA6I=pKe 
zp(aZy??ZFCjiQfcd31OkGtAF_3~Q69ep9|Osde;&{wOE#c`rRrj=ktg)HK^aM!nAqP93>b$?53CZgq&|UrGF`1IU zIFtV_mJ>7qor555wi6jE(9i9VzJ+S5q_B4*140~Mq8X3pVyDN|Fe^PK3ug{lWL1|8I-<0t8K2H zr`GHjX!o@-5JDr+=dFiGSb1OjRJW?SgiU5)>aNW%Z_Wiu4|iQ5R2~tV`_C20M_6PZ zvIum)5t9nDwpp#A&buZ_ub5RT4$X6=8tl3*GkD&*6J{_G7qWQEt%o4&s!6g`yH-ax z&g)v7{w_9A&oU`j7Kw}HJji8^_UUR8AsP;uF3Op-VE!yEYK6GWS03*@5Gi(<(CoSdO)CR2Pf+>)48_0I+Z6 zN;PPXfO(x^-nURKmeXg)U6Tm4WIzbjt(zs5p0vC|I`VPXWF{pyWzMAazgv4&I(`64 zosK0Hgs!c#-5NW>uCYCnXy<;TtWYE>B{Nv4*Bh#7Nrr9iSGFwV2*`W7aYD#tU*Uo44Zb!U_> z&ys0gpp9EY=Fq1x^hV{1$m?6EmQy&i?rwE^kED^fW7Mn65aoS4 z^Xe=X&&CaU@hqA80CU#s&g_0sHhby_y-3KM&}N99 z(W$HCmssdwNa{g8LPG@~;QlbCT9gr|_A(E+sqG`{n+*!Tgb*O`GLxzLyaY@?7ndd(i!MAmXIa`uuI*YyVj)* z4CSJ^*unG-xSd5{pB*%1;5^#t{ZOv4wE~g3BB`>w2SJ&v`>E$z_tjJWnED>}12Gkag0xx;m6P z22jdjMU5>is;gN}4O;9}#8=T|);+5@T);eanX!ePmV@% zuaCoy`a)X@{>|bwE-;GgDG0N!#U^m2nrvRZ7)05px>$s2R~_Tr(U|x5HFc6q zp@2DEi$$mw+c_NQZj&Z#>SbgZ4eot$^5uQyR)3e-gq$!|2nam)u$Pe4B!GegZq|eY zHq%fu~~)E4PxtNdCF5EVOaw&92*(C0ldNH~fBM@u;t{Rq&yk?gJY+_nD-Zyu&PvT)_Z7>kh-#tkor9lO4xMgp8H+T5;A=E zFgU64Vs(@YYzpz*BpO>t7d5~+B90}^ivVskj@8BlppWM7+&PrS0KV+?W+M@w0`c@P z9b1S?7ALNUdvNF23j4w@9hjNh{f%Vlt5j5#yVi7ysyo+l9$P%NkSV8=OWW6#P18SB_fN5NWV74chSvtD3a<0l8aH?uQ16DtX9BQ29I zZ=|%*OlM1s#FI33If?m+Li#CfLdfl5sJl&(Y{$wXtPbcDYmK@j$4*8U zj@i!j`glvox+BNMo~tEdt;vX;QQ#@hl#K-7G~)Z7=p}kPEuxf@U20ktsOKIvYz!!z z2GRoiiCw9g5RUNOA1y0)vmf*Mq$O}C8%x^hIr}gWqj5~H0=ylvIj^UYi zBrrGE_dUB-j|_|jmYk#Kb5RDkvcMy+I+^aJe5{{Ig2ZKAYLy}R7KS%N0v|lILGzuG z2xgK$WoliPj1PMy6@PNl?iJ;k;V_pcf6Dyv8x)1!bK5UJ-TiW&-M?NW+va-xS)!CN zamlxsx3}Q07upp^Sm74`Q>WvWHsKNMN%P;ao+gc)pUdf#qdbWMP0DgCOTszo?|zn|! 
z77iDiy4=EX20I4AUt1{Wd8_@?neW#Y_H}JPH^>)DvKk!kaVy9di_i)KsA1f9jtqlh z*^X?EO+-!u^6B{~R!1o;P_nzWv&b)SxyXp+Q1>X1<8as7j63cScwjOUfk2 zYJa1y#H;SPL{7B@dbn#6A&Cn(sy)ZMCTUes&qZBzrQ!^~DnQDGi{+?mwRW19t1E)atgs9Gu#bLyq&SHExNQH#j}Pyb zeiqb(I(r*n8enWUTgf~Q=p4Iwvz)5a1<(_2cH(@EcVS>R@nR7U`AeFPS*Bx0lV32o zP|~+L@T+*A~i4 zi?Uw>x`c7P(XxpymRO>(z?n}B}oX?32W|3hP^cW@9 zLNi-UpeD2#+#|Y@OUX4Lr}Ad3UAH1;u!sJGA?G2xno$c-pv{oYr=xKjBF)BLT7@Af zw_-06sVww1+J1XG?)Ph%Yx8Zi{eHz|t!>+F$#PsqTfB_6->*5h%~swvo82OeZL{UK z&GuW~HL}^Z?Y5+}8QE;xc3aXx795*ctXr37jq*2EvmzEI9({YQmLk~W$eV86_sr+> zd&xMkT_wf z$ISWrkv2e5aQAVsgv2Eh=6tT+{a!*$4}e8;FVnzSu~% zIe~VjBw8#-BMJ0))rMvrmJ&HRPv98{2-POJR^XWW)~>4%j>V37jY3OE{&8|$-yH*? z7OesE%!Ms>WVUR{@RHB0fWNme>bb=0SI6qi27wlmXJ?E8EylYZ$mP$Otp?_qLOlbDN`G_IdSjv`n*YX=d-Ca96)ighbF zg`Y%SL}BIUClS^&tPaXqs?s-sKkz~9Te0W)ld!(Nh|VB&JR@(S%qL6l99Y5Ay<=L#F&&*G-_0m(*vUQsr|ftm;LT)}DJ$E{Xz z3&Bihslp~uu#a1{H*vjS+|4Rp&nT;Qi!#$4#4c6FZ|Zo)et0I3?;v*5CDiUD_A_wi zIA3s8$zUcvKh&ySJbtdmunDO##}(@ZId|ZyleCIu)?~g~ZyKAK#8w|&WECu~{D8+b z`#Gjf%=0b7?ueu{Jex$^&mkD|ye5T*8D~>XF>i(x;3u<7U_Gd{vkrC%2w0w7u#V?EB~bpugW09F@#6ZXX_bxBHF`d;StVeV{9Wq*`k((hdwW6Y zLc5MbUBsXNC-?t>lvtZipvjW|U!FD$841Gx03VA81ONa4009360763o02=@U00000 I000000K!=k%K!iX literal 0 HcmV?d00001 diff --git a/data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz.tbi b/data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz.tbi new file mode 100644 index 0000000000000000000000000000000000000000..07c1a84486b2742a503987fc3565ce5a2b84ed89 GIT binary patch literal 11035 zcmb7q3p|tU`*>%)Z-q+f%`sMz(_3;rEKx#{R3xlY$eE>?*;c7kLe6J$>mjbYJ&%Jcsy?ZY{zhRS}ExM!|s6$+CH)b*mi}#Ni z6w@P5Rde(@xCZ;j*lpWhe3UKVls>aFe*@42Xzniol(yw!>SJ?|z zI4us_`@2=)IxAz%OC@2-E%E7j$|+hBqm>7%HP2&aJa2eead(}#J?(i^plUY{Y7g^ZSz&!M||{LN8XxwGwx?=pm)Qnoc6XrdeYbk}WB+Eh=9Z!<_D1WZG*E&^vRwE~oW}T4gAf!md+pHwil| zR`z-+AU}^e-rl}_N+@5BXH3}s^O!0Po=*s_XdoUpF=8~A>d;dAykm2OOow9OMhinBa;r)22Q;3UNp)qPFJ3@55XE*~!CLizK zTK}C<_IL`XfB+B|VnE;QR$D(X22uS!z(qm=$>~CuENwWkg zhJG&X+$LS`lO;Qa$zXJ=Laka9zf2}TBnNg$hM6_!kUOe&jTGk#951Bb2>jEIIns?Z 
z22xQ3N4W%M4 z0vq7Gk5y6}<&*GM@ei2=$*!nrtnvwX*A3)=Rfj&l2|OypWpt*19j@9E|7P+!n78q; zD1~xD8J|h%y=zk`&7@C7I0|cHKPD{~IhkmZHmNsVs1;`NBYxbV3WMFZ6^pmwJ+m z;3o|63jbh^fXUE-Dsy(FHYMxrE@(c};LQb_bQbj?Em4h7JxO53QYvJeR6gotw6e-2 zsAO0RSOv=~rL>AxlGjf}33!(GL;=kFqq*y~@RmDQvyRF-8s6-6q_yOxuOimFkKmcX z83&CuKMkF$f2yQc`|!Bk!7W#Do1Q;P+j#L{!Jj9skL1U0RC)}^jD64u#rjqvaqIYW z@c7IOCICT6&B;-n$-$`nr*@Wfr}rX~Zv&eXgl3dlJ-_CZr7b?7@S_+*L%cfp)M-iJU)oyvVOm2JZP*n**D6U{;X~eGLVba% zhEKybiEGujPu-Z%eW{&)H_y+D*#3SPiKA{??5_T6@%0_74RFe)Tf(N_nB$NrR)vVi;%uR++|g-G<9iFEb^(Ux3o=_IH-Z{9F{ed zr|s%6Ws<6a3Dh=x%oy(Y?Qdte&eTC6$$OUtzFtW`fD;}){2hWrk~c1Q{Y~_K-1PeN2^!x zQxSH|x2zNKQ>WJpWpuKoZHG=0_nAp63FR;LSov|(=0FDB-l=py6~tp>LXOSU8De+{ zAr~D)eC({1(=_EA8R>ITeAsS+IuJ1H(nKnn?YWRujs!xfeO0ANg~pg59VkEk$O(X8*>8LBiQX69KX#E-2)m8GM*!AsicBk@1-RSVJq-C=co-TD> zrfJCrbXN0e9VU_Hl zX}+7fI$L1NyhN^TWL@+yjO57$KUsJq@gR$v464^H$M-lYR$DP_`%^Kn)rP?i8d}p7>I*j9`&bna8d(Y3gr(s$OD}|(XlPi+$d_^WB{4_&2dhJ z;E&h?RdWdEWvX&4Nh6DLtrFZ=gD11DQ3(ts(77CL7xoRvDkNhkOPTfUArLfL)se}s zN$E=mDSh8d0cM8J;QwiS@|}@wFsKA}cS&uGukt`Mq|tQd@_fdJQQ}{u+LO-OCF$jt zt%#sOa`0_+qFq^EwJ*iEgdZ89=F&HKBXDvG0jdLPB=15TFf&wwe0LPG49wx`uT*f^ zb$(5p^kvtOXO_`w)_Ap+)>rL*-db!^o4T{|X|b!p_r#(*;uX1%)!WS~?mS34@ucJ8 z&3nZ+O`a*PFUhVu5^iU|AS`EiCxp_MSo2bJpUZVA?KeFsa8|I0ua5_f8#2dSNo}3V1QoT9tg=~LF%gq!eW|9!QIr^c~xd;Qc7L1M>Pmm`UNFjtf^m^HLBG+ z#MYd^lVBaxPiz3p12iJJA2gDuZ`F>jDrPLCf&Pf^5BMe24>57rTT~Ia6t*GA6nu#o zi5lvidx0Nr3OpU{aL7unhZ9)|x&~(+`%Z*w@(cap2LHJ73t-F75hyf7x*RqKm&Ykg~JEuMkLdN$i_-Qd`7mb3W5S_DppO~yU;jI3#&x@=iZYQ>L zH$L?0^o{&!{7I~<{IV(18H7NXKPCl$%k&SviyvCM-1SQ!Hg;kChHNG9YG!I)N9(sZ zb>3RCVM@=XX6F7qImO24#3a#|Cqxcz3L5-oH+R12fo7Qfh}@~SkEF}8Pq-BwOx|r| z6dm^T=B}dO&plrE{DQ@0vCWslj@uty;B&K+ZhbyGc$%~&L4=}3NoQjDeq+A(WXqmy zex>@o9Q$~Vb>N?6*wWL`QDH%`*Saor`*wbGhX2($jNR@H&);P0HdS0d#p-sFQ0sMd z>I_B^!;IhGv|)gaUKFie|(ZQwOX9%w4QdM!(h1C^jX7 z(<;03{Cb(MM=;)b#JHlWoX&c^8(ts%NsOq|%Ed%H(lhF6C)=F4+l~8jpotRDxM{?2 zE(nzk97yo;cZ(S@^p4Z}5Krowe>?;Jv)U4g&XHy1IJn{?J9shO#jF&B~oYc9>?SHte}xQ3nQdHKeo1aIfDY%&Va*#AfFe+>9T=B 
zGsCkDQxASO$Ve{Qo=$&K{fC7&^XE!E*)_)JphAbP*Qx3wAb(d z5yN;1mzN`{*>ry*KWHmH;MLAS6wJHCF9E=QrSCpGCST1LxtVwf@m|d9xzfE0tiQ08G8^Ngfe&yVBqt=hf`*6*@Dz3Ytnmn4`zZ#m z6jscG{1%TYqOE3~VS#lHyNOwgZu&ZV?6<(pzq{}2{==ju_S&D+Dw^*;z>}|1 z7{fIKul}f(Ml}co(dlrKGmGY@l|s5rkvKMTA~$}%WCU>&C4n*N3_5aEu|kVo^H)$SsC;#R4*!`+5N8|4lMz>q;ZNTi=grM*nOj0&HVpL zr4A3t+)l%eY%+_UpdP?=UE*i|EK2^y6f|&GXV(dR@EF4!Xniz)n^*f;FY3(8N|p&v zpGttJqaIa_i;N9Bxats`EEM#j2<)8!*w(Dq!Q)Qa12Yly$AJh<&TB+^YUH8?28lG} z^Bl;Xw*4reG?a$!g83;=kF&rP&!C-3r5`ehuAQ^h;o&Au@D7A!B)5CG-B5K!+P;#j zavQ|P1A(O$tJxSY2rD())*7wZ*rs?@se4CJ#+tCUwY$E(?=(Aq=XbGp)^`scyN1|k z>B1Zsi92T*Hhp77@K~|P=VGzT9qEgV;6FUSKXdWGX>~@ztOsJm`vPjj%a9C~24hRS zO6OYz(d4$e%!)C7h8GDaCY$tw`rn=3A-q|UZC!6btI5+N@L9Zhhqai{B+5qZtV!d2 zu`hCeikFMlz@Ke-p0?TG4%ctEJ}+gn!`4$GqMPN){5y|t4%!rQw~M(h|GrUn4OsZP z(RM(KHcDo3-WLN#VquK(%1g~h&oKS+%FBM`$>eX{-_1Y$yZ-lkL~N;(!2EpLX_=Fc`nRFman~!CnlzrAWyO%9u{8(L?M( zCNiKM4jA3`sbDn<8j7O8JgWO=njFYVI^OEldJ^p8U9)V0L=>_SGt=J5iL!NsC%T(r zIEThFiXdv!Ig#zv7s2-j;I%$Cv)f zXW0hy<=4MI*+6Fgmlfu8gXfo?&w7V1ed=x;u5D~>pDl0ndB;*sQ;dRetbJVgUH0VC zOTd?`9uOsuq?bg1hTe4DHyy)$t@edo*GLTtz)GA^%!68lUZ( zFZ!%)(V=+i7E?DCbyweU2A|x+2Mu~12`Qr}T3xtM%ISRS#X(m2pOqeE5Dk?{*I6d~4lIrt7!oasU4qFvF zim&QI@jt9gG5*z75F{Y8q+g+94uxEbC02PNyI~E>G&GdqT4!lt z*lG$15X6uI)evj3HD4`~PpLHT-X11&B_Z*YdsL#G-I`aDEK;})-8_%-G2YLh{$dDEd5Yx(75cPj8`~7$67jk%Ks^3(3^Ov zkA0PZFXZ~_(TlwPLA|_=aX^oh<)@X)n~j8TEGoV~wVKxV2&%Pe3-M!;LSI4l=Hd4q zIR9nwx$8?uTiceS9im6g%etR!|J-)Py@wNa!ud&w*oMK5FT1yWKC80L+QdZEL{!71 z2+&WFMeeUY`)X2BQiL@~hpF-3YwQiruBj)ST#Z+W@ILG}=bt@Wx*z~{q=Vmmel5NG zaC^c_h)9B`@0;#-Mdab6O8?j0$yE{5iYRiLcvGOGq?6;cJ`{Kcs>^zs zH=(zeP!tyo9%a>6W`n3*+)`yFsDY_tsm82UCOyAQgK@SYfbXSFwZ|lZ@H(A|{q*Aa z+1aYue(6{A!3m_gp}A87Yp9(Z;6QSfWuaO5vjePPWCGs{s1q5nngApmgGxeRsS5`n^>+j1F%e$1b>7kB7ydRe)yN*DGq)qeARpmiZ7Gsw52Q%@kIIN$ zBF%%d^EtTijJnHH#zaI_nO6E!2y&I$I@4NLO_V z#}7YLuM6?hHFSH&@bm*%*5eDP4pYY*5Qq}k(Am^SsG*!Kb=)b$1P=8+ge*5L&t6X@ zjs9>A@^OEuW%m8Ov1_(MOfyzWK!h?sK!rm9$^KF0f;(4CYH<_6o 
za`zv)-Q4oxfX5;GLmFsqfl=?dd$Ml7`Q6+Zy#8)mhiHlBt*KyE0y&`w~_@;l3(VTwVH z4B8c(mWxI8mqV%t(6imlp>xyrY7OmK=1!bWxXN{ud=DwEEci8fR)3Z#E)pLs%gJTk z%7%xozdT*h+W!jtW@K^J(4B-fMS{LFgQIDiM+15&5vwPG#FCv;&XS1Uv6lIRPHBf) z>tfpLLj8PSwHG?J*2_$NEF2n{Kak%b4|SfoohtbDve)dLAOW8~Ht^I?+UH}Ov~;?X zWyA9-MT4$ds-uq;KNmEVCte5f({KvBR3!tfLmB8ZbIQc90Z?yDpo*oxsppEq3xdY3 zxi~W^wVHL?1=clii&@&%tu_j21!QLo9TNN4X`~QVrt?^Mqn44i$v5RhvwgDnZn@d; z+l)j*TQ@db^PLzPKN#R|Q^i<|JiDR2`zT&$5C;Wgo60Hlu?qU&ITmecGyHzWP!IiR z;tKSBK>n928OZ*@xW75E^mKIln6+h*!tnExrBtdbI1`R!mq8WSk4^J1k}zB@=GiQS zRTS*fHr^Ac2JN&R;!PB$vlu|HRsb?)0q$VwL>*@t&7rLLTc*sWb2w8Mro%e|=r*SI zP86spUeb+S*F0gLN`>^(8PkRMOtQtO!lUXI87Ap&ba@ zLMt`+)?7UK0-@T$%8B)Mj}HJ{G0GkbY|6#zQd-`i%e(Y_ZeTRB#;J9Kpntv8UG{3h zj>MWP$+qD4bgJCe&P42q>A#}i`pnt)wn;i?45$aGDAhS9UY?1OE?^81KcUi4XT@I; zdG|TNDdqK8f9KPJXkt0H1-Ad%$Z9tIyq4B+_|%1^{{~X78TU2yIs*3Yl^4J1{y24= zk)-6O3yuFid3*KCdx>oQ4VUzq$ot z0SU=7#wq<*Y#rWDmxCADgioV{&wT!;Xu!#;REzpastJ=|%M8WAfDgvjJJK=-Xb|`}5R`H;7 z>C~vn8{~3K60=)@0I7`|RzyWE1eEQC<^w~0Ds2pFCJ~C41H{f3;%}Bw@2%{U|{s!CUW*D>jx9?~P zkDU)oHF(+NnafC?X#y^U$NkAq5%@oje^bWnoxJ3aXm&kMa%D=tXrUlT7cY+qcdr55 zSk0na*3sJZPQ_gnSQPDc=#glGs!7TjqsyA)@!#-ev~ZLBR$`MTeHY<{ZJ zRXk>M?Y`uD4-4ZqhM)6tef-x;7Boy=O5Z&JOlH+IMeg zgdDmN?P?{EPDSvcOv=gOJz1INf;5dvUVQfcu2v@~wl;J#?juL8x9cSx<&+7bnhPd+HPm2b#O7?Tiu*L{03 z+EY<)v@6H1aA!7*t<3fc7{{NG{tcGFc*>@~6@$0J2YWTL@qD<9aKqyH<_opV&y-Yr zvD(7jM&m)=H*o%ap%-|lvi=@;E0kY5C0vY)RWmp<2YG2oQeco|m{9LO=>dR+R3;~l z7%%v#1U(q&Iu7}W8F_nN@kx0GFPNfFvgttf5%|et?qBG@>!29nNwjX!DW#@cPgMrR zmSld{>K>-obfTPs`!XStJNOV6)C9nlOwa3sl)r9Gc_+w~aB&QX&_qbrYQC^-11;{b zTb%2j2Yi#<>^$Kwd9nZtUtj8#khatJgr&CbI#RAnG?Eje9G9%JsrYS2tjnj-FCA`0 zUVllY7~M1e3#@naR^w>fGnfdRQM)#ed*n4m2>O)_$8&-q1=NaanTilKeTO-jZGqR} z@6{ewapI@1I`cHiJq_y(_e|$1tRL7~JDGMBxRtaXA^z~{p8wm)A)&+PIhR|fxX1)c zNPo8>NVw2Sb5qqK?JJ|Xq2e=|-euT=GL``ovXVkDQ%0}YVc(ovF}mQ^8F$9sprp!y zurfzjF=R`1vm--;m|xv{`M4T*PFhj@I*q_Pw2)NE+pn#hgrNDnK$TNL4{CG92t45> z@EL8-WCV@zTmUBBt&I?Xv$Ha|&5nF&$+^?mxvbB4;;hHRH{4#-2_TpN+!lJX9nFQ# 
zs6)7gzKI&#UeK{6ut+FW05)72aHF?NtFdYzZtO+$R9<#Gm8{NFB8&1iV-+u}Ii@iBk*=Swh(%93wyhFHyRF{nL4& zm}I;(rBJAjV9FCqoET`&2v|*6VpZ+K)z32?$Zbm!9FdGagPkjF6M)e*S6gO0_##W# zmwmgBaC>F=enlu94)CBl`dW1s#>y{Jhash;j&AX{tws&1f+4o*=%ygsYS54>c)-BP zg{MS?&w?7@qg6m_mx0($V+iK15Sa2jNo(Y8!`sUcdQFIqfuXWGKcUlDC(rE#S8faSG>W$>ee6ec+D;)JbRhB7JyUo*nlVUuQ8<4;%0Ui@y;#+RD%wXnWFW#Zo{>&pKOTBb%D976g5A z{yn`pEz3nXtkXtX{40(KwH#E*yJRrpw@U!0Y^=)E$j#FCejiiLta?&f7g3E`sR7MVo=ChdJ z+|NfP-)xy0tS<1!QW5N`g~M4S!cKnc{r~HVxjOGY&JpTq{;F7+|L=bUCDyDL{0sZz F_#aA{fVBVs literal 0 HcmV?d00001 diff --git a/data/excluded_regions/convert_hg19_regions_to_hs37d5.bash b/data/excluded_regions/convert_hg19_regions_to_hs37d5.bash new file mode 100755 index 0000000..bbdb81e --- /dev/null +++ b/data/excluded_regions/convert_hg19_regions_to_hs37d5.bash @@ -0,0 +1,31 @@ +#!/usr/bin/env bash + +set -o nounset + +# +# This script yields a recommended set of exclusion regions for CNV calling on hs37d5, by +# converting excluded regions from hg19. +# +hg19_excluded_regions=cnv.excluded_regions.hg19.bed.gz + +# This script depends on bgzip and tabix, customize these values if they are not already in the path +bgzip=bgzip +tabix=tabix + +hg19_renamed() { + gzip -dc $hg19_excluded_regions |\ + sed s/^chr// |\ + awk '$1~/^([1-2]?[0-9]|[XY]|MT)$/' +} + +other() { + wget -O - http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz.fai |\ + awk '$1!~/^([1-2]?[0-9]|[XY]|MT)$/ {printf "%s\t0\t%s\tother\n",$1,$2}' +} + + +label=cnv.excluded_regions.hs37d5 + +cat <(hg19_renamed) <(other) | $bgzip -c >| $label.bed.gz +$tabix -p bed $label.bed.gz + diff --git a/data/excluded_regions/get_cnv_exclusion_regions.bash b/data/excluded_regions/get_cnv_exclusion_regions.bash index e4e52dd..40ad865 100755 --- a/data/excluded_regions/get_cnv_exclusion_regions.bash +++ b/data/excluded_regions/get_cnv_exclusion_regions.bash @@ -11,11 +11,11 @@ set -o nounset bgzip=bgzip tabix=tabix -# The reference tag for the 
genome version to use. This script may work with other reference versions in the UCSC genome -# browser database, but this hasn't been tested. +# The reference tag for the genome version to use. This script has only been tested for ref values 'hg19' and 'hg38'. +# It may work with other reference versions in the UCSC genome browser database, but this hasn't been tested. ref=hg38 -outfile=cnv.excluded_regions.hg38.bed.gz +outfile=cnv.excluded_regions.${ref}.bed.gz base_url=http://hgdownload.cse.ucsc.edu/goldenPath/$ref @@ -31,11 +31,18 @@ get_gaps() { }' } -# Get UCSC Centromere track, simplify labels and convert to bed format: +# Get UCSC Centromere track if it exists, simplify labels and convert to bed format +# +# This file is optional because in at least some older genomes, it may not exist. In such cases the centromere +# regions are annotated in the gap track instead. +# get_centromeres() { - wget -O - $base_url/database/centromeres.txt.gz |\ - gzip -dc |\ - awk -v OFS='\t' '{print $2,$3,$4,"centromere";}' + url=$base_url/database/centromeres.txt.gz + if wget --quiet --spider $url; then + wget -O - $url |\ + gzip -dc |\ + awk -v OFS='\t' '{print $2,$3,$4,"centromere";}' + fi } # Get alpha-satellite regions from the UCSC repeatmasker track diff --git a/data/expected_cn/expected_cn.hg19.XX.bed b/data/expected_cn/expected_cn.hg19.XX.bed new file mode 100644 index 0000000..6c6860f --- /dev/null +++ b/data/expected_cn/expected_cn.hg19.XX.bed @@ -0,0 +1,6 @@ +chrX 0 2699520 chrX_PAR_1 2 +chrX 2699520 154931043 chrX_uniq_1 2 +chrX 154931043 155260560 chrX_PAR_2 2 +chrY 0 2649520 chrY_PAR_1 0 +chrY 2649520 59034049 chrY_uniq_1 0 +chrY 59034049 59363566 chrY_PAR_2 0 diff --git a/data/expected_cn/expected_cn.hg19.XY.bed b/data/expected_cn/expected_cn.hg19.XY.bed new file mode 100644 index 0000000..8f23908 --- /dev/null +++ b/data/expected_cn/expected_cn.hg19.XY.bed @@ -0,0 +1,6 @@ +chrX 0 2699520 chrX_PAR_1 2 +chrX 2699520 154931043 chrX_uniq_1 1 +chrX 154931043 
155260560 chrX_PAR_2 2 +chrY 0 2649520 chrY_PAR_1 0 +chrY 2649520 59034049 chrY_uniq_1 1 +chrY 59034049 59363566 chrY_PAR_2 0 diff --git a/data/expected_cn/female_expected_cn.hg38.bed b/data/expected_cn/expected_cn.hg38.XX.bed similarity index 100% rename from data/expected_cn/female_expected_cn.hg38.bed rename to data/expected_cn/expected_cn.hg38.XX.bed diff --git a/data/expected_cn/male_expected_cn.hg38.bed b/data/expected_cn/expected_cn.hg38.XY.bed similarity index 100% rename from data/expected_cn/male_expected_cn.hg38.bed rename to data/expected_cn/expected_cn.hg38.XY.bed diff --git a/data/expected_cn/expected_cn.hs37d5.XX.bed b/data/expected_cn/expected_cn.hs37d5.XX.bed new file mode 100644 index 0000000..77927d2 --- /dev/null +++ b/data/expected_cn/expected_cn.hs37d5.XX.bed @@ -0,0 +1,6 @@ +X 0 2699520 X_PAR_1 2 +X 2699520 154931043 X_uniq_1 2 +X 154931043 155260560 X_PAR_2 2 +Y 0 2649520 Y_PAR_1 0 +Y 2649520 59034049 Y_uniq_1 0 +Y 59034049 59363566 Y_PAR_2 0 diff --git a/data/expected_cn/expected_cn.hs37d5.XY.bed b/data/expected_cn/expected_cn.hs37d5.XY.bed new file mode 100644 index 0000000..6839664 --- /dev/null +++ b/data/expected_cn/expected_cn.hs37d5.XY.bed @@ -0,0 +1,6 @@ +X 0 2699520 X_PAR_1 2 +X 2699520 154931043 X_uniq_1 1 +X 154931043 155260560 X_PAR_2 2 +Y 0 2649520 Y_PAR_1 0 +Y 2649520 59034049 Y_uniq_1 1 +Y 59034049 59363566 Y_PAR_2 0 diff --git a/docs/aux_data.md b/docs/aux_data.md index e3c0902..cf69f53 100644 --- a/docs/aux_data.md +++ b/docs/aux_data.md @@ -3,10 +3,38 @@ Details on how HiFiCNV [auxiliary data files](../data) were generated. ## Excluded Regions Excluded regions can optionally be specified with a bed file. -Two example exclusion files for hg38/GRCh38 are provided here: -* [cnv.excluded_regions.hg38.bed.gz](../data/excluded_regions/cnv.excluded_regions.hg38.bed.gz) - Contains regions that are known to cause artifacts during data processing (e.g. centromeres). 
Script to generate this file can be found [here](../data/excluded_regions/get_cnv_exclusion_regions.bash). -* [cnv.excluded_regions.common_50.hg38.bed.gz](../data/excluded_regions/cnv.excluded_regions.common_50.hg38.bed.gz) - Contains all of the regions in the above file, plus regions that were frequently called as a duplication or deletion in a population. The additional regions were generated by running HiFiCNV on our population (N=97), and then storing any bin where >50% of the population had a duplication or deletion overlapping that bin. +### Pre-computed excluded regions files + +Several useful exclusion tracks are provided for commonly used genomes; these can be used directly for those genomes, +or as examples to develop exclusion files for other genomes. + +Two pre-computed exclusion files are provided for hg38/GRCh38: + +* [cnv.excluded_regions.hg38.bed.gz](../data/excluded_regions/cnv.excluded_regions.hg38.bed.gz) - Contains regions that + are known to cause artifacts during data processing (e.g. centromeres). Script to generate this file can be found + [here](../data/excluded_regions/get_cnv_exclusion_regions.bash). +* [cnv.excluded_regions.common_50.hg38.bed.gz](../data/excluded_regions/cnv.excluded_regions.common_50.hg38.bed.gz) - + Contains all the regions in the above file, plus regions that were frequently called as a duplication or deletion + in a population. The additional regions were generated by running HiFiCNV on our population (N=97), and then storing + any bin where >50% of the population had a duplication or deletion overlapping that bin. This is the recommended + excluded regions track for human sample analysis. + +More limited exclusion files are also provided for hg19 and hs37d5: + +* [cnv.excluded_regions.hg19.bed.gz](../data/excluded_regions/cnv.excluded_regions.hg19.bed.gz) - Contains regions that + are known to cause artifacts during data processing (e.g. centromeres).
This file was generated with the following + [script](../data/excluded_regions/get_cnv_exclusion_regions.bash), modified with a 'ref' value of 'hg19'. +* [cnv.excluded_regions.hs37d5.bed.gz](../data/excluded_regions/cnv.excluded_regions.hs37d5.bed.gz) - Contains regions + that are known to cause artifacts during data processing (e.g. centromeres). This file was generated by the following + [script](../data/excluded_regions/convert_hg19_regions_to_hs37d5.bash) which converts chromosome names from the hg19 + exclusion file, removing hg19 non-canonical contigs and marking those from hs37d5 as excluded. + +Note that the common deletion and duplication calls in the population provided for GRCh38 are not available for the +other reference genomes. To improve CNV precision, it is recommended to either use GRCh38 or create a similar track +of common population calls for other reference genomes. + +### How excluded regions influence copy number calling All depth bins intersecting an excluded region are removed from the depth bins track. All minor allele frequency evidence intersecting an excluded region are removed from the MAF track. @@ -16,12 +44,17 @@ unknown copy-number state -- the probability of all other copy number states are state. This means that a copy number change can span through a short excluded region if there is sufficient evidence on the left or right flank, but longer excluded regions should be segmented into an unknown state. -### Expected Copy Number +## Expected Copy Number + By default, HiFiCNV expects each chromosome to have two full copies (e.g. a diploid organism). When reporting variants to the output VCF file, it will only report deviations from this expectation. However, this expectation is undesirable for some chromosomes (e.g. sex chromosomes) or non-diploid organisms. The expectation can be overridden by providing a BED file with expected copy number values. 
-Two examples corresponding to male/female in human hg38/GRCh38 are provided here: +Examples corresponding to XX/XY karyotypes are provided for human GRCh38/hg38, hg19 and hs37d5 references: -* [female_expected_cn.hg38.bed](../data/expected_cn/female_expected_cn.hg38.bed) -* [male_expected_cn.hg38.bed](../data/expected_cn/male_expected_cn.hg38.bed) +* [expected_cn.hg38.XX.bed](../data/expected_cn/expected_cn.hg38.XX.bed) +* [expected_cn.hg38.XY.bed](../data/expected_cn/expected_cn.hg38.XY.bed) +* [expected_cn.hg19.XX.bed](../data/expected_cn/expected_cn.hg19.XX.bed) +* [expected_cn.hg19.XY.bed](../data/expected_cn/expected_cn.hg19.XY.bed) +* [expected_cn.hs37d5.XX.bed](../data/expected_cn/expected_cn.hs37d5.XX.bed) +* [expected_cn.hs37d5.XY.bed](../data/expected_cn/expected_cn.hs37d5.XY.bed) diff --git a/docs/outputs.md b/docs/outputs.md index cc56fd4..86dd706 100644 --- a/docs/outputs.md +++ b/docs/outputs.md @@ -1,18 +1,30 @@ # Output files -All outputs are based on the provided `{OUTPUT_PREFIX}` and a inferred `{sample_name}`. +All outputs are based on the provided `{OUTPUT_PREFIX}` and an inferred `{sample_name}`. The `{sample_name}` is extracted from the alignment file. The name is taken from the first `@RG` tag in the alignment file header including a sample name. 
## Primary outputs * `{OUTPUT_PREFIX}.{sample_name}.vcf.gz` - the primary VCF output containing copy number variant calls for the sample -* `{OUTPUT_PREFIX}.{sample_name}.depth.bw` - a bigwig file containing the depth measurements +* `{OUTPUT_PREFIX}.{sample_name}.depth.bw` - a bigwig depth track * `{OUTPUT_PREFIX}.{sample_name}.copynum.bedgraph` - the copy number values calculated for each region ## Secondary outputs * `{OUTPUT_PREFIX}.log` - the log file generated from running HiFiCNV * `{OUTPUT_PREFIX}.{sample_name}.maf.bw` - a bigwig file containing the minor allele frequency measurements, only generated if a VCF file is provided +## Debug outputs +Additional outputs related to GC correction can be obtained with the `--debug-gc-correction` option. These are debug +outputs and may change in future updates: +* `{OUTPUT_PREFIX}.gc_frac.bw` - A bigwig track of GC fraction windows (from the reference sequence) shared across all +samples. +* `{OUTPUT_PREFIX}.{sample_name}.gc_scaled_depth.bw` - A bigwig depth track, similar to the standard bigwig depth output +except that all depths are scaled by their region's GC correction factor. Note that the internal segmentation model uses +GC correction factors directly instead of these adjusted depths, so these depths are only used for visualization. +* `{OUTPUT_PREFIX}.{sample_name}.gc_correction_table.tsv` - Sample GC correction factors as a function of GC fraction +* `{OUTPUT_PREFIX}.{sample_name}.gc_reduction_factor.bw` - A bigwig track of sample GC correction factors by region + ## VCF notes HiFiCNV follows VCF format specification 4.2. The `QUAL` field is reported as an average of the next-most-likely copy-number state for each bin from the HMM (see Methods). It also includes a `TARGET_SIZE` filter flag for events that are smaller than 100kbp. +This filter can be disabled using the `--disable-vcf-filters` option.
diff --git a/docs/quickstart.md b/docs/quickstart.md index c653a86..e6e079f 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -4,6 +4,7 @@ Table of contents: * [Quickstart](#quickstart) * [Supported upstream processes](#supported-upstream-processes) * [Output files](./outputs.md) +* [FAQ](#faq) # Quickstart ``` @@ -19,7 +20,7 @@ Parameters: * `{BAM}` - a BAM file containing reads from the sample * `{REF_FASTA}` - a FASTA file containing the reference genome, gzip allowed * `{EXCLUDE}` - a BED file of excluded regions, recommended for hg38: [cnv.excluded_regions.common_50.hg38.bed.gz](../data/excluded_regions/cnv.excluded_regions.common_50.hg38.bed.gz) -* `{EXPECTED_CNV}` - a BED file containing regions with deviant copy number expectations (two copy is the default if unspecified), male and female expectation files are proved in the [expected_cn](../data/expected_cn) folder +* `{EXPECTED_CNV}` - a BED file specifying expected copy-number by region (two copy is the default if unspecified), example files for human XX/XY karyotypes are provided in the [expected_cn](../data/expected_cn) folder * `{THREADS}` - number of threads to use * `{OUTPUT_PREFIX}` - the prefix for all output files @@ -29,35 +30,35 @@ See [auxiliary file generation](aux_data.md) for details on the above recommende ## Example -Example of running HiFiCNV on an HG002 (male) WGS sample. An optional VCF file is provided for minor-allele frequency outputs in this example: +Example of running HiFiCNV on an HG002 WGS sample. 
An optional VCF file is provided for minor-allele frequency outputs in this example: ``` $ hificnv \ > --bam /path/to/HG002.GRCh38.deepvariant.haplotagged.bam \ > --maf /path/to/HG002.GRCh38.deepvariant.phased.vcf.gz \ > --ref /path/to/human_GRCh38_no_alt_analysis_set.fasta \ -> --exclude /path/to/common_events.exclude-merged.bed.gz \ -> --expected-cn /path/to/male_ecn.bed \ +> --exclude /path/to/cnv.excluded_regions.common_50.hg38.bed.gz \ +> --expected-cn /path/to/expected_cn.hg38.XY.bed \ > --threads 8 \ -> --output-prefix HG002_male +> --output-prefix dtracks [2022-09-29][06:22:57][hificnv][INFO] Starting hificnv -[2022-09-29][06:22:57][hificnv][INFO] cmdline: hificnv --bam /path/to/HG002.GRCh38.deepvariant.haplotagged.bam --ref /path/to/human_GRCh38_no_alt_analysis_set.fasta --exclude /path/to/common_events.exclude-merged.bed.gz --expected-cn /path/to/male_ecn.bed --threads 8 --output-prefix HG002_male +[2022-09-29][06:22:57][hificnv][INFO] cmdline: hificnv --bam /path/to/HG002.GRCh38.deepvariant.haplotagged.bam --ref /path/to/human_GRCh38_no_alt_analysis_set.fasta --exclude /path/to/cnv.excluded_regions.common_50.hg38.bed.gz --expected-cn /path/to/expected_cn.hg38.XY.bed --threads 8 --output-prefix dtracks [2022-09-29][06:22:57][hificnv][INFO] Running on 8 threads [2022-09-29][06:22:57][hificnv][INFO] Reading reference genome from file '/path/to/human_GRCh38_no_alt_analysis_set.fasta' -[2022-09-29][06:23:29][hificnv][INFO] Reading excluded regions from file '/path/to/common_events.exclude-merged.bed.gz' -[2022-09-29][06:23:29][hificnv][INFO] Reading expected CN regions from file '/path/to/male_ecn.bed' +[2022-09-29][06:23:29][hificnv][INFO] Reading excluded regions from file '/path/to/cnv.excluded_regions.common_50.hg38.bed.gz' +[2022-09-29][06:23:29][hificnv][INFO] Reading expected CN regions from file '/path/to/expected_cn.hg38.XY.bed' [2022-09-29][06:23:29][hificnv][INFO] Processing alignment file '/path/to/HG002.GRCh38.deepvariant.haplotagged.bam' 
-[2022-09-29][06:37:47][hificnv][INFO] Writing depth track to bigwig file: 'HG002_male.HG002.depth.bw' +[2022-09-29][06:37:47][hificnv][INFO] Writing depth track to bigwig file: 'dtracks.HG002.depth.bw' [2022-09-29][06:37:48][hificnv][INFO] Scanning minor allele frequency data from file '/path/to/HG002.GRCh38.deepvariant.phased.vcf.gz' -[2022-09-29][06:38:01][hificnv][INFO] Writing bigwig maf track to file: 'HG002_male.HG002.maf.bw' +[2022-09-29][06:38:01][hificnv][INFO] Writing bigwig maf track to file: 'dtracks.HG002.maf.bw' [2022-09-29][06:38:18][hificnv][INFO] Segmenting copy number [2022-09-29][06:38:18][hificnv][INFO] Haploid coverage estimates for sample 'HG002', iteration 1. Uncorrected: 14.955 GC-Corrected: 15.708 [2022-09-29][06:38:21][hificnv][INFO] Haploid coverage estimates for sample 'HG002', iteration 2. Uncorrected: 14.955 GC-Corrected: 15.708 -[2022-09-29][06:38:23][hificnv][INFO] Writing bedgraph copy number track to file: 'HG002_male.HG002.copynum.bedgraph' -[2022-09-29][06:38:23][hificnv][INFO] Writing copy number variants to file: 'HG002_male.HG002.vcf.gz' +[2022-09-29][06:38:23][hificnv][INFO] Writing bedgraph copy number track to file: 'dtracks.HG002.copynum.bedgraph' +[2022-09-29][06:38:23][hificnv][INFO] Writing copy number variants to file: 'dtracks.HG002.vcf.gz' [2022-09-29][06:38:23][hificnv][INFO] hificnv completed. Total Runtime: 00:15:26.323 $ ls -HG002_male.HG002.copynum.bedgraph HG002_male.HG002.depth.bw HG002_male.HG002.maf.bw HG002_male.HG002.vcf.gz HG002_male.log +dtracks.HG002.copynum.bedgraph dtracks.HG002.depth.bw dtracks.HG002.maf.bw dtracks.HG002.vcf.gz dtracks.log ``` These tracks visualized in IGV appear as follows: @@ -74,3 +75,11 @@ The following upstream processes are supported as inputs to HiFiCNV: * [DeepVariant](https://github.com/google/deepvariant) - for SNV/indel Other upstream processes may work with HiFiCNV, but there is no official support for them at this time. 
+ +# FAQ +## What does the error "Diploid chromosome regex does not match any sample chromosome names" mean? +By default, HiFiCNV tries to find autosomes for determining normal single-copy coverage in a sample. +The default regular expression searches for contigs named like "chr1" or "11" and uses those for normalization. +If your reference is using a different contig labeling system (e.g. an NCBI name), then you will need to alter the `--cov-regex` option. +Using `--cov-regex "."` is the easiest option, as it will match _all_ chromosomes available in your files. +However, this can lead to subtle differences in the output files, primarily because it may match sex chromosomes (which are not always diploid) or decoy/alt contigs that are frequently packaged in reference genome files.
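The contig-name matching described in the FAQ entry above can be checked before a run. The sketch below is illustrative only: the helper name `contigs_matching`, the sample contig names, and the autosome-style pattern are assumptions made for this example. The pattern is written in POSIX extended syntax for `grep -E`, which may differ in detail from the regex syntax HiFiCNV accepts.

```shell
# Print the contig names (one per line on stdin) selected by a candidate regex.
# Hypothetical helper for illustration; it is not part of HiFiCNV.
contigs_matching() {
    # grep -E uses POSIX extended syntax, so digits are written as [0-9].
    # "|| true" keeps a no-match result from being treated as a failure.
    grep -E -- "$1" || true
}

# An autosome-style pattern (assumed shape) keeps only "chr1" and "11";
# chrX, the NCBI-style accession and the unplaced contig are not selected.
printf 'chr1\n11\nchrX\nNC_000001.11\nchrUn_KI270302v1\n' |
    contigs_matching '^(chr)?[0-9]{1,2}$'

# The catch-all "." keeps every name, including chrX and the unplaced contig.
printf 'chr1\n11\nchrX\nNC_000001.11\nchrUn_KI270302v1\n' |
    contigs_matching '.'
```

To try a candidate pattern against a real reference, the same helper could be fed the first column of a FASTA index, for example `cut -f1 /path/to/ref.fasta.fai | contigs_matching '<pattern>'` (the path is a placeholder).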