Releases: broadinstitute/gatk

4.0.8.1

16 Aug 19:47

This is a small bug fix release that fixes an issue with unpaired reads in Mutect2 and includes small fixes and improvements to Funcotator, FilterVariantTranches, and MarkDuplicatesSpark.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • Mutect2: Fixed a "Cannot get mate information for an unpaired read" error that could occur with certain datasets containing unpaired reads that pass all the M2 read filters and show evidence of an SNV (#5121)

  • Funcotator:

    • Fixes to the splice site logic. (#5106)
      • Funcotator now ignores leading indel bases when checking whether variants are within the splice site boundaries (e.g., if a leading base in an indel, which is preserved between the reference and alternate alleles, is within the splice site boundary but the bases that have been changed are NOT, then the variant is now correctly labeled as NOT a splice site).
    • Populate the DB SNP validation status field properly (#5046)
      • Funcotator will now populate the MAF DB SNP Validation status field with proper values (e.g. "by1000genomes") instead of a boolean value (e.g. "TRUE")
      • Funcotator now handles multiple records in a VCF funcotation factory that have the same pos, ref, and alt combination, even if equivalent and not exact matches.
  • FilterVariantTranches:

    • Add an --invalidate-previous-filters argument to remove old filters left over from previous runs (off by default) (#5042)
    • Add --snp-tranche and --indel-tranche arguments to replace the previous --tranche argument (#5042)
  • Updated MarkDuplicatesSpark scoring and comparison code to reflect changes in Picard (#5023)

    • Updated the scoring code to no longer take into account the unclipped start position of mismatching reads. Also changed the score to be a double packed short value in order to better reflect Picard scoring code.
  • Other Changes:

    • Added new IOUtils.isHDF5File() utility method (#5082)
    • Add jitpack support for building GATK snapshots (#5056)
    • Fixed broken link in Travis to docker test failure reports (#5108)
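
The splice site change in #5106 can be pictured with a small sketch. This is not Funcotator's code: the function names, the 1-based inclusive coordinates, and the splice window passed in by the caller are all assumptions made for illustration. The idea is to drop the leading bases shared by the reference and alternate alleles before testing for overlap with the splice window.

```python
def changed_span(pos, ref, alt):
    """Return the 1-based inclusive (start, end) span of the bases that differ,
    after skipping leading bases shared by ref and alt (hypothetical helper)."""
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    start = pos + shared
    # A pure insertion changes no reference base, so anchor it at the
    # first position after the shared prefix.
    end = max(start, pos + len(ref) - 1)
    return start, end


def overlaps_splice_window(pos, ref, alt, window_start, window_end):
    """True if the changed bases (not the preserved leading bases) fall
    inside the inclusive window [window_start, window_end]."""
    start, end = changed_span(pos, ref, alt)
    return start <= window_end and end >= window_start
```

For a deletion written as ref `CAG`, alt `C` at position 100, the preserved `C` sits at 100 while the deleted bases occupy 101-102, so a window covering 99-100 no longer counts as hit.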

4.0.8.0

14 Aug 20:22

This release features some significant changes to Mutect2 that improve both performance and correctness, as well as a bug fix to GenomicsDBImport for large interval lists.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • Mutect2

    • Handle overlapping mates in M2 active region detection, causing fewer false active regions (#5078)
      • Makes Mutect2 ~25% faster in many cases with no loss of accuracy!
    • Filter M2 calls that are near other filtered calls on the same haplotype (#5092)
      • A very effective new filter that significantly reduces false positives
    • New Orientation Bias Filter (#4895)
      • New, improved orientation bias model, without which the M2 pipeline is not viable for NovaSeq data.
    • Changed the default AF slightly for M2 tumor-only mode (just a small tweak) (#5067)
    • Optimize some Mutect-related tools (#5073)
      • Everything that inherits from AbstractConcordanceWalker (this includes the Concordance tool and MergeMutect2CallsWithMC3) is now much faster on the cloud
    • Fixed edge case for M2 palindrome transformer (#5080)
      • Fixed an edge case involving reads assigned huge fragment lengths
    • Allowing counts for supporting alt reads in the validation normal. (#5062)
      • Added useful information suggesting possible normal artifacts in somatic validation tool.
    • The M2 WDL no longer emits the unfiltered VCF, which was redundant (#5076)
  • GenomicsDBImport

    • Fix for issue where we could run out of file handles when working with large interval lists (#5105)
    • Display warning when using large interval lists with GenomicsDBImport (#5102)
  • Updated MarkDuplicatesSpark tie-breaking rules to reflect changes in Picard (#5011)

  • Added the ability for CompareDuplicatesSpark to output mismatching reads (#4894)

  • Updated our google-cloud-java fork to 0.20.5-alpha-GCS-RETRY-FIX (#5099)

    • We now retry on 502 and UnknownHostException errors when using NIO
  • SV Tools:

    • Various improvements (#4996)
      • output a single VCF for new interpretation tool
      • bring MAX_ALIGN_LENGTH and MAPPING_QUALITIES annotations from CPX variants to re-interpreted simple variants
      • add new CLI argument and filter assembly based variants based on annotation MAPPING_QUALITIES, MAX_ALIGN_LENGTH
      • filter out variants of size < 50
    • Bug fix for the extreme edge case where after alignments de-overlapping, an alignment block is only 1 base long (#4962)
    • Re-enabled checking of variant info fields against the header when writing SV VCFs (the check was temporarily turned off long ago and was overlooked after the implementation stabilized) (#5084)

4.0.7.0

30 Jul 18:47
b6a630a

Some important fixes in this release include a new version of GenomicsDB with a fix for the stack overflow seen when using large interval lists, and an updated Docker image with a fix for the missing R/ggplot2 dependencies.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/.

Docker

  • Restore missing R/ggplot2 dependencies on the Docker image. #5040

GenomicsDB

  • Fix GenomicsDBImport stack overflow when using a large number of intervals #4997

Mutect2

  • Don't use very short stubs of clipped reads for genotyping #5057
  • Add maxRetries to runtime in M2 WDLs #5049
  • Fix an edge case bug in PalindromeArtifactReadTransformer #5038
  • Make orientation bias filtering default to true #5019
  • Added option for ValidateBasicSomaticShortMutations to output a vcf #4999
  • Add Mutect2 PalindromeArtifactReadTransformer to hard clip inverted tandem repeats insertion artifacts #4998
  • Make MAF the output of Funcotator in the M2 WDL, plus a multiple-transcript fix. #4941

CNV Tools

  • Exposed ability to blacklist intervals in CNV WDLs. #5027
  • Added output of IGV-compatible .seg files to ModelSegments. #5048

Structural Variants

  • Add BreakpointEvidence filter based on classifier #4769
  • Address more edge cases in assembly alignments #5044
  • Refactor AssemblyContigAlignmentsConfigPicker #4971
  • Fix an edge case in assembly contig alignment picker where no good mappings to canonical mappings exist #5005
  • Trim down ref bases for CPX variants #4970

Funcotator

  • VCF Funcotation Factory will recognize equivalent alleles (even when not exact) #4977

Other

  • Include docs for new variant quality score model #5008
  • Engine changes related to migration of GATK3 VariantEval to GATK4 #4495
  • Fix position annotations to use position in original, not clipped, read #4956
  • Add cmd line to VCF generated by GATKSparkTool #4981
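
One common way to recognize alleles that are equivalent without being exact string matches (as in #4977) is to reduce each (position, ref, alt) triple to a minimal representation and compare the results. The sketch below shows that general technique only. Whether the VCF Funcotation Factory works this way is an assumption, and the function names are hypothetical.

```python
def minimal_representation(pos, ref, alt):
    """Trim bases shared by ref and alt, keeping at least one base per allele."""
    # Trim shared trailing bases first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, advancing the position for each one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def alleles_equivalent(a, b):
    """a and b are (pos, ref, alt) tuples on the same contig."""
    return minimal_representation(*a) == minimal_representation(*b)
```

Under this scheme `(100, "CAT", "CT")` and `(100, "CA", "C")` describe the same deletion, while `(100, "A", "G")` and `(100, "A", "T")` do not match.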

4.0.6.0

06 Jul 22:02

Highlights of this release include:

  • A new version of GenomicsDB that brings many long-requested features such as support for multiple intervals in GenomicsDBImport
  • A significantly (~33%) smaller GATK docker image
  • An important bug fix for the -new-qual option in GenotypeGVCFs/HaplotypeCaller/Mutect2

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • GenomicsDB: new version with many long-awaited features and bug fixes (#4645)

    • Multi-interval support in GenomicsDBImport (#3269)
      • Now you can specify multiple -L intervals when importing variants into GenomicsDB using GenomicsDBImport, instead of having to specify one interval per invocation.
    • New protobuf-based API to allow configuration without editing JSON files
    • Support for sites-only queries
    • Support for returning the genotype (GT) field in queries
    • Fixed bug where records with spanning deletion alleles could cause reads from GenomicsDB to fail (#4716)
  • Reduced the size of the GATK docker image by approximately 33%, from ~5.3 GB to ~3.5 GB (#4955)

  • Fixed a regression in the -new-qual option for GenotypeGVCFs/HaplotypeCaller/Mutect2 that was introduced in GATK 4.0.5.0 (#4980)

    • There was a precision issue in the AlleleFrequencyCalculator when running with -new-qual that could cause a crash at certain sites (specifically, sites with spanning deletions and highly unlikely alt alleles).
  • HaplotypeCaller: don't count qual = 0 sites as polymorphic for GVCF mode (#4967)

  • ValidateBasicSomaticShortMutations: added a new optional argument to produce summary table output (#4982)

  • ExtractOriginalAlignmentRecordsByNameSpark: added a new optional argument to invert the logic in the read-name filtering (#4944)

  • Separated out the "variant calling" integration tests from the rest of the integration tests to speed up overall test suite runtime in travis (#4984)
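
As an illustration of the multi-interval support described above, the following sketch assembles a single GenomicsDBImport invocation with several -L arguments. The -L flag comes from these notes. The -V and --genomicsdb-workspace-path flags, the `gatk` launcher name, and every file and workspace name are assumptions here and should be checked against the tool's --help output for your version.

```python
def genomicsdb_import_command(gvcfs, workspace, intervals):
    """Build an argument list for one GenomicsDBImport run (illustrative only)."""
    cmd = ["gatk", "GenomicsDBImport", "--genomicsdb-workspace-path", workspace]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]       # one -V per input GVCF (assumed flag)
    for interval in intervals:
        cmd += ["-L", interval]   # several -L intervals in a single invocation
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs; print the command rather than running it.
    print(" ".join(genomicsdb_import_command(
        ["sample1.g.vcf.gz", "sample2.g.vcf.gz"], "my_workspace", ["chr20", "chr21"])))
```

Before this release, the same import would have needed one invocation per interval.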

4.0.5.2

29 Jun 21:01

Highlights of this release include major Funcotator performance improvements on hg19/b37 inputs, a newly rewritten Java version of FilterVariantTranches, HaplotypeCaller bamout improvements, and improved Python integration by eliminating timeouts.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/.

Funcotator Improvements

  • Improve handling of hg19/B37 references (#4586).
    • Fixed performance bug involving excessive cache misses when querying datasources, resulting in major
      performance improvements when running on HG19/B37 data (performance increased by approx. 30x with v1.4.20180615 of
      the standard Funcotator data sources) (#4586).
    • Automatically detect when B37 data is run against an hg19 data source and convert contig names to be hg19 compliant.
    • Assumes all data sources for the hg19 reference are compliant with hg19 contig names. User-created data
      sources will have to honor this.
    • Perform additional validation on input data to ensure a given reference FASTA has a sequence
      dictionary that is a superset of the given VCF. This is a more stringent check than is automatically
      performed by the GATK. Can be disabled with the --disable-sequence-dictionary-validation flag.
    • Released new version of datasources to go with this release (1.4.20180615), necessary because the data
      sources needed to be made consistent with hg19 (before they were a mix of hg19 and b37 contig names).
    • Updated the minimum required data source version to be the latest release.
    • Updated the getDbSNP.sh and createSqliteCosmicDb.sh data source scripts to preprocess those data sources
      to have hg19-compliant contig names.
    • Removed the --allow-hg19-gencode-b37-contig-matching flag.
    • Removed the --allow-hg19-gencode-b37-contig-matching-override flag.
  • User defined transcripts were being used as a filter rather than a priority order. The filtering step has been eliminated. Fixes #4918 (#4931)
  • Added custom MAF fields to MafOutputRenderer (#4917)
  • LocatableXsv data sources now produce at most 1 funcotation per allele pair. (#4936)
  • LocatableXsv data sources now provide the correct number of funcotations (#4915)
  • Preserve VCF fields in MAF output (#4872)
  • Fixing error when spanning deletions overlap coding regions (#4881)

HaplotypeCaller/Mutect2

  • Improvements to FilterMutectCalls. Eliminates about 3% of all false positives in DREAM while reducing sensitivity by about 0.1%
  • Fix many questionable -bamout alignments where, because of a bad choice of Smith-Waterman parameters,
    deletions were preferred over single-base substitutions (#4858).
    The result is many fewer spurious indels in the -bamout output.
  • Introduced new SmithWaterman parameters affecting realignment of the reads to their best haplotype. This
    also changes some annotations that depend on the alignment, such as BaseQualityRankSum and ReadPositionRankSum.
    The changes are slight and make things more correct.
  • Modify the behavior of (BaseGraph) getNextReferenceVertex for non-ref paths (#4889)

FilterVariantTranches

  • Rewrite VCF Tranche filtering in java, with tests (#4800)

Engine

  • StreamingPythonExecutor no longer uses timeouts or relies on prompt synchronization. (#4757)
  • Allow concordance tools (AbstractConcordanceWalker) to use NIO for truth call set (#4905)
  • Add pre- and post- apply variant transformer to VariantWalkerBase

MarkDuplicatesSpark

  • Fixed a missing special case in MarkDuplicates ReadsKey code to better match current picard results (#4899)
  • Reworked the keys for MarkDuplicatesSpark to be sufficient for grouping on their own. (#4878)
  • Improve error message for MarkDuplicates duplicate read name issues (#4879)

Structural Variants

  • Add tests for AssemblyContigWithFineTunedAlignments (#4961)
  • Fix no index output for assembly bam file (#4945)
  • Overhaul tests on assembly-based non-complex breakpoint and type inference code (#4835)
  • Simple fix to remove trailing slash in GCS_SAVE_PATH to avoid double slashes in GCS_RESULTS_DIR (#4873)

Misc:

  • Upgrading picard 2.18.2 -> 2.18.7 (#4949)
  • Update htsjdk 2.15.1 -> 2.16.0 (#4914)
  • Added support to PrintReadsSpark for non-coordinate sorted bams (#4853)
  • Adding --sort-order option to SortSamSpark (#4545)
  • Increased boot disk size on GATK tasks in M2 wdl to accommodate 4.0.5.0 docker (#4877)
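
The b37-to-hg19 contig name conversion mentioned under Funcotator Improvements can be sketched as a simple renaming. This follows the widely used naming convention ("1" becomes "chr1", "MT" becomes "chrM") and is not taken from Funcotator's source. The function name and the decision to leave other contigs untouched are assumptions, and the sketch only renames contigs without addressing any sequence differences between the references.

```python
# Primary b37 contigs that simply gain a "chr" prefix in hg19-style naming.
AUTOSOMES_AND_SEX = {str(i) for i in range(1, 23)} | {"X", "Y"}


def b37_to_hg19_contig(name):
    """Map a b37-style contig name to its hg19-style name (names only)."""
    if name == "MT":
        return "chrM"
    if name in AUTOSOMES_AND_SEX:
        return "chr" + name
    # Already hg19-style, or an unplaced/decoy contig: leave unchanged.
    return name
```

A data source keyed on hg19 names could then be queried with `b37_to_hg19_contig(variant_contig)` regardless of which naming scheme the input VCF uses.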

4.0.5.1

11 Jun 17:26

This is primarily a bug fix release to fix a crash in the help system (#4875). The issue was that tools that use annotations (which includes Mutect2, HaplotypeCaller, GenotypeGVCFs, CombineGVCFs, and VariantAnnotator) would crash when trying to print their help text. This could be triggered by running with an explicit --help, or by typing an invalid tool command line.

This release also brings in some improvements to Funcotator, including a new mode to output annotations for all transcripts.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • Fix crash when displaying help text for tools that use annotations (#4876)
  • Funcotator improvements (#4838) (#4870)
    • Added ALL mode for transcript selection (--transcript-selection-mode ALL) which will output full annotation fields for all transcripts
    • IGR annotations are no longer reported if there are any transcripts that would result in a non-IGR annotation for a given variant
    • VCF Datasources now have to match both the alt and ref alleles to be added as annotations to a variant
    • Added the --allow-hg19-gencode-b37-contig-matching-override flag to allow for even more permissive matching of contig names between B37 and HG19 references (primarily designed to be used in development)
    • Updated the experimental Funcotator WDL to work properly in cromwell
    • Refactored internals of Funcotator to use FuncotationMap objects to store annotations
    • Additional tests to ensure VCF and MAF protein change strings are equivalent
    • Other minor internal bugfixes for testing
  • Fix to the Oncotator command line in the Mutect2 WDL (#4862)
  • Removed unsupported Mutect2 WDLs (these now live on Firecloud) (#4836)

4.0.5.0

07 Jun 22:56
f4225b8

Highlights of this release include the ability to emit MNPs in Mutect2 and HaplotypeCaller via a new --max-mnp-distance argument, much better active region detection for low allele fractions in Mutect2, new priors for variant sites and homRef blocks in HaplotypeCaller, a new tool, FilterAlignmentArtifacts, to filter false positive alignment artifacts in the Mutect2 pipeline, performance improvements to CNNScoreVariants and Funcotator, and a new --sites-only-vcf-output GATK engine argument to suppress genotypes when writing VCFs.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • Mutect2

    • Made Mutect2 active region determination much better for low allele fractions (#4832)
      • In particular, this makes Mutect2 vastly better for mitochondrial and cfDNA calling
    • Mutect2 can now emit MNPs according to adjustable distance threshold specified via --max-mnp-distance (#4650)
    • Tweaked Mutect2 read position filter to handle non-biological (e.g., FFPE) insertions better (#4851)
    • Fixed Mutect2 bug where triallelic normal artifacts were sometimes hidden from filtering engine (#4809)
    • Mutect2 STR filter now also looks at insertions (#4845)
      • This lowers the indel false positive rate dramatically.
    • Mutect2 WDL:
      • now outputs MAF segmentation (#4837)
      • now runs FilterAlignmentArtifacts (#4848)
      • now uses lenient validation in SortSam (#4844)
  • Added new tool FilterAlignmentArtifacts (#4698)

    • Filters false positive alignment artifacts (that is, apparent variants due to reads being mapped to the wrong genomic locus) from a VCF callset by checking variant-supporting reads and their mates.
    • By considering the realignment of the read and its mate, it saves a lot of variants, especially in low-complexity regions, from being filtered as mapping errors.
  • HaplotypeCaller

    • HaplotypeCaller can now emit MNPs according to adjustable distance threshold specified via --max-mnp-distance (#4650)
    • New HaplotypeCaller priors for variant sites and homRef blocks (#4793)
      • Added new --population-callset argument allowing an external panel of variants to be specified to inform the frequency distribution underlying the genotype priors
      • Added new --num-reference-samples-if-no-call argument to control whether to infer (and with what effective strength) that only reference alleles were observed at sites not seen in any panel
      • As a side effect of this change, CalculateGenotypePosteriors now supports indels.
    • GCS/NIO output support for the -bamout argument (#4721)
  • -new-qual in HaplotypeCaller/Mutect2/GenotypeGVCFs no longer counts spanning deletions as support for variant qual (#4801)

  • CNNScoreVariants

    • Performance improvements to the prep of the input tensors in the 2D model (#4735)
    • Bug fix to prevent a crash on the ends of the mitochondrial contig (#4751)
  • GATK Engine

    • Added a new traversal type TwoPassVariantWalker that does two passes over its input variants (#4744)
    • Enable the -L argument to read feature files (such as .bed or .vcf files) from non-local Paths, including GCS buckets (#4854)
    • Added --sites-only-vcf-output argument to the GATK engine to suppress genotype fields when writing VCFs (#4764)
    • Tools that use annotations now use the barclay annotation plugin (#4674)
    • Added new ReadQueryNameComparator (#4731)
    • Automatically schedule temporary resource files for delete on exit (#4616)
  • Spark tools

    • Added support for g.vcf.gz files in Spark. #4274 (#4463)
    • Spark tools can now write SAM files #4295. (#4471)
    • Added a --output-shard-tmp-dir argument to specify the parts directory for un-sharded BAM writing (#4666)
  • MarkDuplicatesSpark

    • Fixed MarkDuplicatesSpark so it handles supplementary reads with unmapped mates properly (#4785)
    • Added a distinction between PCR orientation and Optical Duplicates orientation in MarkDuplicatesSpark (#4752)
    • Fixed serialization crash in MarkDuplicatesSpark (#4778)
    • Fixed queryname partitioning bug where asking for queryname sort would result in reads with the same name being split between partitions (#4765)
    • Changed MarkDuplicatesSpark to sort non-queryname sorted bams before processing to ensure marking is consistent across shards (#4732)
    • Renamed some MarkDuplicatesSpark arguments to follow the "kebab-case" convention (#4715)
    • MarkDuplicatesSpark now uses the Picard OpticalDuplicatesFinder directly (#4750)
    • MarkDuplicatesSpark now uses Picard metrics code directly (#4779)
  • BwaSpark: disable sequence dictionary validation when aligning reads #4131 (#4308)

  • Funcotator

    • Major performance improvements due to added caching and other optimizations (#4740)
    • Various fixes (#4783) (#4817) (#4770)
      • Sanitize special characters when outputting VCF so that VCF validation passes
      • Fixed cases where the ordering specified in the header did not match the variants, and where hg19/b37 VCF datasources were being processed inconsistently, causing many missed annotations.
      • Added Funcotator tests for Clinvar and Gencode v28 in hg38, and mixed chr/no-chr GENCODE.
      • Eased restrictions so that Gencode v28 would be recognized as a valid GTF. Future versions of Gencode will not fail just based on the version number; a warning will be emitted instead.
      • Refining handling of transcripts with missing sequence info.
      • Refactored UTR VariantClassification handling.
      • Added warning statement when a transcript in the UTR has no sequence info (now is the same behavior as in protein coding regions).
      • Added tests to prevent regression on data source date comparison bug.
      • Fixed DNA Repair Genes getter script.
      • Fixed an issue in COSMIC to make it robust to bad COSMIC data.
      • Gencode no longer crashes when given an indel that starts just before an exon.
      • Fixed the SimpleKeyXsvFuncotationFactory to allow any characters to work as delimiters (including characters used in regular expressions, such as pipes).
      • Modified several methods to allow for negative start positions in preparation for allowing indels that start outside exons.
      • Fixed an issue in 5' UTR processing that would cause variant alleles with length > 1 to throw an exception (fixes issue #4712).
      • Fixed a bug in the version detection for Funcotator data sources that would prevent newer data source versions from being detected as compatible (date comparison error).
    • Gencode data sources now have names preserved from config files. (#4823)
  • GCNV kernel tunings (#4720)

    • Fixed a minor issue in sampling error estimation that could lead to NaN (as a result of division by zero)
    • Introduced separate internal and external admixing rates
    • Introduced two-stage inference for cohort denoising and calling
    • Capped phred-scaled qualities to maximum values permitted by machine precision in order to avoid NaNs and overflows.
    • Took a first step toward tracking and logging parameters during inference, starting with the ELBO history.
  • Validation of sequence dictionaries from multiple BAMs now emits a warning instead of throwing an exception in CNV workflows. (#4758)

  • SV tools

    • Tweak BWA to allow "gappier" alignments in local assemblies (#4708)
    • Added a new experimental tool named CpxVariantReInterpreterSpark to extract barebone-annotated simple variants from a VCF containing complex variants, as produced by the GATK-SV discovery pipeline (#4602)
    • Fix "UnhandledCaseSeen" error in StructuralVariationDiscoveryPipelineSpark (#4677)
  • Added new SingleSequenceReferenceAligner class to align against an on-the-fly single contig reference using Bwa-Mem (#4780)

  • Updates to the conda environment for Python-based tools (#4749)

    • Fix #4741, where newer versions of conda appear to treat relative references in the environment yml as being relative to the yml file instead of relative to the cwd (based on observation).
    • Add a second conda yml file (gatkcondaenv.intel.yml) for environments that use Intel hardware acceleration and the Intel Tensorflow package (based on #4735).
    • Added a gradle task (condaEnvironmentDefinition) to generate the conda yml files from a single template to ensure that all the environment definitions remain in sync. This task also generates the Python package archive.
    • Added a gradle task (localDevCondaEnv) to create or update a local (non-Intel) conda environment. This is a shortcut for use during development when you're iteratively changing/testing Python code and want to update the conda env.
  • Added a new WEX test bam to src/test/resources/large, with a companion target interval list (#4756)

  • Add slightly modified version of GATK3 github issue template (#4796)

  • Updated htsjdk to 2.15.1 (#4830)
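
To make the --max-mnp-distance behavior concrete, here is a rough sketch of merging nearby substitutions into MNPs. It is not the HaplotypeCaller or Mutect2 implementation. It assumes 1-based positions, that all input substitutions are already phased on the same haplotype, and that "distance" means the difference between positions, so a value of 0 merges nothing. The function names are hypothetical.

```python
def collapse(group, reference):
    """Turn a run of SNVs into one record spanning first to last position."""
    start, end = group[0][0], group[-1][0]
    ref = reference[start - 1:end]      # reference is a string, 1-based positions
    alt = list(ref)                     # untouched bases in between stay reference
    for pos, _ref_base, alt_base in group:
        alt[pos - start] = alt_base
    return start, ref, "".join(alt)


def merge_mnps(snvs, reference, max_mnp_distance):
    """snvs: iterable of (pos, ref_base, alt_base) on one haplotype."""
    merged, group = [], []
    for snv in sorted(snvs):
        # Start a new group when the gap to the previous SNV is too large.
        if group and snv[0] - group[-1][0] > max_mnp_distance:
            merged.append(collapse(group, reference))
            group = []
        group.append(snv)
    if group:
        merged.append(collapse(group, reference))
    return merged
```

With reference `ACGTACGT` and substitutions at positions 2, 3, and 6, a distance of 1 merges only the adjacent pair at 2 and 3, while a distance of 3 yields a single record that carries the unchanged reference bases at positions 4 and 5 inside the MNP.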

4.0.4.0

26 Apr 15:37

Highlights of this release include major performance improvements to MarkDuplicatesSpark, better sensitivity and precision in STR (short tandem repeat) contexts for Mutect2, support for a "genotype given alleles" mode in Mutect2, dbSNP support for Funcotator, and several important bug fixes to CombineGVCFs.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Full list of changes in this release:

  • MarkDuplicatesSpark

    • New, optimized version of the tool with greatly improved performance and scalability (#4656)
    • Note that this tool is still marked as beta, and has a number of known issues. The current version is suitable for evaluation/profiling purposes only.
  • Mutect2 improvements

    • Added a GGA (genotype given alleles) mode activated via the --genotyping-mode GENOTYPE_GIVEN_ALLELES and --alleles arguments (#4601)
    • Better sensitivity and precision in STR (short-tandem repeat) contexts (#4690)
    • New, supported Mutect2 NIO-enabled WDL that works in Firecloud (#4710)
    • Better default AF for M2 tumor-normal mode (#4690)
    • Restored explicit PASS (as opposed to empty) filter in Mutect2 (#4644)
    • Fixed Mutect2 failure for germline resource without AF (#4607)
    • Fixed a bug in the Mutect2 WDL bamout where scatters with overlapping assembly regions failed (#4613)
    • Fixed extra filtering args being deactivated in Mutect2 WDL due to typo
  • CombineGVCFs: several important bug fixes

    • ReferenceConfidenceVariantContextMerger fixes for spanning deletions, and use the correct types for the median calculation. (#4680)
    • Handle trailing reference blocks correctly (#4615)
    • Fix and test for calculating intermediate band interval start locations. (#4681)
  • Funcotator

    • Added dbSNP support via a new VcfFuncotationFactory. (#4593)
    • Fixed the refContext annotation. (#4605)
    • Fixed calculation of GC content to be correct. (#4608)
    • Fixes for HG38 exception and better logging. (#4563)
    • Note: only datasource releases 1.2.20180329 and later will work with this version of Funcotator
  • HaplotypeCaller: Fixed a bug that caused the --comp and --input-prior arguments to not be settable by the user (#4703)

  • CNNScoreVariants: Better numerical consistency between python and java, and transpose bug fix (#4652)

  • CNV Tools

    • A new framework to support automated evaluation of GATK CNV (#4276)
    • Enabled zero eigensamples to be specified for CreateReadCountPanelOfNormals (#4502)
    • Exposed maximum chunk size in CNV panel of normals. (#4528)
    • Changed CNV PoN to filter on equality to interval median percentile. (#4503)
  • SV Tools

    • Breakpoint location and type inference unit (#4562)
    • Scaffold local assemblies (#4589)
    • Use the latest version of fermilite jni (#4622)
    • Update sv scripts to only copy a single bam file and index, and respect project parameter (#4646)
    • Various bug fixes (#4670) (#4623)
  • Added GCS (Google Cloud Storage) output support to the following tools: ApplyBQSR, SplitNCigarReads, ClipReads, LeftAlignIndels, RevertBaseQualityScores, and UnmarkDuplicates (#4695) (#4424)

  • Mark the --disable-tool-default-read-filters argument as advanced, and add a warning to its documentation string (#4671)

    • Many tools do not function correctly without their default read filters turned on, so this argument is intended only for advanced users who know what they're doing!
  • ParallelCopyGCSDirectoryIntoHDFSSpark: allow the tool to take a filename glob to subset files to copy (#4624)

  • Picard: updated to version 2.18.2 (#4676)

4.0.3.0

27 Mar 20:46

This release brings a major update to our experimental neural-network-based VariantRecalibrator replacement, initial MAF support in Funcotator, as well as some updates to Mutect2 and the CNV tools.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

Summary of changes in this release:

  • A major update to our experimental neural-network-based suite of variant scoring tools, which will eventually replace the VariantRecalibrator (#4245)

    • The NeuralNetInferenceTool has been renamed to CNNScoreVariants
    • Baseline models are now included in the distribution.
    • Added additional tools to write tensors and to train your own models given a VCF of validated calls, an unfiltered VCF and a confident region: CNNVariantTrain, CNNVariantWriteTensors and FilterVariantTranches
    • Read-level 2D models are now supported via the tensor-type read_tensor argument. 2D models at present are significantly slower than the 1D models.
  • Funcotator:

    • Added prototype support for outputting MAF files (and many bug fixes) (#4472)
  • Mutect2:

    • CalculateContamination emits its segmentation and Mutect2 germline model uses it (#4509)
    • Option to emit (but still filter) all germline sites in Mutect2 (#4522)
    • Made the number of samples required to put a variant site in the Mutect2 PON adjustable (#4566)
    • Enabled Oncotator filtering in the Mutect2 WDL. (#4423)
  • CNV tools:

    • Replaced CollectFragmentCounts with CollectReadCounts. (#4564)
    • Allowed use of zero eigensamples in DenoiseReadCounts. (#4411)
    • Changed filtering of normal hets on overlap with copy-ratio intervals in ModelSegments to be consistent with filtering of case hets. (#4510)
    • Updated PostprocessGermlineCNVCalls (segments VCF writing, WDL scripts, unit tests, integration tests) (#4396)
  • Miscellaneous changes:

    • Concordance: added option to analyze contributions of different filters (#4520)
    • Exposed the -pairHMM/--pair-hmm-implementation argument in HaplotypeCaller, which was previously hidden (#4494)
    • Set the default samjdk.compression_level to 2 (was previously 1) (#4547)
    • Upgraded to Spark 2.2.0 (#4314)
    • Changed Spark sharding of queryname-sorted bams to better handle secondary and supplementary reads (#4473)
    • Added logging output to the bam writing step for spark tools (#4501)
    • git-lfs is now required to compile the GATK
    • Added a registry for deprecated/unported tools. (#4505)
    • Updated the Hadoop GCS connector from 1.6.1 to 1.6.3. (#4590)
    • Added a large runtime resource directory to git-lfs, and exposed it to the Docker build. (#4530)
    • We now include full tool documentation in the GATK binary distribution zip (#4377)
    • Made our maven artifacts much smaller by preventing gradle uploadArchives from including distZip and distTar (#4569)
    • Added chr20 and chr21 alt contigs to the GRCh38 reference snippet used for testing (#4548)

4.0.2.1

02 Mar 19:41
8a78790

This is a small bug fix release containing fixes for the following issues:

  • HaplotypeCaller: fix the -contamination/-contamination-file arguments, which were not working properly, and add tests (#4455)
  • Fixes/improvements to the GATK configuration file mechanism (#4445)
    • If a Java system property is specified explicitly on the user's command line, allow it to override the corresponding value in the GATK config file
    • Bundle an example GATK configuration file with the GATK binary distribution. This config file can be edited and passed to the GATK via the --gatk-config-file argument.
    • There are still some configuration-related TODOs/known issues: in particular, the gatk front-end script currently bakes in some system properties internally, which will always override the corresponding values in the config file. We plan to patch the gatk script to no longer set these system properties internally, and delegate to the config file instead.
  • Mutect2: minor bug fixes and improvements (#4466)
    • Fix "FilterMutectCalls trips on non-int value in MFRL tag" (#4363)
    • Fix ordering of allele trimming vs. variant annotation (#4402)
    • Fix "CalculateContamination gives >100% results" (#3889)
    • Disable the MateOnSameContigOrNoMappedMateReadFilter by default (#3514)
    • Make mapping quality threshold in GetPileupSummaries modifiable (#4011)
  • SV Tools: Add a scan for intervals of high depth, and exclude reads from those regions from SV evidence (#4438)
  • In the GATK docker image, run the GATK using the fully-packaged binary distribution jars, rather than the unpackaged jars (#4476). This fixes a number of minor issues reported by users of the docker image.
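
The configuration precedence described in #4445 amounts to layering two sources of settings, with explicit Java system properties winning over config file values. The sketch below models only that rule with plain dictionaries. The function name and example values are hypothetical, and it does not model the known issue where the gatk front-end script sets some system properties itself.

```python
def effective_settings(config_file_values, system_properties):
    """Combine settings so that explicit system properties override the config file."""
    settings = dict(config_file_values)   # start from the config file
    settings.update(system_properties)    # explicitly supplied properties win
    return settings


if __name__ == "__main__":
    from_file = {"samjdk.compression_level": "2"}
    from_command_line = {"samjdk.compression_level": "5"}   # hypothetical override
    print(effective_settings(from_file, from_command_line))
```

Keys that appear only in the config file pass through unchanged.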