Commit

v0.2.4: De-prioritize multi-allele calls where PL and AD values do not match a final, non-multiallelic ensemble call. Thanks to Shalabh Suman
chapmanb committed Mar 25, 2015
1 parent 74cb915 commit 021cf54
Showing 11 changed files with 36 additions and 6 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -79,3 +79,4 @@ test/data/web-test/
test/data/single/
test/data/tmp*
test/data/txtmp*
test/data/ensemble/ens*
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -1,3 +1,8 @@
## 0.2.4 (25 March 2015)

- De-prioritize multi-allele calls where PL and AD values do not match a final,
non-multiallelic ensemble call. Thanks to Shalabh Suman.

## 0.2.3 (1 February 2015)

- Correctly handle sample names with all numbers -- parse as strings.
4 changes: 2 additions & 2 deletions project.clj
@@ -1,8 +1,8 @@
-(defproject bcbio.variation "0.2.3"
+(defproject bcbio.variation "0.2.4"
:description "Toolkit to analyze genomic variation data, built on the GATK with Clojure"
:license {:name "MIT" :url "http://www.opensource.org/licenses/mit-license.html"}
:dependencies [[org.clojure/clojure "1.5.1"]
-[org.clojure/math.combinatorics "0.0.3" :exclusions [org.clojure/clojure]]
+[org.clojure/math.combinatorics "0.1.1" :exclusions [org.clojure/clojure]]
[org.clojure/data.csv "0.1.2" :exclusions [org.clojure/clojure]]
[org.clojure/tools.cli "0.2.2"]
[clj-stacktrace "0.2.5"]
22 changes: 18 additions & 4 deletions src/bcbio/variation/recall.clj
@@ -13,7 +13,8 @@
[bcbio.run.fsp :as fsp]
[bcbio.run.itx :as itx]
[bcbio.variation.filter.attr :as attr]
-[bcbio.variation.variantcontext :as gvc]))
+[bcbio.variation.variantcontext :as gvc]
+[clojure.math.combinatorics :as combo]))

;; ## Utilities

@@ -38,6 +39,19 @@

;; ## Pick consensus variants

(defn- matches-alleles
"Ensure that an attribute matches the output alleles.
Avoid issues when dealing with collapsing multiple variant with different numbers
of alternative alleles"
[g vc attr-str]
(when-let [attr (get-in g [:attributes attr-str])]
(let [ploidy (count (:alleles g))
all-alleles (set (conj (:alleles g) (:ref-allele vc)))
exp-count (if (= attr-str "PL")
(count (distinct (combo/combinations (flatten (map #(repeat ploidy %) all-alleles)) ploidy)))
(count (:alleles g)))]
(= exp-count (count attr)))))

(defn- get-sample-call
"Retrieve variant alleles for the sample, sorted in a stable order."
[sample vc]
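A note on the `exp-count` check in `matches-alleles`: for `PL`, the enumeration counts the unordered genotypes that can be formed from the available alleles at the sample's ploidy. Assuming `n` distinct alleles and ploidy `p`, that count has the closed form C(n + p - 1, p), which agrees with the number of `PL` values the VCF specification expects per genotype. The sketch below is not part of this commit; `n-choose-k` and `expected-pl-count` are hypothetical helper names used only to illustrate the arithmetic.

```clojure
;; Illustrative only: these helpers are assumptions, not code from bcbio.variation.

(defn n-choose-k
  "Binomial coefficient C(n, k) for non-negative integers."
  [n k]
  (if (or (neg? k) (> k n))
    0
    ;; Each step turns C(n, i) into C(n, i + 1), so the division is always exact.
    (reduce (fn [acc i] (quot (* acc (- n i)) (inc i)))
            1
            (range k))))

(defn expected-pl-count
  "Number of unordered genotypes for n-alleles distinct alleles at the given
   ploidy: combinations with repetition, C(n + p - 1, p)."
  [n-alleles ploidy]
  (n-choose-k (+ n-alleles ploidy -1) ploidy))

(expected-pl-count 2 2) ;=> 3  (diploid, reference plus one alternative)
(expected-pl-count 3 2) ;=> 6  (diploid, reference plus two alternatives)
```

Under these assumptions, a diploid call taken from a record with two alternative alleles carries six `PL` values, while a final call with a single alternative allele expects three. That appears to be the kind of mismatch this commit de-prioritizes.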
@@ -59,10 +73,10 @@
:ref-allele (:ref-allele vc)
:alleles (sort-by allele-order (:alleles g))
:attributes (select-keys (:attributes g) ["PL" "DP" "AD" "PVAL" "GQ"])
-:has-likelihood (if (seq (get-in g [:attributes "PL"])) 1 0)
-:attr-count (+ (if (seq (get-in g [:attributes "PL"])) 1 0)
+:has-likelihood (if (matches-alleles g vc "PL") 1 0)
+:attr-count (+ (if (matches-alleles g vc "PL") 1 0)
(if (seq (get-in g [:attributes "PVAL"])) 1 0)
-(if (seq (get-in g [:attributes "AD"])) 1 0)
+(if (matches-alleles g vc "AD") 1 0)
(if (get-in g [:attributes "GQ"]) 1 0)
(if (pos? (get-in g [:attributes "DP"] -1)) 1 0))
:pl (attr/get-pl g)
10 changes: 10 additions & 0 deletions test/bcbio/variation/test/multisample.clj
@@ -53,3 +53,13 @@
(fsp/remove-path work-dir)
(fsp/remove-path out-file)
(ensemble/consensus-calls [vcf-m1 vcf-m2] ref-file out-file config) => out-file))

(facts "Ensemble consensus calls with different numbers of alternative alleles" :work
(let [ensemble-dir (fs/file data-dir "ensemble")
input-files (map #(str (fs/file ensemble-dir (str % ".vcf.gz"))) ["fb" "hc" "ug"])
out-file (str (fs/file ensemble-dir "ens.vcf"))
work-dir (str (fsp/file-root out-file) "-work")
config {:ensemble {:classifiers {:base ["DP"]}}}]
(fsp/remove-path work-dir)
(fsp/remove-path out-file)
(ensemble/consensus-calls input-files ref-file out-file config) => out-file))
Binary file added test/data/ensemble/fb.vcf.gz
Binary file not shown.
Binary file added test/data/ensemble/fb.vcf.gz.tbi
Binary file not shown.
Binary file added test/data/ensemble/hc.vcf.gz
Binary file not shown.
Binary file added test/data/ensemble/hc.vcf.gz.tbi
Binary file not shown.
Binary file added test/data/ensemble/ug.vcf.gz
Binary file not shown.
Binary file added test/data/ensemble/ug.vcf.gz.tbi
Binary file not shown.