From caf141960e4c3f12c2d3fabf04b7e7be3e5a82f3 Mon Sep 17 00:00:00 2001
From: Jo Lynne Rokita
Date: Sun, 3 Nov 2019 14:11:02 -0500
Subject: [PATCH 1/5] add cnv interpretation

CNVkit and ControlFreeC copy number outputs are not directly comparable. Updating this in the README.
---
 README.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README.md b/README.md
index b6d2ef2894..e3d06ec333 100644
--- a/README.md
+++ b/README.md
@@ -92,6 +92,32 @@ The release notes for each release are provided in the `release-notes.md` file t
 * Somatic Copy Number Variant (CNV) data are provided in a modified [SEG format](https://software.broadinstitute.org/software/igv/SEG) for each of the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#somatic-copy-number-variant-calling).
   * The CNVkit SEG file has an additional column `copy.num` to denote copy number of each segment, derived from the CNS file output of the algorithm described [here](https://cnvkit.readthedocs.io/en/stable/fileformats.html).
   * The ControlFreeC TSV file is a merge of `*_CNVs` files produced from the algorithm, and columns are described [here](http://boevalab.inf.ethz.ch/FREEC/tutorial.html#OUTPUT).
+  * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _copy number_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
+
+ | Ploidy | Copy Number | Gain/Loss Interpretation |
+|--------|-------------|------------------------------|
+| 2 | 0 | Loss; homozygous deletion |
+| 2 | 1 | Loss; hemizygous deletion |
+| 2 | 2 | Copy neutral |
+| 2 | 3 | Gain; one copy gain |
+| 2 | 4 | Gain; two copy gain |
+| 2 | 5+ | Gain; possible amplification |
+| 3 | 0 | Loss; 3 copy loss |
+| 3 | 1 | Loss; 2 copy loss |
+| 3 | 2 | Loss; 1 copy loss |
+| 3 | 3 | Copy neutral |
+| 3 | 4 | Gain; one copy gain |
+| 3 | 5 | Gain; two copy gain |
+| 3 | 6+ | Gain; possible amplification |
+| 4 | 0 | Loss; 4 copy loss |
+| 4 | 1 | Loss; 3 copy loss |
+| 4 | 2 | Loss; 2 copy loss |
+| 4 | 3 | Loss; 1 copy loss |
+| 4 | 4 | Copy neutral |
+| 4 | 5 | Gain; one copy gain |
+| 4 | 6 | Gain; two copy gain |
+| 4 | 7+ | Gain; possible amplification |
+
 * Somatic Structural Variant Data (Somatic SV) are provided in the [Annotated Manta TSV](doc/format/manta-tsv-header.md) format produced by the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#somatic-structural-variant-calling).
 * Gene expression estimates from the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#gene-expression-abundance-estimation) are provided as a gene by sample matrix.
 * Gene Fusions produced by the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#rna-fusion-calling-and-prioritization) are provided as [Arriba TSV](doc/format/arriba-tsv-header.md) and [STARFusion TSV](doc/format/starfusion-tsv-header.md) respectively.

From 5b33b7962b194d1fc3f2ac531e6d67f81d7a3b64 Mon Sep 17 00:00:00 2001
From: Jo Lynne Rokita
Date: Sun, 3 Nov 2019 14:19:43 -0500
Subject: [PATCH 2/5] fix table

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e3d06ec333..76c8e9825c 100644
--- a/README.md
+++ b/README.md
@@ -94,7 +94,7 @@ The release notes for each release are provided in the `release-notes.md` file t
   * The ControlFreeC TSV file is a merge of `*_CNVs` files produced from the algorithm, and columns are described [here](http://boevalab.inf.ethz.ch/FREEC/tutorial.html#OUTPUT).
   * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _copy number_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
 
- | Ploidy | Copy Number | Gain/Loss Interpretation |
+| Ploidy | Copy Number | Gain/Loss Interpretation |
 |--------|-------------|------------------------------|
 | 2 | 0 | Loss; homozygous deletion |
 | 2 | 1 | Loss; hemizygous deletion |

From b3ab629880d7febf69c6cd3f56d63307f88c9a66 Mon Sep 17 00:00:00 2001
From: Jo Lynne Rokita
Date: Sun, 3 Nov 2019 14:20:21 -0500
Subject: [PATCH 3/5] remove extra line spaces

---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 76c8e9825c..b217200735 100644
--- a/README.md
+++ b/README.md
@@ -93,7 +93,6 @@ The release notes for each release are provided in the `release-notes.md` file t
   * The CNVkit SEG file has an additional column `copy.num` to denote copy number of each segment, derived from the CNS file output of the algorithm described [here](https://cnvkit.readthedocs.io/en/stable/fileformats.html).
   * The ControlFreeC TSV file is a merge of `*_CNVs` files produced from the algorithm, and columns are described [here](http://boevalab.inf.ethz.ch/FREEC/tutorial.html#OUTPUT).
   * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _copy number_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
-
 | Ploidy | Copy Number | Gain/Loss Interpretation |
 |--------|-------------|------------------------------|
 | 2 | 0 | Loss; homozygous deletion |
@@ -117,7 +116,6 @@ The release notes for each release are provided in the `release-notes.md` file t
 | 4 | 5 | Gain; one copy gain |
 | 4 | 6 | Gain; two copy gain |
 | 4 | 7+ | Gain; possible amplification |
-
 * Somatic Structural Variant Data (Somatic SV) are provided in the [Annotated Manta TSV](doc/format/manta-tsv-header.md) format produced by the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#somatic-structural-variant-calling).
 * Gene expression estimates from the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#gene-expression-abundance-estimation) are provided as a gene by sample matrix.
 * Gene Fusions produced by the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#rna-fusion-calling-and-prioritization) are provided as [Arriba TSV](doc/format/arriba-tsv-header.md) and [STARFusion TSV](doc/format/starfusion-tsv-header.md) respectively.

From f167b1625a00f93a17f3ade5dbc7ad17638fd33d Mon Sep 17 00:00:00 2001
From: Jaclyn Taroni
Date: Sun, 3 Nov 2019 14:23:10 -0500
Subject: [PATCH 4/5] Spacing around CNV table

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index b217200735..76c8e9825c 100644
--- a/README.md
+++ b/README.md
@@ -93,6 +93,7 @@ The release notes for each release are provided in the `release-notes.md` file t
   * The CNVkit SEG file has an additional column `copy.num` to denote copy number of each segment, derived from the CNS file output of the algorithm described [here](https://cnvkit.readthedocs.io/en/stable/fileformats.html).
   * The ControlFreeC TSV file is a merge of `*_CNVs` files produced from the algorithm, and columns are described [here](http://boevalab.inf.ethz.ch/FREEC/tutorial.html#OUTPUT).
   * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _copy number_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
+
 | Ploidy | Copy Number | Gain/Loss Interpretation |
 |--------|-------------|------------------------------|
 | 2 | 0 | Loss; homozygous deletion |
@@ -116,6 +117,7 @@ The release notes for each release are provided in the `release-notes.md` file t
 | 4 | 5 | Gain; one copy gain |
 | 4 | 6 | Gain; two copy gain |
 | 4 | 7+ | Gain; possible amplification |
+
 * Somatic Structural Variant Data (Somatic SV) are provided in the [Annotated Manta TSV](doc/format/manta-tsv-header.md) format produced by the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#somatic-structural-variant-calling).
 * Gene expression estimates from the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#gene-expression-abundance-estimation) are provided as a gene by sample matrix.
 * Gene Fusions produced by the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#rna-fusion-calling-and-prioritization) are provided as [Arriba TSV](doc/format/arriba-tsv-header.md) and [STARFusion TSV](doc/format/starfusion-tsv-header.md) respectively.

From cfa18754935fabcc8f944cfa681b2dc353084777 Mon Sep 17 00:00:00 2001
From: Jo Lynne
Date: Mon, 4 Nov 2019 07:46:37 -0500
Subject: [PATCH 5/5] Update README.md

Co-Authored-By: Jaclyn Taroni
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 76c8e9825c..4b54984547 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ The release notes for each release are provided in the `release-notes.md` file t
 * Somatic Copy Number Variant (CNV) data are provided in a modified [SEG format](https://software.broadinstitute.org/software/igv/SEG) for each of the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#somatic-copy-number-variant-calling).
   * The CNVkit SEG file has an additional column `copy.num` to denote copy number of each segment, derived from the CNS file output of the algorithm described [here](https://cnvkit.readthedocs.io/en/stable/fileformats.html).
   * The ControlFreeC TSV file is a merge of `*_CNVs` files produced from the algorithm, and columns are described [here](http://boevalab.inf.ethz.ch/FREEC/tutorial.html#OUTPUT).
-  * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _copy number_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
+  * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _status_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
 
 | Ploidy | Copy Number | Gain/Loss Interpretation |
 |--------|-------------|------------------------------|
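
Reviewer note: the interpretation table added in this series follows a simple rule (compare the segment's copy number to the sample's ploidy), which could be sketched in Python as below. This is only an illustration under stated assumptions: the function name `interpret_copy_number` is hypothetical and not part of the OpenPBTA code, and applying the same rule to ploidies outside the 2 to 4 range shown in the table is an extrapolation, not something the README states.

```python
def interpret_copy_number(ploidy: int, copy_number: int) -> str:
    """Label a segment relative to the sample ploidy, mirroring the README table.

    Hypothetical helper for illustration; the table only lists ploidy 2, 3, and 4,
    so behavior for other ploidies is an assumed generalization.
    """
    if ploidy < 1 or copy_number < 0:
        raise ValueError("expected ploidy >= 1 and copy_number >= 0")

    delta = copy_number - ploidy  # positive means gain, negative means loss

    if delta == 0:
        return "Copy neutral"
    if delta < 0:
        if ploidy == 2:
            # The table uses deletion terminology only for diploid samples.
            if copy_number == 0:
                return "Loss; homozygous deletion"
            return "Loss; hemizygous deletion"
        return f"Loss; {-delta} copy loss"
    if delta == 1:
        return "Gain; one copy gain"
    if delta == 2:
        return "Gain; two copy gain"
    # Three or more copies above ploidy: the "5+", "6+", "7+" rows.
    return "Gain; possible amplification"


if __name__ == "__main__":
    # A copy number of 3 reads differently depending on the ploidy baseline.
    for ploidy in (2, 3, 4):
        print(ploidy, 3, interpret_copy_number(ploidy, 3))
```

The usage loop shows why the two callers' outputs are not directly comparable: the same copy number of 3 is a gain at ploidy 2, neutral at ploidy 3, and a loss at ploidy 4.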