VRS Hackathon Genotype draft #394

Merged: 31 commits, merged Oct 7, 2022.

Commits
ddecafe
remove ComposedSequenceExpression from 1.2, belongs in 1.3+
ahwagner Feb 24, 2022
f44aef1
Merge pull request #380 from ga4gh/1.2.2-patch
ahwagner Feb 25, 2022
beb269a
update requirements for RTD builds
ahwagner Mar 28, 2022
7364d01
restricting Haplotypes to 2+ members (Tristan)
ahwagner Jul 11, 2022
26dfffc
add Genotype
ahwagner Jul 11, 2022
c4192bf
add defaults
ahwagner Jul 12, 2022
7d6582c
squash that bug
ahwagner Jul 12, 2022
9dc4b3b
fixed genotype molecularvariation construction error
ahwagner Jul 12, 2022
17bdd17
Merge branch 'swsu' of github.com:ga4gh/vrs into swsu
ahwagner Jul 12, 2022
fd7a3b8
genotype prefix
ahwagner Jul 12, 2022
72e0933
use strict mode from gks.metaschema
ahwagner Jul 24, 2022
2d2532d
update Genotype definition
ahwagner Jul 29, 2022
40909a3
update genotypemember definition. closes #397
ahwagner Jul 29, 2022
2def58c
update docs
ahwagner Sep 12, 2022
dc2c85a
update tests
ahwagner Sep 12, 2022
969a52b
remove unused imports
ahwagner Sep 12, 2022
983f8b3
add definition
ahwagner Sep 12, 2022
36b1268
update inheritance model
ahwagner Sep 12, 2022
6edd4e7
update note
ahwagner Sep 12, 2022
ed3306a
remove note intro
ahwagner Sep 12, 2022
429087e
reverse the VRS
ahwagner Sep 12, 2022
63d86d1
addresses https://github.com/ga4gh/vrs/pull/394#discussion_r932147364
ahwagner Sep 13, 2022
3984b52
closes #401
ahwagner Oct 3, 2022
40ef2af
addresses https://github.com/ga4gh/vrs/pull/394#discussion_r986247682
ahwagner Oct 3, 2022
fa70d4e
merge main
ahwagner Oct 7, 2022
ffdc666
update metaschema proc version
ahwagner Oct 7, 2022
0f72199
enable pre-releases
ahwagner Oct 7, 2022
90893a4
fix pip install command
ahwagner Oct 7, 2022
2a718ab
merge gt & rcn docs
ahwagner Oct 7, 2022
013fe65
Merge branch 'main' into swsu
ahwagner Oct 7, 2022
92e88fd
update vrs.yaml
ahwagner Oct 7, 2022
33 changes: 16 additions & 17 deletions docs/source/terms_and_model.rst
@@ -267,11 +267,8 @@ genetic markers that tend to be transmitted together.
* The locations of Alleles within the Haplotype MUST be interpreted
independently. Alleles that create a net insertion or deletion of
sequence MUST NOT change the location of "downstream" Alleles.
* The `members` attribute is required and MUST contain at least one
Allele.
* Haplotypes with one Allele are intended to be distinct entities from
the Allele by itself. See discussion on :ref:`equivalence`.

* The `members` attribute is required and MUST contain at least two
Alleles.
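
The revised Haplotype cardinality rule is simple to check mechanically. A minimal sketch, assuming Haplotypes are plain Python dicts shaped like the VRS JSON, with `members` holding Allele objects or identifiers. `check_haplotype_members` and the placeholder identifiers are hypothetical and not part of vrs-python:

```python
def check_haplotype_members(haplotype: dict) -> None:
    """Raise ValueError unless `members` is present and has at least two entries.

    Hypothetical helper: plain dicts stand in for VRS objects, and the member
    entries (Allele objects or identifiers) are not otherwise validated.
    """
    members = haplotype.get("members")
    if members is None:
        raise ValueError("Haplotype.members is required")
    if len(members) < 2:
        raise ValueError(
            f"Haplotype.members must contain at least two Alleles, got {len(members)}"
        )


# Placeholder identifiers, not real VRS digests.
check_haplotype_members({"type": "Haplotype", "members": ["allele-1", "allele-2"]})

try:
    check_haplotype_members({"type": "Haplotype", "members": ["allele-1"]})
except ValueError as err:
    print(err)  # single-member Haplotypes are rejected under the revised rule
```

Because each Allele's location is interpreted independently, the sketch does not need to reason about coordinates; it only enforces the cardinality rule.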

**Sources**

@@ -435,14 +432,15 @@ objects (which would otherwise be represented using symbolic shorthand).
set of alleles at multiple locations and/or with ploidy other than
two. VRS Genotype entity is based on this broader definition.
* The term "diplotype" is often used to refer to two in-trans haplotypes at a locus.
VRS Genotype entity subsumes the conventional definition of diplotype. Therefore,
VRS Genotype entity subsumes the conventional definition of diplotype, though
it describes no explicit in-trans phase relationship. Therefore,
VRS does not include an explicit entity for diplotypes. See :ref:`this note
<genotypes-represent-haplotypes-with-arbitrary-ploidy>` for a discussion.
* VRS makes no assumptions about ploidy of an organism or individual nor any
polysomy affecting a locus. The `genotype.count` attribute explicitly captures the total
count of in-trans molecules at a genomic locus represented by the Genotype.
count of molecules associated with a genomic locus represented by the Genotype.
* In diploid organisms, there are typically two instances of each autosomal chromosome,
and therefore two instances of sequence at a particular location. Thus, Genotypes will
and therefore two instances of sequence at a particular locus. Thus, Genotypes will

I'd add that if the desire is to express a specific diplotype, it could be represented as a genotype of two haplotypes.

Member Author

Sure! Added some text for this in 40ef2af.

often list two GenotypeMembers each based on a distinct Haplotype or Allele. In the case
of haploid chromosomes or haploinsufficiency, the Genotype consists of a single GenotypeMember.
* A consequence of the computational definition is that in-cis Haplotypes at overlapping or
@@ -451,10 +449,11 @@ objects (which would otherwise be represented using symbolic shorthand).
When more than one Genotype Member would have the same `variation` value (e.g. in the case
of a homozygous variant), this would be represented as a Genotype Value with a corresponding
`count` (i.e. for a diploid homozygous variant, `GenotypeMember.count = 2`).
* The rationale for permitting Genotypes with Haplotypes defined on
different reference sequences is to enable the accurate
representation of segments of DNA with the most appropriate
population-specific reference sequence.
* The rationale for permitting Genotypes with Haplotypes defined on different reference
sequences is to enable the accurate representation of segments of DNA with the most
appropriate population-specific reference sequence.
* Deletion of sequence at locus would be represented by the presence of Alleles of deleted
sequence, not absence of Alleles; therefore Genotypes MAY NOT have count < 1.
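
The homozygous rule above (one GenotypeMember with a count rather than repeated members) can be sketched by tallying observed molecules with `collections.Counter`. This assumes plain dicts shaped like the Genotype schema in this PR and a Number object of the form `{"type": "Number", "value": n}`; that Number shape is an assumption, since the excerpts here only show a `$ref` to Number. `build_genotype` and the identifiers are hypothetical:

```python
from collections import Counter


def number(value: int) -> dict:
    # Assumed shape of a VRS Number object.
    return {"type": "Number", "value": value}


def build_genotype(variation_ids: list) -> dict:
    """Collapse repeated variation into GenotypeMembers with counts.

    `variation_ids` lists one entry per observed molecule, e.g. the same
    Allele twice for a diploid homozygous call.
    """
    if not variation_ids:
        # Deleted sequence is represented by Alleles, not by absence,
        # so a Genotype always covers at least one molecule (count >= 1).
        raise ValueError("a Genotype needs at least one molecule")
    tally = Counter(variation_ids)  # keeps first-seen order
    members = [
        {"type": "GenotypeMember", "count": number(n), "variation": vid}
        for vid, n in tally.items()
    ]
    return {"type": "Genotype", "members": members, "count": number(len(variation_ids))}


# Diploid homozygous call: one GenotypeMember with count 2, not two members.
hom = build_genotype(["allele-A", "allele-A"])
```

Here the Genotype's own `count` is simply the number of listed molecules; the schema also permits a larger total when some molecules are not described.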

**Sources**

@@ -465,8 +464,8 @@ SO: `Genotype (SO:0001027)
.. _genotypes-represent-haplotypes-with-arbitrary-ploidy:

.. note::
VRS defines Genotypes as a list of GenotypeMembers defined by Haplotypes
or Alleles. In essence, Haplotypes and Genotypes represent
VRS defines Genotypes using a list of GenotypeMembers defined by
Haplotypes or Alleles. In essence, Haplotypes and Genotypes represent
two distinct dimensions of containment: Haplotypes represent the "in
phase" relationship of Alleles while Genotypes represents sets of
Haplotypes of arbitrary ploidy.
@@ -482,9 +481,9 @@ SO: `Genotype (SO:0001027)
states, copy number loss, and copy number gain, all of which occur
when representing human data. In addition, inferred ploidy = 2 makes
software incompatible with organisms with other ploidy. VRS
requires explicit definition of the in-trans molecules at a genomic locus
with the `count` attribute, though this count may be inexact (e.g. a
:ref:`DefiniteRange` or :ref:`IndefiniteRange`.
requires explicit definition of the count of molecules associated with
a genomic locus using the `count` attribute, though this count may be inexact
(e.g. a :ref:`DefiniteRange` or :ref:`IndefiniteRange`).

.. _UtilityVariation:

4 changes: 2 additions & 2 deletions schema/defs/vrs/Genotype.rst
@@ -1,6 +1,6 @@
**Computational Definition**

A quantified set of *in-trans* :ref:`MolecularVariation` at a genomic locus.
A quantified set of :ref:`MolecularVariation` associated with a genomic locus.

**Information Model**

@@ -31,4 +31,4 @@ Some Genotype attributes are inherited from :ref:`Variation`.
* - count
- :ref:`Number` | :ref:`IndefiniteRange` | :ref:`DefiniteRange`
- 1..1
- The total number of copies of all :ref:`MolecularVariation` at this locus, MUST be greater than or equal to the sum of :ref:`GenotypeMember` copy counts. If greater than the total counts, this implies additional :ref:`MolecularVariation` that are expected to exist but are not explicitly indicated.
- The total number of copies of all :ref:`MolecularVariation` at this locus, MUST be greater than or equal to the sum of :ref:`GenotypeMember` copy counts and MUST be greater than or equal to 1. If greater than the total of GenotypeMember counts, this field describes additional :ref:`MolecularVariation` that exist but are not explicitly described.

I am not sure I understand why the sum of GenotypeMember counts can differ from the overall count. Do you have an example where this might be needed?

ahwagner (Member Author), Oct 3, 2022

The example we referenced when considering this use case was provided by @larrybabb in this example report from eMERGE. In these cases, a heterozygous variant is reported, but the second allele (presumably matching the reference) is not mentioned.

OK. Should we then have a recommendation for how to represent knowledge about the reference state on one of the chromosomes? This seems like a common enough scenario that more documentation would be helpful.
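
The case discussed in this thread (a heterozygous variant reported without its second allele) can be illustrated with a minimal check of the revised `count` constraint: `count` MUST be at least 1 and at least the sum of GenotypeMember counts, and any surplus describes molecules that exist but are not explicitly described. This sketch assumes plain dicts and a Number object of the form `{"type": "Number", "value": n}`; it handles only Number-valued counts, not DefiniteRange or IndefiniteRange. `undescribed_molecules` and the identifier are hypothetical:

```python
def number(value: int) -> dict:
    # Assumed shape of a VRS Number object.
    return {"type": "Number", "value": value}


def undescribed_molecules(genotype: dict) -> int:
    """Validate Genotype.count and return how many molecules are counted
    but not explicitly described by members.

    Only Number-valued counts are handled in this sketch.
    """
    def value_of(count: dict) -> int:
        if count.get("type") != "Number":
            raise NotImplementedError("range-valued counts are not handled here")
        return count["value"]

    total = value_of(genotype["count"])
    member_sum = sum(value_of(m["count"]) for m in genotype["members"])
    if total < 1:
        raise ValueError("Genotype.count MUST be greater than or equal to 1")
    if total < member_sum:
        raise ValueError(
            f"Genotype.count ({total}) is less than the sum of member counts ({member_sum})"
        )
    return total - member_sum


# Heterozygous call reported without its second allele: one described
# molecule, two in total. "allele-X" is a placeholder identifier.
het_report = {
    "type": "Genotype",
    "members": [{"type": "GenotypeMember", "count": number(1), "variation": "allele-X"}],
    "count": number(2),
}
print(undescribed_molecules(het_report))  # 1
```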

11 changes: 6 additions & 5 deletions schema/vrs-source.yaml
@@ -255,7 +255,7 @@ definitions:
Genotype:
inherits: SystemicVariation
description: >-
ahwagner marked this conversation as resolved.
A quantified set of *in-trans* :ref:`MolecularVariation` at a genomic locus.
A quantified set of :ref:`MolecularVariation` associated with a genomic locus.
type: object
properties:
type:
@@ -281,10 +281,11 @@ definitions:
- $ref: "#/definitions/DefiniteRange"
description: >-
The total number of copies of all :ref:`MolecularVariation` at this locus,
larrybabb marked this conversation as resolved.
MUST be greater than or equal to the sum of :ref:`GenotypeMember` copy counts.
If greater than the total counts, this implies additional
:ref:`MolecularVariation` that are expected to exist but are not explicitly
indicated.
MUST be greater than or equal to the sum of :ref:`GenotypeMember` copy counts
and MUST be greater than or equal to 1.
If greater than the total of GenotypeMember counts, this field describes
additional :ref:`MolecularVariation` that exist but are not
explicitly described.
required: [ "members", "count" ]


4 changes: 2 additions & 2 deletions schema/vrs.json
@@ -363,7 +363,7 @@
"additionalProperties": false
},
"Genotype": {
"description": "A quantified set of *in-trans* MolecularVariation at a genomic locus.",
"description": "A quantified set of MolecularVariation associated with a genomic locus.",
"type": "object",
"properties": {
"_id": {
@@ -398,7 +398,7 @@
"$ref": "#/definitions/Number"
}
],
"description": "The total number of copies of all MolecularVariation at this locus, MUST be greater than or equal to the sum of GenotypeMember copy counts. If greater than the total counts, this implies additional MolecularVariation that are expected to exist but are not explicitly indicated."
"description": "The total number of copies of all MolecularVariation at this locus, MUST be greater than or equal to the sum of GenotypeMember copy counts and MUST be greater than or equal to 1. If greater than the total of GenotypeMember counts, this field describes additional MolecularVariation that exist but are not explicitly described."
}
},
"required": [
10 changes: 6 additions & 4 deletions schema/vrs.yaml
@@ -214,7 +214,8 @@ definitions:
- type
additionalProperties: false
Genotype:
description: A quantified set of *in-trans* MolecularVariation at a genomic locus.
description: A quantified set of MolecularVariation associated with a genomic
locus.
type: object
properties:
_id:
@@ -240,9 +241,10 @@ definitions:
- $ref: '#/definitions/IndefiniteRange'
- $ref: '#/definitions/Number'
description: The total number of copies of all MolecularVariation at this
locus, MUST be greater than or equal to the sum of GenotypeMember copy counts.
If greater than the total counts, this implies additional MolecularVariation
that are expected to exist but are not explicitly indicated.
locus, MUST be greater than or equal to the sum of GenotypeMember copy counts
and MUST be greater than or equal to 1. If greater than the total of GenotypeMember
counts, this field describes additional MolecularVariation that exist but
are not explicitly described.
required:
- count
- members