v1.3 release candidate #418

Merged
merged 108 commits into from
Apr 16, 2023
540b70b
update CODEOWNERS
ahwagner Mar 3, 2022
a1d38bc
absolute copy and relative copy schemas
ahwagner Mar 3, 2022
9147952
absolute CN docs
ahwagner Mar 3, 2022
8fc1a1f
explain ref agree normalization rules
reece Mar 6, 2022
f832a34
fill paragraphs for consistency
reece Mar 6, 2022
2a7521f
swapped order of FJ and normalization rational design decisions
reece Mar 6, 2022
d56ddb7
add docs
ahwagner Mar 8, 2022
d4d3d1d
build artifacts
ahwagner Mar 8, 2022
829975d
Merge pull request #382 from ga4gh/issue-277
ahwagner Mar 10, 2022
df49a60
Update validation models for Copy Number variation
korikuzma Mar 11, 2022
4bbecc0
Merge pull request #383 from ga4gh/update-validation
korikuzma Mar 14, 2022
857642e
Move SequenceExpression ahead of SequenceState
jsstevenson Mar 15, 2022
8bba76a
Merge pull request #384 from ga4gh/sequence-order
jsstevenson Mar 15, 2022
f9765b4
closes 386
ahwagner Mar 24, 2022
15c42b8
second revision per https://github.com/ga4gh/vrs/pull/387\#issuecomme…
ahwagner Mar 24, 2022
67c4768
add Sphinx version to requirements
ahwagner Mar 24, 2022
9eae683
update python version
ahwagner Mar 24, 2022
6994599
revert to 3.8 for RTD support
ahwagner Mar 24, 2022
0b0c880
I think jinja2 broke stuff for RTD, fixing version
ahwagner Mar 24, 2022
8cde64e
Merge pull request #387 from ga4gh/issue-386
ahwagner Mar 28, 2022
70eaaed
Merge branch 'main' into 377-normalization-rationale
reece Mar 28, 2022
b12347c
Update docs/source/appendices/design_decisions.rst
ahwagner Apr 9, 2022
4bc9ab7
Update docs/source/appendices/design_decisions.rst
ahwagner Apr 9, 2022
99e001a
Update docs/source/appendices/design_decisions.rst
ahwagner Apr 9, 2022
8dd547d
Update docs/source/appendices/design_decisions.rst
ahwagner Apr 9, 2022
4ce2504
Update docs/source/appendices/design_decisions.rst
ahwagner Apr 9, 2022
6832824
Merge pull request #381 from reece/377-normalization-rationale
ahwagner Apr 13, 2022
c79382b
Add default value for types + update readme for using smoketests
korikuzma Apr 13, 2022
ae5662e
Merge pull request #390 from ga4gh/issue-389-add-default
ahwagner Apr 14, 2022
cb3e5d6
C & P error fix in CytobandInterval
mbaudis Jul 1, 2022
751e8a7
Merge pull request #392 from mbaudis/patch-1
larrybabb Jul 1, 2022
1645e77
restricting Haplotypes to 2+ members (Tristan)
ahwagner Jul 11, 2022
7364d01
restricting Haplotypes to 2+ members (Tristan)
ahwagner Jul 11, 2022
26dfffc
add Genotype
ahwagner Jul 11, 2022
c4192bf
add defaults
ahwagner Jul 12, 2022
7d6582c
squash that bug
ahwagner Jul 12, 2022
9dc4b3b
fixed genotype molecularvariation construction error
ahwagner Jul 12, 2022
17bdd17
Merge branch 'swsu' of github.com:ga4gh/vrs into swsu
ahwagner Jul 12, 2022
fd7a3b8
genotype prefix
ahwagner Jul 12, 2022
72e0933
use strict mode from gks.metaschema
ahwagner Jul 24, 2022
2d2532d
update Genotype definition
ahwagner Jul 29, 2022
40909a3
update genotypemember definition. closees #397
ahwagner Jul 29, 2022
2def58c
update docs
ahwagner Sep 12, 2022
dc2c85a
update tests
ahwagner Sep 12, 2022
969a52b
remove unused imports
ahwagner Sep 12, 2022
983f8b3
add definition
ahwagner Sep 12, 2022
36b1268
update inheritance model
ahwagner Sep 12, 2022
6edd4e7
update note
ahwagner Sep 12, 2022
ed3306a
remove note intro
ahwagner Sep 12, 2022
429087e
reverse the VRS
ahwagner Sep 12, 2022
63d86d1
addresses https://github.com/ga4gh/vrs/pull/394#discussion_r932147364
ahwagner Sep 13, 2022
3984b52
closes #401
ahwagner Oct 3, 2022
bd172bf
fix: get smoketests to pass
korikuzma Oct 3, 2022
40ef2af
addresses https://github.com/ga4gh/vrs/pull/394#discussion_r986247682
ahwagner Oct 3, 2022
5d7f29c
Merge pull request #402 from ga4gh/fix-smoketests
korikuzma Oct 4, 2022
fa70d4e
merge main
ahwagner Oct 7, 2022
ffdc666
update metaschema proc version
ahwagner Oct 7, 2022
0f72199
enable pre-releases
ahwagner Oct 7, 2022
90893a4
fix pip install command
ahwagner Oct 7, 2022
2a718ab
merge gt & rcn docs
ahwagner Oct 7, 2022
013fe65
Merge branch 'main' into swsu
ahwagner Oct 7, 2022
92e88fd
update vrs.yaml
ahwagner Oct 7, 2022
1750ddb
Merge pull request #394 from ga4gh/swsu
ahwagner Oct 7, 2022
8e348ea
fix: absolute + relative copy number models in models.yaml
korikuzma Oct 24, 2022
f48077c
add GenotypeMember + Genotype models to validation data
korikuzma Oct 24, 2022
057342f
copy number subject is a sequence location
korikuzma Oct 24, 2022
c933513
Merge pull request #407 from ga4gh/issue-405
korikuzma Oct 26, 2022
d0446fb
fix: models.yaml to use ordered property for digests
korikuzma Nov 2, 2022
8f521e8
update models.yaml with serialization changes
korikuzma Nov 8, 2022
ebcf629
Merge pull request #409 from ga4gh/fix-models
korikuzma Nov 8, 2022
0698c46
Add tests to check schema validation in models.yaml (#406)
korikuzma Nov 10, 2022
55d0859
Add tests for ComposedSequenceExpression (#408)
korikuzma Nov 14, 2022
a8c8c90
Merge pull request #412 from ga4gh/issue-406
korikuzma Jan 16, 2023
5e6976b
Merge branch 'main' into issue-408
korikuzma Jan 16, 2023
09f3276
Merge pull request #413 from ga4gh/issue-408
korikuzma Jan 16, 2023
20afe5a
draft CopyNumberAssessment
ahwagner Mar 3, 2023
20fe924
.gitignore
ahwagner Mar 3, 2023
a747f49
build schema
ahwagner Mar 3, 2023
6a5f9ef
update validation tests
ahwagner Mar 3, 2023
2d1c6bd
update type prefixes
ahwagner Mar 3, 2023
34ce2b1
update model validation tests
ahwagner Mar 3, 2023
b143463
update Sphinx to 4.x
ahwagner Mar 6, 2023
c831651
CNV and CNA docs
ahwagner Mar 6, 2023
82a7adb
back to 3.8
ahwagner Mar 8, 2023
a270654
Update schema/vrs.json
ahwagner Mar 8, 2023
0b22fdb
build JSON
ahwagner Mar 8, 2023
3f29329
addresses #404
ahwagner Mar 31, 2023
2329b62
baseline ploidy as state
ahwagner Mar 31, 2023
d8e37cf
CSE initial draft
ahwagner Mar 31, 2023
0c0c977
Merge pull request #416 from ga4gh/issue-404
ahwagner Apr 1, 2023
4599dcd
clarify example
ahwagner Apr 1, 2023
3350089
Merge branch 'issue-363'
ahwagner Apr 1, 2023
06dbd46
Merge branch '1.3' into main
ahwagner Apr 1, 2023
981442b
add 1.3 release notes
ahwagner Apr 1, 2023
225a7cb
Merge remote-tracking branch 'origin/main'
ahwagner Apr 1, 2023
f62eb59
fix documentation build errors
ahwagner Apr 1, 2023
3e1ab48
Merge branch '1.3' into main
ahwagner Apr 1, 2023
842b213
style bug fix attempt
ahwagner Apr 1, 2023
a7c442f
Merge remote-tracking branch 'origin/main'
ahwagner Apr 1, 2023
4bdf9b6
fix markup error
ahwagner Apr 1, 2023
cb2e546
remove extra tildes
ahwagner Apr 1, 2023
15366b8
remove redundant descriptive description
ahwagner Apr 1, 2023
3ba3ed0
add v1.2 ref
ahwagner Apr 1, 2023
7823558
relative_copy_class -> copy_assessment
mbaudis Apr 1, 2023
db53f2d
update change_assertion
ahwagner Apr 1, 2023
00d2dca
update example
ahwagner Apr 1, 2023
f5f0feb
Merge pull request #419 from mbaudis/patch-2
ahwagner Apr 2, 2023
c4595ae
update efo codes to CURIEs
ahwagner Apr 13, 2023
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -20,7 +20,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip setuptools
pip install -r .requirements.txt
pip install --pre -r .requirements.txt

- name: Test with pytest
run: |
7 changes: 4 additions & 3 deletions .requirements.txt
@@ -1,7 +1,8 @@
pytest
python-jsonschema-objects>=0.3,<=0.3.10
python-jsonschema-objects>=0.4.0
jsonschema==3.2.0
ipython
pyyaml
ga4gh.gks.metaschema>=0.1.1
sphinx ~= 3.5
ga4gh.gks.metaschema==0.2.0rc4
sphinx ~= 4.5
sphinx-rtd-theme ~= 1.2
96 changes: 53 additions & 43 deletions docs/source/appendices/design_decisions.rst
@@ -32,11 +32,11 @@ Allele Rather than Variant
The most primitive sequence assertion in VRS is the :ref:`Allele`
entity. Colloquially, the words "allele" and "variant" have similar
meanings and they are often used interchangeably. However, the VR
contributors believe that it is essential to distinguish the state of
the sequence from the change between states of a sequence. It is
contributors assert that it is essential to distinguish the *state of*
a reference sequence from a *change from* a reference sequence. It is
imperative that precise terms are used when modelling data. Therefore,
within VRS, Allele refers to a state and "variant" refers to the change
from one Allele to another.
within VRS, "allele" refers to a state of a reference sequence and "variant" refers to a change
from a reference sequence.

The word "variant", which implies change, makes it awkward to refer to
the (unchanged) reference allele. Some systems will use an HGVS-like
@@ -45,45 +45,6 @@ when referring to an unchanged residue. In some cases, such "variants"
are even associated with allele frequencies. Similarly, a predicted
consequence is better associated with an allele than with a variant.

.. _should-normalize:

Implementations should normalize
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

VRS STRONGLY RECOMMENDS that Alleles be :ref:`normalized
<normalization>` when generating :ref:`computed identifiers
<computed-identifiers>`. The rationale for recommending, rather than
requiring, normalization is grounded in dual views of Allele objects
with distinct interpretations:

* Allele as minimal representation of a change in sequence. In this
view, normalization is a process that makes the representation
minimal and unambiguous.

* Allele as an assertion of state. In this view, it is reasonable to
want to assert state that may include (or be composed entirely of)
reference bases, for which the normalization process would alter the
intent.

Although this rationale applies only to Alleles, it may have have
parallels with other VRS types. In addition, it is desirable for all
VRS types to be treated similarly.

Furthermore, if normalization were required in order to generate
:ref:`computed-identifiers`, but did not apply to certain instances of
VRS Variation, implementations would likely require secondary
identifier mechanisms, which would undermine the intent of a global
computed identifier.

The primary downside of not requiring normalization is that Variation
objects might be written in non-canonical forms, thereby creating
unintended degeneracy.

Therefore, normalization of all VRS Variation classes is optional in
order to support the view of Allele as an assertion of state on a
sequence.



.. _fully-justified:

@@ -113,6 +74,55 @@ occurs in a low-complexity region, but rather describes the final and
unambiguous state of the resultant sequence.


.. _should-normalize:

Implementations should normalize Alleles
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

VRS STRONGLY RECOMMENDS that Alleles be :ref:`normalized
<normalization>` when generating :ref:`computed identifiers
<computed-identifiers>` unless there is compelling reason to do
otherwise. Those reasons are the subject of this section.

:ref:`Allele Normalization <normalization>` is the process of
comparing a span of reference sequence to a sequence state (often the
alternative sequence) and resolving that span to an unambiguous form. The fully-justified Allele normalization in VRS consists of two steps: trimming
and shuffling. In the trimming step, common flanking prefix and
suffix sequences are removed. For example, a CAG-to-CTG Allele would
be trimmed to merely A-to-T, with the position adjusted accordingly.
There are four cases of the resulting sequences:

1. The trimmed sequences are empty: The Allele refers to reference
state.
2. The trimmed sequences are non-empty: The Allele is a substitution
(perhaps multi-residue).
3. The reference sequence is empty: The Allele is a net insertion.
4. The state sequence is empty: The Allele is a net deletion.
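
As a rough illustration of the trimming step and the four cases listed above, here is a minimal Python sketch. The function names `trim` and `classify`, the choice to trim the prefix before the suffix, and the use of inter-residue `start`/`end` coordinates are assumptions made for this example; they are not taken from the VRS specification or from any VRS library.

```python
from typing import Tuple


def trim(ref: str, alt: str, start: int) -> Tuple[str, str, int, int]:
    """Remove the common flanking prefix and suffix of ref and alt.

    `start` is the inter-residue start of `ref` on the reference sequence
    (an assumption of this sketch). Returns the trimmed ref, trimmed alt,
    and the adjusted inter-residue start and end.
    """
    # Strip the common prefix and move the start position forward.
    p = 0
    while p < len(ref) and p < len(alt) and ref[p] == alt[p]:
        p += 1
    ref, alt, start = ref[p:], alt[p:], start + p

    # Strip the common suffix from what remains.
    s = 0
    while s < len(ref) and s < len(alt) and ref[-1 - s] == alt[-1 - s]:
        s += 1
    if s:
        ref, alt = ref[:-s], alt[:-s]

    return ref, alt, start, start + len(ref)


def classify(ref: str, alt: str) -> str:
    """Map trimmed sequences onto the four cases described in the text."""
    if not ref and not alt:
        return "reference agreement"  # case 1
    if ref and alt:
        return "substitution"         # case 2
    if not ref:
        return "insertion"            # case 3
    return "deletion"                 # case 4
```

For the CAG-to-CTG example, `trim("CAG", "CTG", 10)` yields `("A", "T", 11, 12)`, which `classify` reports as a substitution.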

When the Allele refers to a reference state (case 1), trimming would
reduce the variant to a null change. However, reduction to a null
state would make it impossible to refer to a specific span of
reference sequence. In order to permit users to refer to spans of
reference sequence, VRS does not require normalizing reference
agreement Alleles.

The shuffling step applies only when the reference or the state
sequences are empty (cases 3 and 4). When these occur in the context
of repeating reference sequence that matches the inserted or deleted
sequence, the Allele may be shuffled left and right to identify the
fully-justified location of the variation. (See :ref:`normalization`
for details.)
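
The shuffling idea can be sketched as follows, assuming the Allele has already been trimmed to a pure insertion or deletion. `fully_justify` is a hypothetical helper written for this illustration, not the VRS reference implementation; it computes only the widened inter-residue interval, and the normalization section remains the authoritative description of the procedure.

```python
from typing import Tuple


def fully_justify(sequence: str, start: int, end: int, indel: str) -> Tuple[int, int]:
    """Widen [start, end) over every position the indel could be shuffled to.

    `sequence` is the reference sequence, `start`/`end` are inter-residue
    coordinates of the trimmed Allele (start == end for an insertion), and
    `indel` is the inserted or deleted sequence. All names are assumptions
    of this sketch.
    """
    if not indel:
        raise ValueError("indel must be a non-empty inserted or deleted sequence")
    n = len(indel)

    # Shuffle left: the residue before the interval must match the indel
    # read backwards, wrapping around circularly.
    left = 0
    while start - left > 0 and sequence[start - left - 1] == indel[(-1 - left) % n]:
        left += 1

    # Shuffle right: the residue after the interval must match the indel
    # read forwards, wrapping around circularly.
    right = 0
    while end + right < len(sequence) and sequence[end + right] == indel[right % n]:
        right += 1

    return start - left, end + right
```

In the reference `TCAGCAGCT`, deleting one `CAG` unit at either `[1, 4)` or `[4, 7)` gives the same interval, `(1, 8)`, so both descriptions of the deletion converge on one location.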

In rare cases, data originators might have reason to associate an
annotation with a specific repeating unit in the context of repeated
sequence. In order to support this case, normalization is not
strictly required.

Most users will normalize most Alleles. Normalization should be
skipped only when doing so would decrease the intended precision of an
Allele.


.. _inter-residue-coordinates-design:

Inter-residue Coordinates
123 changes: 0 additions & 123 deletions docs/source/appendices/future_plans.rst
@@ -96,129 +96,6 @@ Under consideration. See https://github.com/ga4gh/vrs/issues/28.
t(9;22)(q34;q11) in BCR-ABL


.. _genotype:

Genotype
########

The genetic state of an organism, whether complete (defined over the
whole genome) or incomplete (defined over a subset of the genome).

**Computational definition**

A list of Haplotypes.

**Information model**

.. list-table::
:class: reece-wrap
:header-rows: 1
:align: left
:widths: auto

* - Field
- Type
- Limits
- Description
* - _id
- :ref:`CURIE`
- 0..1
- Variation Id; MUST be unique within document
* - type
- string
- 1..1
- Variation type; MUST be set to '**Genotype**'
* - completeness
- enum
- 1..1
- Declaration of completeness of the Haplotype definition.
Values are:

* UNKNOWN: Other Haplotypes may exist.
* PARTIAL: Other Haplotypes exist but are unspecified.
* COMPLETE: The Genotype declares a complete set of Haplotypes.

* - members
- :ref:`Haplotype`\[] or :ref:`CURIE`\[]
- 0..*
- List of Haplotypes or Haplotype identifiers; length MUST agree
with ploidy of genomic region


**Implementation guidance**

* Haplotypes in a Genotype MAY occur at different locations or on
different reference sequences. For example, an individual may have
haplotypes on two population-specific references.
* Haplotypes in a Genotype MAY contain differing numbers of Alleles or
Alleles at different Locations.

**Notes**

* The term "genotype" has two, related definitions in common use. The
narrower definition is a set of alleles observed at a single
location and with a ploidy of two, such as a pair of single residue
variants on an autosome. The broader, generalized definition is a
set of alleles at multiple locations and/or with ploidy other than
two.The VRS Genotype entity is based on this broader definition.
* The term "diplotype" is often used to refer to two haplotypes. The
VRS Genotype entity subsumes the conventional definition of
diplotype. Therefore, the VRS model does not include an explicit
entity for diplotypes. See :ref:`this note
<genotypes-represent-haplotypes-with-arbitrary-ploidy>` for a
discussion.
* The VRS model makes no assumptions about ploidy of an organism or
individual. The number of Haplotypes in a Genotype is the observed
ploidy of the individual.
* In diploid organisms, there are typically two instances of each
autosomal chromosome, and therefore two instances of sequence at a
particular location. Thus, Genotypes will often list two
Haplotypes. In the case of haploid chromosomes or
haploinsufficiency, the Genotype consists of a single Haplotype.
* A consequence of the computational definition is that Haplotypes at
overlapping or adjacent intervals MUST NOT be included in the same
Genotype. However, two or more Alleles MAY always be rewritten as an
equivalent Allele with a common sequence and interval context.
* The rationale for permitting Genotypes with Haplotypes defined on
different reference sequences is to enable the accurate
representation of segments of DNA with the most appropriate
population-specific reference sequence.

**Sources**

SO: `Genotype (SO:0001027)
<http://www.sequenceontology.org/browser/current_svn/term/SO:0001027>`__
— A genotype is a variant genome, complete or incomplete.

.. _genotypes-represent-haplotypes-with-arbitrary-ploidy:

.. note:: Genotypes represent Haplotypes with arbitrary ploidy
The VRS defines Haplotypes as a list of Alleles, and Genotypes as
a list of Haplotypes. In essence, Haplotypes and Genotypes represent
two distinct dimensions of containment: Haplotypes represent the "in
phase" relationship of Alleles while Genotypes represents sets of
Haplotypes of arbitrary ploidy.

There are two important consequences of these definitions: There is no
single-location Genotype. Users of SNP data will be familiar with
representations like rs7412 C/C, which indicates the diploid state at
a position. In the VRS, this is merely a special case of a
Genotype with two Haplotypes, each of which is defined with only one
Allele (the same Allele in this case). The VRS does not define a
diplotype type. A diplotype is a special case of a VRS Genotype
with exactly two Haplotypes. In practice, software data types that
assume a ploidy of 2 make it very difficult to represent haploid
states, copy number loss, and copy number gain, all of which occur
when representing human data. In addition, assuming ploidy=2 makes
software incompatible with organisms with other ploidy. The VRS
makes no assumptions about "normal" ploidy.

In other words, the VRS does not represent single-position
Genotypes or diplotypes because both concepts are subsumed by the
Allele, Haplotype, and Genotypes entities.



.. _GitHub issue: https://github.com/ga4gh/vrs/issues
.. _genetic variation: https://en.wikipedia.org/wiki/Genetic_variation

6 changes: 2 additions & 4 deletions docs/source/impl-guide/computed_identifiers.rst
@@ -119,9 +119,7 @@ If the object is an instance of a VRS class, implementations MUST:
* ensure that objects are referenced with identifiers in the
``ga4gh`` namespace
* replace each nested :term:`identifiable object` with their
corresponding *digests*. (Note: Attributes of some objects, such
as :ref:`CopyNumber`, permit a mix of identifiable and
non-identifiable values.)
corresponding *digests*.
* order arrays of digests and ids by Unicode Character Set values
* filter out fields that start with underscore (e.g., `_id`)
* filter out fields with null values
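
The ordering and filtering rules in the list above can be sketched in Python as below. This is a simplified illustration, not the full VRS serialization: `prune_for_digest` is a hypothetical name, nested identifiable objects are assumed to have been replaced by their digests already, and any array made up entirely of strings is treated as an array of digests or ids to be sorted.

```python
from typing import Any


def prune_for_digest(obj: Any) -> Any:
    """Recursively drop underscore-prefixed and null fields, and sort string arrays.

    Assumption of this sketch: every all-string list is an array of digests
    or ids. Python compares str values by Unicode code point, which is the
    ordering the text asks for.
    """
    if isinstance(obj, dict):
        return {
            key: prune_for_digest(value)
            for key, value in obj.items()
            if not key.startswith("_") and value is not None
        }
    if isinstance(obj, list):
        items = [prune_for_digest(value) for value in obj]
        if all(isinstance(value, str) for value in items):
            items = sorted(items)
        return items
    return obj
```

For example, an object with an `_id` field, a null field, and `members` of `["b", "a"]` comes back without `_id` and the null field, and with `members` ordered as `["a", "b"]`.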
@@ -193,7 +191,7 @@ Truncated Digest (sha512t24u)
The sha512t24u truncated digest algorithm [Hart2020]_ computes an ASCII digest
from binary data. The method uses two well-established standard
algorithms, the `SHA-512`_ hash function, which generates a binary
digest from binary data, and `Base64`_ URL encoding, which encodes
digest from binary data, and a URL-safe variant of `Base64`_ encoding, which encodes
binary data using printable characters.
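
A minimal Python sketch of this digest, using only the standard-library `hashlib` and `base64` modules. The 24-byte truncation is the step the `t24` in the name refers to; the function body is an illustration of the described algorithm rather than a copy of any particular implementation.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Return the sha512t24u truncated digest of binary data as ASCII text."""
    # SHA-512 produces a 64-byte binary digest.
    digest = hashlib.sha512(blob).digest()
    # Keep the first 24 bytes, then encode with the URL-safe Base64 alphabet.
    # 24 bytes encode to exactly 32 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")
```

For the empty byte string, `sha512t24u(b"")` returns `z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc`.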

Computing the sha512t24u truncated digest for binary data consists of
10 changes: 5 additions & 5 deletions docs/source/releases/1.3.rst
@@ -15,15 +15,15 @@ Major Changes
#############

* :ref:`CopyNumberChange` introduced for relative copy number calls
* :ref:`CopyNumberCount` replaces `CopyNumber`
* :ref:`Genotype` introduced for describing genotypes
* :ref:`ComposedSequenceExpression` introduced for composing expressions
from multiple other sequence expressions
* :ref:`CopyNumberCount` replaces `CopyNumber (v1.2) <https://vrs.ga4gh.org/en/1.2.1/terms_and_model.html#copynumber>`_
* :ref:`Genotype` introduced as a new systemic variation concept
* :ref:`ComposedSequenceExpression` introduced for composing expressions from multiple other sequence expressions

Minor Changes
#############

* Clarifying updates for :ref:`Allele normalization guidance <>`
* Clarifying updates for :ref:`Allele normalization guidance
<should-normalize>`
* :ref:`Haplotype` allele member minimum was revised from 1 to 2
* Updated metaschema processor version
* Introduced ordered / unordered attribute in array declarations
1 change: 1 addition & 0 deletions docs/source/releases/index.rst
@@ -23,6 +23,7 @@ Releases
:maxdepth: 2
:includehidden:

1.3.rst
1.2.rst
1.1.rst
1.0.rst