
Adjust ClinVar XML parsing to cardinality fixes #233

Closed
holtgrewe opened this issue Jun 7, 2024 · 1 comment · Fixed by #234
Labels
enhancement New feature or request

Comments

@holtgrewe
Contributor

Is your feature request related to a problem? Please describe.
NCBI Helpdesk clarified that `maxOccurs="1"` is the default when the attribute is omitted. Further, a number of elements appear to always be omitted from the XML.
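To illustrate the rule: in XML Schema, both `minOccurs` and `maxOccurs` default to `1`, so an `xs:element` declaration without `maxOccurs` allows at most one occurrence. The sketch below reads the effective cardinality from a small, hypothetical schema fragment (not copied from the ClinVar XSD) and suggests a proto3 field label for each element. The function names and the label heuristic are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, simplified schema fragment (not copied from the ClinVar XSD).
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ExampleType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Comment" type="xs:string" minOccurs="0"/>
      <xs:element name="Citation" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def effective_cardinality(schema_text: str) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each xs:element name to (minOccurs, maxOccurs); None means unbounded.

    Both attributes default to "1" when omitted, as specified by XML Schema.
    """
    root = ET.fromstring(schema_text.strip())
    result = {}
    for elem in root.iter(XS + "element"):
        min_occurs = int(elem.get("minOccurs", "1"))
        max_raw = elem.get("maxOccurs", "1")
        max_occurs = None if max_raw == "unbounded" else int(max_raw)
        result[elem.get("name")] = (min_occurs, max_occurs)
    return result


def proto_label(min_occurs: int, max_occurs: Optional[int]) -> str:
    """Heuristic proto3 label for a cardinality (an assumption, not a project rule)."""
    if max_occurs is None or max_occurs > 1:
        return "repeated"
    return "optional" if min_occurs == 0 else "singular"


for name, (lo, hi) in effective_cardinality(SCHEMA).items():
    print(name, lo, hi, proto_label(lo, hi))
```

Here `Name` has no `maxOccurs` and therefore maps to a singular field, while only the explicitly `unbounded` `Citation` maps to `repeated`.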

Describe the solution you'd like
Adjust the protobuf definitions and the XML interpretation code accordingly. Move questionable fields to the end of their messages so that they can later be removed without harm.
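On the parsing side, an element with an implicit `maxOccurs="1"` can be read with a helper that returns at most one child and rejects duplicates, while unbounded elements keep list semantics. A minimal sketch using the standard library's `xml.etree.ElementTree`; `single_child`, `parse_trait`, and the simplified `<Trait>` layout are hypothetical and are not the project's actual parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def single_child(parent: ET.Element, tag: str) -> Optional[ET.Element]:
    """Return the only child element named `tag`, or None if it is absent.

    Raises ValueError if `tag` occurs more than once, i.e. if the input
    violates the implicit maxOccurs="1".
    """
    matches = parent.findall(tag)
    if len(matches) > 1:
        raise ValueError(
            f"<{parent.tag}> has {len(matches)} <{tag}> children, expected at most one"
        )
    return matches[0] if matches else None


def parse_trait(trait: ET.Element) -> dict:
    """Parse a simplified, hypothetical <Trait> element into a plain dict."""
    name = single_child(trait, "Name")  # singular field: at most one <Name>
    return {
        # Compare against None explicitly: Element truthiness depends on children.
        "name": name.findtext("ElementValue") if name is not None else None,
        "xrefs": [x.get("ID") for x in trait.findall("XRef")],  # unbounded: list
        "type": trait.get("Type"),
    }


example = ET.fromstring(
    '<Trait Type="Disease">'
    "<Name><ElementValue>Example disease</ElementValue></Name>"
    '<XRef DB="MedGen" ID="C0000001"/>'
    "</Trait>"
)
print(parse_trait(example))
```

Failing loudly on a second `<Name>` keeps the singular protobuf field honest: if the data ever contradicts the clarified cardinality, the parser reports it instead of silently keeping only the first element.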

Describe alternatives you've considered
N/A

Additional context
N/A

@holtgrewe holtgrewe added the enhancement New feature or request label Jun 7, 2024
@holtgrewe
Contributor Author

holtgrewe commented Jun 7, 2024

Will change the following for now. We might need to adjust this to the upcoming v2.1 change to the XSD.

```diff
diff --git a/protos/clinvar_data/pbs/clinvar_public.proto b/protos/clinvar_data/pbs/clinvar_public.proto
index d471690..3b01062 100644
--- a/protos/clinvar_data/pbs/clinvar_public.proto
+++ b/protos/clinvar_data/pbs/clinvar_public.proto
@@ -731,28 +731,21 @@ message Trait {

     /* nested elements */

-    // names
-    //
-    // NB: in XSD this is explictely given as unbounded but XML always has
-    // one element
-    repeated GenericSetElement names = 1;
-    // symbols (NB: never occur in the XML)
-    repeated GenericSetElement symbols = 2;
-    // attributes (NB: never occur in the XML)
-    repeated AttributeSetElement attributes = 3;
+    // Name of the trait.
+    GenericSetElement name = 1;
     // Citation list.
-    repeated Citation citations = 4;
+    repeated Citation citations = 2;
     // Xref list.
-    repeated Xref xrefs = 5;
+    repeated Xref xrefs = 3;
     // Comment list.
-    repeated Comment comments = 6;
+    repeated Comment comments = 4;
     // Sources
-    repeated string sources = 7;
+    repeated string sources = 5;

     /* attributes */

     // Trait type.
-    Type type = 8;
+    Type type = 6;
   }

   // names
@@ -985,7 +978,7 @@ message AggregateClassificationSet {
   // The aggregate germline classification.
   optional AggregatedGermlineClassification germline_classification = 1;
   // The aggregate somatic clinical impact.
-  repeated AggregatedSomaticClinicalImpact somatic_clinical_impacts = 2;
+  optional AggregatedSomaticClinicalImpact somatic_clinical_impact = 2;
   // The aggregate oncogenicity classification.
   optional AggregatedOncogenicityClassification oncogenicity_classification = 3;
 }
@@ -1116,7 +1109,7 @@ message ClassificationScv {
   optional string germline_classification = 2;
   // Information on the clinical impact; mutually exlusive with `germline_classification`
   // and `oncogenicity_classification`.
-  optional SomaticClinicalImpact somatic_clinical_impacts = 3;
+  optional SomaticClinicalImpact somatic_clinical_impact = 3;
   // The oncogenicity classification; mutually exlusive with `germline_classification`
   // and `oncogenicity_classification`.
   optional string oncogenicity_classification = 4;
@@ -1770,7 +1763,7 @@ message AlleleScv {
   // being reported.
   repeated Gene genes = 1;
   // Name provided by the submitter.
-  repeated OtherName names = 2;
+  OtherName name = 2;
   // Variant type.
   optional string variant_type = 3;
   // Location.
@@ -2129,8 +2122,8 @@ message ClinicalAssertion {
   // Replaced list; mutually exclusive with replaces
   repeated ClinicalAssertionRecordHistory replaceds = 6;

-  // SCV classifications.
-  repeated ClassificationScv classifications = 7;
+  // SCV classification.
+  ClassificationScv classification = 7;
   // The assertion.
   Assertion assertion = 8;
   // Attributes.
```
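The diff above turns several repeated fields into singular ones and renames them (for example `names` to `name`, `classifications` to `classification`). Records produced with the old definitions could be migrated with something like the following sketch, assuming they have been loaded as plain dicts keyed by the proto field names. `collapse_singular` and the example record are hypothetical; the sketch refuses to collapse a list with more than one entry rather than silently dropping data.

```python
from typing import Any, Dict


def collapse_singular(message: Dict[str, Any], renames: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of one message dict with old repeated fields made singular.

    `renames` maps an old field name to its new name, e.g. {"names": "name"}
    for Trait. Only the top level of `message` is touched, because the same
    field name may legitimately remain repeated in other message types.
    """
    out: Dict[str, Any] = {}
    for key, value in message.items():
        new_key = renames.get(key)
        if new_key is None:
            out[key] = value  # field unaffected by the change
        elif not isinstance(value, list):
            out[new_key] = value  # already singular (e.g. an optional field), only renamed
        elif len(value) == 1:
            out[new_key] = value[0]  # unwrap the single element
        elif len(value) > 1:
            raise ValueError(
                f"field {key!r} has {len(value)} entries; cannot map to singular {new_key!r}"
            )
        # An empty list falls through and is dropped, leaving the singular field unset.
    return out


# Hypothetical old-style Trait record.
old_trait = {"names": [{"value": "Example disease"}], "xrefs": [{"db": "MedGen"}], "type": "disease"}
new_trait = collapse_singular(old_trait, {"names": "name"})
print(new_trait)
```

Passing the rename map per message type, instead of applying one global map recursively, avoids collapsing an unrelated `names` field that is still repeated elsewhere in the schema.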
