Incremental data upload for mutations, case lists, patient and sample attributes #32

Merged: 42 commits merged into cBioPortal:rfc79 on May 15, 2024

Conversation

@forus (Contributor) commented Apr 14, 2024

What is done in this PR:

  • Enhancements to the DAO Layer: Introduced new methods in the Data Access Object (DAO) layer to enable:

    • Removal of sample attributes using the sample's internal ID.
    • Deletion of mutations associated with a specific genetic profile ID and sample internal ID.
  • Reversion of Method Deprecations: Reinstated previously deprecated methods for retrieving mutations.

  • Data Import Extension:

    • Implemented a new --overwrite-existing flag for the ImportClinicalData and ImportProfileData scripts, allowing for the re-upload of entries if they already exist in the database.
  • Python Scripts Extension:

    • Modified loader and validation scripts to work with partial incremental upload data.
  • Testing Enhancements: Added integration tests to verify the incremental upload functionality for sample and mutation data.
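The delete-then-insert behaviour described in the list above can be illustrated with a minimal Python sketch. It is not the PR's implementation (the DAO layer is separate code that is not shown here). The `mutation` table layout, the function names `delete_mutations` and `upload_mutations`, and the choice to raise an error when entries exist and overwriting is off are all assumptions made for illustration; only the idea of deleting by genetic profile ID and sample internal ID and the `--overwrite-existing` flag come from the description.

```python
import sqlite3

# Hypothetical, heavily simplified schema (an assumption for this sketch).
SCHEMA = """
CREATE TABLE mutation (
    genetic_profile_id INTEGER NOT NULL,
    sample_id          INTEGER NOT NULL,
    entrez_gene_id     INTEGER NOT NULL,
    protein_change     TEXT
);
"""


def delete_mutations(conn, genetic_profile_id, sample_internal_id):
    """Delete all mutations of one sample in one genetic profile; return the row count."""
    cur = conn.execute(
        "DELETE FROM mutation WHERE genetic_profile_id = ? AND sample_id = ?",
        (genetic_profile_id, sample_internal_id),
    )
    return cur.rowcount


def upload_mutations(conn, genetic_profile_id, sample_internal_id, rows,
                     overwrite_existing=False):
    """Insert (entrez_gene_id, protein_change) rows for one sample.

    With overwrite_existing=True, existing rows for the sample and profile
    are removed first, so the upload replaces them instead of duplicating them.
    Refusing the upload otherwise is an assumption of this sketch.
    """
    existing = conn.execute(
        "SELECT COUNT(*) FROM mutation WHERE genetic_profile_id = ? AND sample_id = ?",
        (genetic_profile_id, sample_internal_id),
    ).fetchone()[0]
    if existing and not overwrite_existing:
        raise ValueError("sample already has mutations in this profile")
    with conn:  # delete and insert commit (or roll back) together
        delete_mutations(conn, genetic_profile_id, sample_internal_id)
        conn.executemany(
            "INSERT INTO mutation VALUES (?, ?, ?, ?)",
            [(genetic_profile_id, sample_internal_id, gene, change)
             for gene, change in rows],
        )
```

Running the delete and the insert in one transaction keeps the sample from ending up with no mutations at all if the insert fails partway through.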

Demo (you can download high-quality video here):
incremental-upload-demo

Description of the data in the study_es_0_inc folder (green = new entries; yellow = updated entries; light blue = entries already in the database):
Screenshot 2024-04-10 at 00 02 22
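The three colours in the legend correspond to a three-way comparison between the incremental data and the database. A minimal sketch of that comparison follows, assuming entries are keyed by an ID and carry a dictionary of attributes; the function name `classify_entries` and the data layout are hypothetical and not taken from the PR.

```python
def classify_entries(existing, incoming):
    """Group entry IDs as new, updated, or existing (unchanged in the database).

    existing: {entry_id: attributes} already in the database
    incoming: {entry_id: attributes} from the incremental data folder
    """
    result = {"new": [], "updated": [], "existing": []}
    for entry_id, attrs in incoming.items():
        if entry_id not in existing:
            result["new"].append(entry_id)        # green in the legend
        elif existing[entry_id] != attrs:
            result["updated"].append(entry_id)    # yellow in the legend
        else:
            result["existing"].append(entry_id)   # identical to the stored entry
    # Entries absent from the incremental data stay in the database untouched.
    result["existing"].extend(i for i in existing if i not in incoming)
    return result
```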

forus added 30 commits April 14, 2024 13:02
To make the dataset look like real data in the database
Apparently, the flag does not change anything.
But we add it anyway as the tests for "incremental" data upload.
Adding to the _all case list and to case lists specified with command arguments is supported
From case lists that are not the _all case list and are not specified with the --add-to-case-lists option
We changed them to work for the demo.
Mutation numbers did not change on demo.
It was easy to confuse the code related to sample and clinical_sample (attributes)
with the code related to patient and clinical_patient (attributes)
This flag is for the command that uploads molecular profile data
- change location of the files
- make sure assertions can work on the seed mini db
- get rid of absent cbioportal dependencies
@forus removed the "DO NOT MERGE" (This is not yet ready for merge) label Apr 30, 2024
@forus changed the title from "WIP: Incremental data upload for mutations, case lists, patient and sample attributes" to "Incremental data upload for mutations, case lists, patient and sample attributes" Apr 30, 2024
@forus force-pushed the inc-data-upload-poc branch 3 times, most recently from c87f63f to 63fa0b9, April 30, 2024 16:21
@forus force-pushed the inc-data-upload-poc branch 3 times, most recently from 82b7bd3 to 1b6ba41, May 2, 2024 08:50
@haynescd (Collaborator) left a comment

lgtm

@forus merged commit a693eac into cBioPortal:rfc79 May 15, 2024
1 check passed
forus added a commit that referenced this pull request May 15, 2024
Incremental data upload for mutations, case lists, patient and sample attributes
3 participants