-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CNA long format (RFC 65) #9847
CNA long format (RFC 65) #9847
Conversation
77adf0c
to
9660ad2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Lots of small changes :)
core/src/main/java/org/mskcc/cbio/portal/scripts/GeneticAlterationImporter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/GeneticAlterationImporter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/GeneticAlterationImporter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/GeneticAlterationImporter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/GeneticAlterationImporter.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/ImportTabDelimData.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/ImportTabDelimData.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/dao/DaoGeneticProfileSamples.java
Show resolved
Hide resolved
b2a9291
to
2ef2c99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BasLee! Just a few minor comments about this pr, it looks good to me.
core/src/main/java/org/mskcc/cbio/portal/scripts/ImportCnaDiscreteLongData.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/scripts/ImportCnaDiscreteLongData.java
Show resolved
Hide resolved
There is one more suggestion about this pull request, since this pr introduces a new data format, we should also add a new method to validate this new data format in validateData.py |
Good point! The validator should be updated. I discussed it with @pvannierop @oplantalech and we would like to do this in a separate PR. |
@averyniceday Yes, we restrict a study to have only one format. I do not see how the recognition of the new format will differ from any other data types. The meta file for discrete CNA will reference the format used (long or legacy wide). Or am I overlooking a complication here? |
@pvannierop This might be specific to how we import on our end and how the importer is implemented (in conjunction with how we organize datatypes in google sheets). We currently have a one to one mapping of meta to data files (e.g. meta_cna to data_cna). We also have a step inside the importer that merges records for genes (I imagine this is implemented to be specific to the legacy format). If we are not planning on introducing a new meta/data pairing for the long format (e.g. data_cna_long/meta_cna_long) and instead only differentiating by a field inside the metafile, the importer will need to be updated to be able to differentiate between the two based on the metafile contents. |
@averyniceday The current wide format is still supported by this change (there is 100% backward compatibility). Therefore, I fail to see how this PR could possible interfere with your current processes. This PR merely adds a format. If you internally decide not to use this long format for the time being you should be fine. And on a note of process, I think it is not really proper procedure to have merging of PRs to cBioPortal codebase to depend on update of internal tooling. |
@pvannierop @BasLee Agreed, this doesn't need to hold up merging into the cbioportal codebase. 👍 Could you also add documentation for the new format in the File Formats section? |
@averyniceday The documentation will be added for sure! We will do this in the PR that updates the python validator script. |
e78f1b4
to
b90f830
Compare
core/src/main/java/org/mskcc/cbio/portal/scripts/ImportCnaDiscreteLongData.java
Outdated
Show resolved
Hide resolved
@BasLee CNA long format validator merged, please feel free to rebase this pr. |
773bff4
to
3a09063
Compare
3a09063
to
a0f749d
Compare
Rebased and fixed some validation issues, most (relevant?) checks seem to run now |
After some discussion with @pvannierop: The new Only the importer should have knowledge of the These changes can be found in 9c92524 |
9c92524
to
6f5b13d
Compare
SonarCloud Quality Gate failed. |
Add long data format for CNA as described in RFC 65.
Looking for feedback on the new importer and functionality that should be shared between the new ImportCnaDiscreteLongData and ImportTabDelimData.
Sample of new data format
In the current CNA data format the events of each gene x sample are stored in a tabular format, with a column for each sample and a row for each gene.
The new CNA 'long' data format contains a row for each gene x sample.
Thie new data format will import data using a new importer, the old data format is kept as is. Shared functionality should be extracted as much as possible.
Changes
ImportCnaDiscreteLongData
for CNA long formatImportTabDelimData
andImportCnaDiscreteLongData
intoCnaUtil
Database schema change
Tests
ImportCnaDiscreteLongData
inTestImportCnaDiscreteLongData