Independent sample analysis (2 of 2) #171
This workbook analyzes the cases where we have multiple samples from the same participant, to explore how best to manage future analyses where we want only a single sample from each tumor.
…nto jashapiro/independent-samples
Script complete, and generated files are in directory.
I had one question to check my understanding. I did pull this locally and kick the tires a bit. Everything was as expected/described as far as the sample sets go.
Should this PR include adding canonical versions of these files elsewhere in the project? If so, where? Should they be part of the data download?
I am a big fan of including this as part of the download. Is this something we could potentially do as part of #146 @jharenza?
How do you envision this in the download? As a separate clinical file, or as another column in the clinical file indicating which samples to use? I am working on adding a lot more info to the clinical file, and we also have two samples which need to be pulled due to non-consent, so I would suggest not adding another file, but we could add a column if you give me the final list?
I would think having separate file(s) is probably the easiest. You can run the
OK - I can provide these as lists in the release so people can extract those samples from the clinical file (since so many clinical file edits are upcoming). Once the PR is merged, can you point me to the output files? I will also have to remove two samples newly annotated as non-consent. I haven't yet set up Docker for this; doing so will require some tutorial-ing on my end, and I have a pressing deadline on Nov 15, so I won't be able to do this until after then.
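For anyone extracting samples from the clinical file with one of the released lists, the filtering step could look roughly like the sketch below. It assumes the clinical file is tab-separated and has a sample identifier column; the column name `sample_id` and the helper names are hypothetical and not taken from the project. The optional `exclude_ids` argument illustrates dropping samples annotated as non-consent.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file into a list of dicts (one per row)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_clinical(rows, keep_ids, exclude_ids=(), id_column="sample_id"):
    """Keep rows whose ID is in keep_ids and not in exclude_ids.

    id_column is a hypothetical column name; adjust it to the real file.
    """
    keep = set(keep_ids) - set(exclude_ids)
    return [row for row in rows if row[id_column] in keep]


# Tiny in-memory example standing in for the clinical file.
example = "sample_id\tparticipant_id\nS1\tP1\nS2\tP1\nS3\tP2\n"
rows = read_tsv(io.StringIO(example))
selected = filter_clinical(rows, keep_ids=["S1", "S3"], exclude_ids=["S3"])
print([row["sample_id"] for row in selected])
```

Keeping the lists separate from the clinical file means the clinical file can keep changing without the lists having to be regenerated, as long as the sample identifiers stay stable.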
Purpose/implementation
Generate files of independent samples for downstream analyses where more than one sample from the same individual would bias the analysis or is otherwise undesirable.
Issue
#155
Directions for reviewers
Do the sample lists look reasonable (truly independent)?
Should this PR include adding canonical versions of these files elsewhere in the project? If so, where? Should they be part of the data download?
Results
The script generates 4 files of independent samples: one with all primary tumors and WGS sequences, as well as files that include WXS samples and/or non-primary tumors.
In summary, the independent sample lists contain:
641 WGS primary specimens
788 WGS specimens (including non-primary)
657 WGS+WXS primary specimens
804 WGS+WXS specimens (including non-primary)
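The general idea behind the four lists can be sketched in a few lines of Python. This is an illustration only, not the script in this PR: the record keys (`sample_id`, `participant_id`, `tumor_descriptor`, `experimental_strategy`), the `"Primary"` label, the seed, and the tie-breaking rules (prefer a primary tumor, then the first-listed strategy, then a seeded random pick) are all assumptions.

```python
import random
from collections import defaultdict


def independent_samples(records, primary_only=True, strategies=("WGS",), seed=2019):
    """Return one sample ID per participant (hypothetical field names)."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_participant = defaultdict(list)
    for rec in records:
        if rec["experimental_strategy"] in strategies:
            by_participant[rec["participant_id"]].append(rec)

    chosen = []
    for participant in sorted(by_participant):
        candidates = by_participant[participant]
        primaries = [r for r in candidates if r["tumor_descriptor"] == "Primary"]
        # Assumption: primaries are required (primary_only) or preferred.
        if primary_only or primaries:
            candidates = primaries
        if not candidates:
            continue
        # Assumption: prefer the strategy listed first (e.g. WGS over WXS).
        rank = lambda r: strategies.index(r["experimental_strategy"])
        best = min(rank(r) for r in candidates)
        candidates = sorted(
            (r for r in candidates if rank(r) == best),
            key=lambda r: r["sample_id"],
        )
        chosen.append(rng.choice(candidates)["sample_id"])
    return chosen


records = [
    {"sample_id": "S1", "participant_id": "P1",
     "tumor_descriptor": "Primary", "experimental_strategy": "WGS"},
    {"sample_id": "S2", "participant_id": "P1",
     "tumor_descriptor": "Recurrence", "experimental_strategy": "WGS"},
    {"sample_id": "S3", "participant_id": "P2",
     "tumor_descriptor": "Recurrence", "experimental_strategy": "WGS"},
    {"sample_id": "S4", "participant_id": "P3",
     "tumor_descriptor": "Primary", "experimental_strategy": "WXS"},
]

# The four variants described above, on toy data.
print(independent_samples(records))
print(independent_samples(records, primary_only=False))
print(independent_samples(records, strategies=("WGS", "WXS")))
print(independent_samples(records, primary_only=False, strategies=("WGS", "WXS")))
```

Sorting candidates before the seeded random choice keeps the result stable regardless of input row order, which matters if the lists are meant to be regenerated and compared across releases.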
Docker and continuous integration
Check all those that apply or remove this section if it is not applicable.