Possible to run docker in cavatica on openPBTA read data? #341
Comments
@hbeale - thanks for reaching out! We can do one of two things:
With either of these solutions, we would publish the output files in a new data release, and you could pull those files from the release when you work on the PR. If the data release is not immediate (it may not be until the new year, as I am going to release #326 tomorrow), we could get you the files ahead of time, or, if you run it yourself, you'd have them immediately. Which do you prefer? |
Thanks, @jharenza! Option two sounds more reproducible; let's go with that if possible. What else do you need from me? |
(And your timeline sounds fine to me.) |
Ok, @hbeale - we have a group meeting tomorrow and will discuss and get back to you with some plans. |
Thanks! The value of the analysis is maximized if we also have access to the STAR output log titled "Log.final.out" and fastqc output (e.g. R1_fastqc.zip and R2_fastqc.zip). I've amended the first comment to reflect this. Can you discuss including these in your data release as well? Thank you. |
@hbeale - @zhangb1 was able to create a workflow for this today, so I think we can queue that up later this week. Re: the STAR output, that should be no problem - we can zip and release those. Re: the fastqc output, we actually run RNASeqQC. I am attaching a sample output file here so you can check whether these files contain what you need or whether you need the FASTQC program run. Thanks! 96a41796-c1b6-447f-9f88-b2e7e52005b1.Aligned.out.sorted.bam.metrics.txt |
thanks @zhangb1! |
ok great! |
@hbeale we have completed the MEND QC run, and will plan to release this data + STAR Log.final.out files with #v13. Can give you an updated timeline for release in the next week. |
Hi @hbeale - below are the outputs from Mend QC - which did you want in the release?
cc: @migbro Thanks! |
Great! Please release
readDist.txt: The output of RSeqQC read_distribution.py (~1kb)
and
bam_umend_qc.tsv: uniqMappedNonDupeReadCount, estExonicUniqMappedNonDupeReadCount and PASS/FAIL
On Mon, Jan 6, 2020 at 1:14 PM, Jo Lynne wrote:
Hi @hbeale - below are the outputs from Mend QC - which did you want in the release?
readDist.txt: The output of RSeqQC read_distribution.py (~1kb)
bam_umend_qc.tsv: uniqMappedNonDupeReadCount, estExonicUniqMappedNonDupeReadCount and PASS/FAIL
bam_umend_qc.json: Same as bam_umend_qc.tsv but in json format
sortedByCoord.md.bam: BAM with duplicates marked sorted by coordinate
sortedByCoord.md.bam.bai: Index for sortedByCoord.md.bam
cc: @migbro
Thanks!
|
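For anyone consuming the released bam_umend_qc.tsv files downstream, here is a minimal Python sketch of reading one file and pulling out the three fields named above. The file layout is an assumption (a tab-separated header row followed by one data row); the `input` and `qc` column names and the read counts in the example are hypothetical, so the PASS/FAIL field is located by its value, not by name.

```python
import csv
import io


def read_mend_qc(tsv_text):
    """Parse the text of one bam_umend_qc.tsv file.

    Assumed layout (not confirmed in this thread): a tab-separated header
    row followed by a single data row.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        raise ValueError("no data row found")
    row = rows[0]
    # The name of the PASS/FAIL column is not given above, so find it by value.
    status = next((v for v in row.values() if v in ("PASS", "FAIL")), None)
    return {
        "uniq_mapped_non_dupe": int(float(row["uniqMappedNonDupeReadCount"])),
        "est_exonic_uniq_mapped_non_dupe": int(
            float(row["estExonicUniqMappedNonDupeReadCount"])
        ),
        "status": status,
    }


# Illustrative content only; these numbers are made up.
example = (
    "input\tuniqMappedNonDupeReadCount\testExonicUniqMappedNonDupeReadCount\tqc\n"
    "sample.bam\t52000000\t41000000\tPASS\n"
)
result = read_mend_qc(example)
print(result)
```

Looking the status up by value keeps the sketch usable even if the real header names that column differently.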
@jharenza (Jackie Taroni suggested I ping you as the person most likely to have expertise in this area.)
We have a Docker image that we would like to have run on the aligned read data to produce output that our analysis in OpenPBTA would consume. Do you have a process for making this request? I've summarized our proposed analysis below.
We would like to perform QC analysis on the aligned read data. We count the number of Mapped Exonic Non-Duplicate (MEND) reads. We will be performing outlier analysis (#229) and meta-analysis of the outlier results, and we wish to know which (if not all) are high quality enough to generalize from. We have defined the relationship of MEND counts to sensitivity and specificity of outlier calling in the manuscript linked below.
The QC analysis takes as input an hg38-aligned BAM file and generates as output several small text files, as well as a duplicate-marked BAM file (which can be discarded). The process takes approximately 2 hours on a sample containing 70 million reads when processed on a computer with 64GB of memory and 12 vCPUs.
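As a rough planning aid, the timing above can be turned into a back-of-the-envelope compute estimate. This sketch assumes runtime scales linearly with read count, which is an assumption made here for illustration and is not stated in this issue; `estimate_hours` and the read counts below are hypothetical.

```python
def estimate_hours(read_counts, ref_reads=70_000_000, ref_hours=2.0):
    """Estimate total machine-hours for a batch of samples.

    Uses the single reference point quoted above (about 2 hours for 70 million
    reads on 64GB / 12 vCPUs) and ASSUMES linear scaling with read count.
    """
    return sum(n / ref_reads * ref_hours for n in read_counts)


# Hypothetical read counts for three samples.
total = estimate_hours([70_000_000, 35_000_000, 140_000_000])
print(f"Estimated total: {total:.1f} machine-hours")
```

Actual runtimes will depend on the instance type and I/O, so treat the result as an order-of-magnitude guide only.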
The value of the analysis is maximized if we also have access to the STAR output log titled "Log.final.out" and fastqc output (e.g. R1_fastqc.zip and R2_fastqc.zip).
The QC process is dockerized and the code is available at https://github.com/UCSC-Treehouse/mend_qc.
The MEND approach is described more fully here: https://www.biorxiv.org/content/10.1101/716829v1
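To illustrate how the requested STAR "Log.final.out" files could be used alongside the MEND results, here is a minimal Python sketch that collects the log's "name | value" lines into a dictionary. The `parse_star_log` helper is hypothetical, and the excerpt and its numbers are made up for illustration; they are not taken from OpenPBTA data.

```python
def parse_star_log(text):
    """Collect the 'name | value' lines of a STAR Log.final.out into a dict.

    Section headings such as 'UNIQUE READS:' contain no '|' and are skipped.
    Values are kept as strings so percentages and timestamps survive intact.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


# Illustrative excerpt only; these numbers are made up.
example_log = """\
                          Number of input reads |\t70000000
                                    UNIQUE READS:
                   Uniquely mapped reads number |\t62650000
                        Uniquely mapped reads % |\t89.50%
"""
stats = parse_star_log(example_log)
print(stats["Uniquely mapped reads %"])
```

Keeping values as strings avoids guessing at units; callers can convert the specific fields they need.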