Proposed Analysis: Flag RNA-seq samples with substantially different read-type composition #550

hbeale · 2020-02-20T19:29:04Z

What are the scientific goals of the analysis?

Identify samples whose read-type composition differs substantially from the rest of the cohort, so that other analyses of the RNA-seq expression data can recognize results that may be driven by technical artifacts.

What methods do you plan to use to accomplish the scientific goals?

Catalog unmapped, multi-mapped, duplicate and exonic reads for each sample. Then calculate four fractions: unmapped reads/all reads, multi-mapped reads/mapped reads, duplicate reads/mapped reads, and non-exonic reads/non-duplicate (mapped) reads. For each fraction, flag samples whose value falls outside the cohort mean ± 2 SD of that fraction's distribution. In a preliminary analysis, 121 of 1,027 samples were flagged. This rate is low compared with analyses spanning multiple projects and reflects consistent sample composition across the cohort.

Flagged samples should not necessarily be excluded from subsequent analyses, but results should be reviewed for the impact of flagged samples. For example, if a clustering analysis identifies an interesting group of samples, that group should be checked against the composition flag status to ensure that the cluster is not driven by a technical artifact.
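The flagging step can be sketched in Python as below. This is a minimal illustration, not the code merged in #596. The per-sample count fields (`total`, `unmapped`, `multimapped`, `duplicate`, `exonic`) and the helper names are hypothetical. `exonic` is assumed to count exonic reads among non-duplicate mapped reads. The sample standard deviation is used because the proposal does not say which SD is meant.

```python
from statistics import mean, stdev

# Hypothetical per-sample count fields (assumptions, not from the data release):
#   total, unmapped, multimapped, duplicate, exonic
# "exonic" is assumed to count exonic reads among non-duplicate mapped reads.


def composition_fractions(counts):
    """Compute the four read-type fractions described in the proposal."""
    mapped = counts["total"] - counts["unmapped"]
    nondup_mapped = mapped - counts["duplicate"]
    nonexonic = nondup_mapped - counts["exonic"]
    return {
        "unmapped_of_all": counts["unmapped"] / counts["total"],
        "multimapped_of_mapped": counts["multimapped"] / mapped,
        "duplicate_of_mapped": counts["duplicate"] / mapped,
        "nonexonic_of_nondup": nonexonic / nondup_mapped,
    }


def flag_outliers(fractions_by_sample, n_sd=2.0):
    """Return {sample: [metrics outside mean +/- n_sd * SD]} across the cohort.

    Uses the sample standard deviation (statistics.stdev); needs >= 2 samples.
    """
    flags = {sample: [] for sample in fractions_by_sample}
    metrics = next(iter(fractions_by_sample.values())).keys()
    for metric in metrics:
        values = [f[metric] for f in fractions_by_sample.values()]
        mu, sd = mean(values), stdev(values)
        for sample, f in fractions_by_sample.items():
            if abs(f[metric] - mu) > n_sd * sd:
                flags[sample].append(metric)
    return flags


def flagged_share(samples, flags):
    """Share of the given samples (e.g. one cluster) with at least one flag."""
    samples = list(samples)
    return sum(1 for s in samples if flags[s]) / len(samples)


if __name__ == "__main__":
    # Toy cohort: nine identical samples plus one with many more duplicates.
    cohort = {f"s{i}": {"total": 1000, "unmapped": 100, "multimapped": 90,
                        "duplicate": 180, "exonic": 300} for i in range(9)}
    cohort["s9"] = dict(cohort["s0"], duplicate=500)
    fractions = {s: composition_fractions(c) for s, c in cohort.items()}
    flags = flag_outliers(fractions)
    print(flags["s9"])  # ['duplicate_of_mapped', 'nonexonic_of_nondup']
    print(flagged_share(["s8", "s9"], flags))  # 0.5
```

Comparing `flagged_share` for a cluster of interest against the share for the whole cohort gives a quick check of whether the cluster is enriched for flagged samples.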

What input data are required for this analysis?

The STAR output log titled "Log.final.out"
bam_umend_qc.tsv
Described in #341 and included in data release #444.
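As a hedged sketch of how the unmapped and multi-mapped counts might be pulled from `Log.final.out`, the Python below parses STAR's `label | value` lines. The helper names are hypothetical, and the sketch assumes that reads "mapped to too many loci" are counted as unmapped. The duplicate and exonic counts would come from `bam_umend_qc.tsv`, whose columns are not described here, so that file is not parsed.

```python
def parse_star_final_log(text):
    """Parse the 'label |<TAB>value' lines of a STAR Log.final.out into a dict.

    Section headers such as 'UNIQUE READS:' contain no '|' and are skipped.
    Values are kept as strings (some are percentages or timestamps).
    """
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.partition("|")
        if sep:
            stats[label.strip()] = value.strip()
    return stats


def star_read_counts(stats):
    """Derive total, mapped, unmapped and multi-mapped read counts.

    Assumption: reads 'mapped to too many loci' are counted as unmapped,
    because they are neither unique nor in the multiple-loci bucket.
    """
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    mapped = unique + multi
    return {"total": total, "mapped": mapped,
            "unmapped": total - mapped, "multimapped": multi}


if __name__ == "__main__":
    example = (
        "                          Number of input reads |\t1000\n"
        "                                  UNIQUE READS:\n"
        "                   Uniquely mapped reads number |\t800\n"
        "        Number of reads mapped to multiple loci |\t120\n"
        "        Number of reads mapped to too many loci |\t5\n"
    )
    print(star_read_counts(parse_star_final_log(example)))
    # {'total': 1000, 'mapped': 920, 'unmapped': 80, 'multimapped': 120}
```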

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

2 weeks. No.

Who will complete the analysis (please add a GitHub handle here if relevant)?

@hbeale

What relevant scientific literature relates to this analysis?

For example, unmapped reads can reflect contamination from other species:
Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data

cc: @e-t-k

jaclyn-taroni · 2020-02-20T20:03:06Z

Hi @hbeale - I've updated the issue title, please feel free to edit if it's not quite right!

jaclyn-taroni · 2020-03-23T12:13:18Z

Hi @hbeale - was this closed by #596?

hbeale · 2020-03-23T17:23:32Z

Yes, thanks.

hbeale added the proposed analysis label Feb 20, 2020

jaclyn-taroni changed the title from "Proposed Analysis:" to "Proposed Analysis: Flag RNA-seq samples with substantially different read-type composition" Feb 20, 2020

hbeale mentioned this issue Mar 4, 2020 in Rna seq composition #596 (merged)

hbeale closed this as completed Mar 23, 2020

