This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Proposed Analysis: Flag RNA-seq samples with substantially different read-type composition #550

Closed
hbeale opened this issue Feb 20, 2020 · 3 comments

Comments

@hbeale
Contributor

hbeale commented Feb 20, 2020

What are the scientific goals of the analysis?

Identify samples whose read-type composition differs substantially from the rest of the cohort, so that other analyses of the RNA-Seq expression data can check whether their results are potentially driven by technical artifacts.

What methods do you plan to use to accomplish the scientific goals?

Catalog unmapped reads, multi-mapped reads, duplicate reads and exonic reads for each sample. Then calculate four fractions: unmapped reads/all reads, multi-mapped reads/mapped reads, duplicate reads/mapped reads, and non-exonic reads/non-duplicate (mapped) reads. For each fraction, flag samples whose value falls outside the cohort mean +/- 2 SD. In preliminary analysis, 121/1027 samples were flagged, which is low compared to analyses spanning multiple projects and reflects consistent sample composition across the cohort. Flagged samples should not necessarily be excluded from subsequent analyses, but results should be reviewed for the impact of flagged samples. If, for example, a clustering analysis identifies an interesting group of samples, that group should be checked against the composition flag status to ensure that the cluster is not driven by a technical artifact.
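The flagging step above could be sketched roughly as follows. This is only an illustration in Python, not the analysis code itself: the count field names (`total`, `unmapped`, `mapped`, `multi_mapped`, `duplicate`, `non_duplicate`, `non_exonic`) are hypothetical and are not the actual fields of Log.final.out or bam_umend_qc.tsv, and using the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def read_type_fractions(counts):
    """Compute the four read-type fractions for one sample.

    `counts` is a dict of read counts with hypothetical keys; each
    numerator and denominator is supplied directly rather than derived.
    """
    return {
        "unmapped": counts["unmapped"] / counts["total"],
        "multi_mapped": counts["multi_mapped"] / counts["mapped"],
        "duplicate": counts["duplicate"] / counts["mapped"],
        "non_exonic": counts["non_exonic"] / counts["non_duplicate"],
    }


def flag_samples(counts_by_sample, n_sd=2.0):
    """Return {sample: [names of fractions outside mean +/- n_sd * SD]}.

    The mean and SD are computed per fraction across all samples passed in.
    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    fractions = {s: read_type_fractions(c) for s, c in counts_by_sample.items()}
    flags = {sample: [] for sample in fractions}
    for name in ("unmapped", "multi_mapped", "duplicate", "non_exonic"):
        values = [f[name] for f in fractions.values()]
        center, spread = mean(values), stdev(values)
        for sample, f in fractions.items():
            if abs(f[name] - center) > n_sd * spread:
                flags[sample].append(name)
    return flags
```

A sample with a non-empty list is flagged. For the clustering check, the flagged sample IDs can be compared against each cluster's membership to see whether a cluster consists mostly of flagged samples.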

What input data are required for this analysis?

the STAR output log titled "Log.final.out"
bam_umend_qc.tsv
described in #341, included in the data release #444

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

2 weeks. No.

Who will complete the analysis (please add a GitHub handle here if relevant)?

@hbeale

What relevant scientific literature relates to this analysis?

For example, unmapped reads can reflect contamination from other species:
Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data

cc: @e-t-k

@jaclyn-taroni jaclyn-taroni changed the title from "Proposed Analysis:" to "Proposed Analysis: Flag RNA-seq samples with substantially different read-type composition" Feb 20, 2020
@jaclyn-taroni
Member

Hi @hbeale - I've updated the issue title, please feel free to edit if it's not quite right!

@hbeale hbeale mentioned this issue Mar 4, 2020
5 tasks
@jaclyn-taroni
Member

Hi @hbeale - was this closed by #596?

@hbeale
Contributor Author

hbeale commented Mar 23, 2020

Yes, thanks.

@hbeale hbeale closed this as completed Mar 23, 2020
2 participants