This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
jaclyn-taroni changed the title from "Proposed Analysis:" to "Proposed Analysis: Flag RNA-seq samples with substantially different read-type composition" on Feb 20, 2020
What are the scientific goals of the analysis?
Identify samples whose read-type composition differs substantially from the rest of the cohort, so that other analyses of the RNA-Seq expression data can identify results that are potentially based on technical artifacts.
What methods do you plan to use to accomplish the scientific goals?
Catalog unmapped reads, multi-mapped reads, duplicate reads, and exonic reads for each sample. Then calculate four fractions: unmapped reads/all reads, multi-mapped reads/mapped reads, duplicate reads/mapped reads, and non-exonic reads/non-duplicate (mapped) reads. For each fraction, flag samples whose value falls outside the mean +/- 2 SD of that fraction's distribution. In a preliminary analysis, 121/1027 samples (about 12%) were flagged, which is low compared to analyses spanning multiple projects and reflects consistent sample composition across the cohort. Flagged samples should not necessarily be excluded from subsequent analyses, but results should be reviewed for the impact of flagged samples. If, for example, a clustering analysis identifies an interesting group of samples, that group should be checked against the composition flag status to ensure that the cluster is not driven by a technical artifact.
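As a rough illustration of the flagging step, here is a minimal Python sketch. The count field names (`total`, `unmapped`, `multimapped`, `duplicate`, `exonic`) and both function names are hypothetical, not taken from the actual pipeline. The sketch also assumes that the exonic count is taken among non-duplicate mapped reads, and that the SD is the sample standard deviation.

```python
from statistics import mean, stdev


def read_type_fractions(c):
    """Compute the four read-type fractions for one sample.

    `c` is a dict of read counts with hypothetical keys:
    total, unmapped, multimapped, duplicate, exonic.
    Assumption: `exonic` counts non-duplicate mapped reads only.
    """
    mapped = c["total"] - c["unmapped"]
    non_duplicate = mapped - c["duplicate"]
    return {
        "unmapped_frac": c["unmapped"] / c["total"],
        "multimapped_frac": c["multimapped"] / mapped,
        "duplicate_frac": c["duplicate"] / mapped,
        "non_exonic_frac": (non_duplicate - c["exonic"]) / non_duplicate,
    }


def flag_samples(counts_by_sample, n_sd=2.0):
    """Return {sample_id: [names of fractions outside mean +/- n_sd * SD]}."""
    fractions = {s: read_type_fractions(c) for s, c in counts_by_sample.items()}
    flags = {s: [] for s in fractions}
    fraction_names = list(next(iter(fractions.values())))
    for name in fraction_names:
        values = [f[name] for f in fractions.values()]
        mu, sd = mean(values), stdev(values)  # stdev needs at least 2 samples
        for sample, f in fractions.items():
            if abs(f[name] - mu) > n_sd * sd:
                flags[sample].append(name)
    return flags
```

A sample with an empty list is unflagged. Keeping the names of the offending fractions, rather than a single boolean, makes it easier to check later whether an interesting cluster shares one particular composition issue.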
What input data are required for this analysis?
the STAR output log titled "Log.final.out"
bam_umend_qc.tsv
described in #341, included in the data release #444
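For reference, a sketch of how read counts might be pulled from a STAR `Log.final.out` file. It relies on the `key |<tab>value` layout of that log and on three of its standard field names. The function names are hypothetical, and counting every read that is neither uniquely mapped nor mapped to multiple loci as unmapped (including reads mapped to too many loci) is an assumption made here, not necessarily what the analysis does. The columns of bam_umend_qc.tsv are not described in this issue, so that file is not covered.

```python
def parse_star_log(text):
    """Parse the text of a STAR Log.final.out file into a {field: value} dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" and blank lines
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def star_read_counts(stats):
    """Derive total, multi-mapped, and unmapped read counts from parsed stats.

    Assumption: every read that is neither uniquely mapped nor mapped to
    multiple loci is treated as unmapped.
    """
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    return {
        "total": total,
        "multimapped": multi,
        "unmapped": total - unique - multi,
    }
```

Typical use would be `star_read_counts(parse_star_log(open(path).read()))`, where `path` points at one sample's `Log.final.out`.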
How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?
2 weeks. No.
Who will complete the analysis (please add a GitHub handle here if relevant)?
@hbeale
What relevant scientific literature relates to this analysis?
e.g. Unmapped reads can reflect contamination from different species:
Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data
cc: @e-t-k