Tool for determining bin filters from coverage data. #2992

Closed
droazen opened this issue Jun 5, 2017 · 1 comment

@droazen (Collaborator)

droazen commented Jun 5, 2017

@mbabadi commented on Thu May 18 2017

At the moment, we:

  • Remove targets with possibly bad (NaN, infinity, negative) values
  • Remove targets that have uniformly low coverage across all samples
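A minimal Python sketch of the two existing filters follows. It assumes coverage is held as one list of per-bin values per sample (all samples collected over the same bins) and reads "uniformly low" as "below a threshold in every sample". The function name `filter_bins` and the `min_coverage` threshold are hypothetical; this is not the actual GATK implementation.

```python
import math
from typing import List, Sequence


def is_bad_value(x: float) -> bool:
    """True for NaN, +/-infinity, or negative coverage values."""
    return math.isnan(x) or math.isinf(x) or x < 0


def filter_bins(coverage: Sequence[Sequence[float]], min_coverage: float) -> List[int]:
    """Return indices of bins that survive both filters.

    coverage[s][b] is the coverage of bin b in sample s (hypothetical layout).
    """
    n_bins = len(coverage[0])
    if any(len(sample) != n_bins for sample in coverage):
        raise ValueError("all samples must cover the same bins")
    kept = []
    for b in range(n_bins):
        column = [sample[b] for sample in coverage]
        # Filter 1: a bad value in any sample removes the bin.
        if any(is_bad_value(x) for x in column):
            continue
        # Filter 2: remove the bin if it is low in every sample.
        if all(x < min_coverage for x in column):
            continue
        kept.append(b)
    return kept
```

For example, with two samples `[[10, nan, 0, 30, 2], [12, 5, 1, 25, 8]]` and `min_coverage=5`, bins 1 (NaN) and 2 (low in both samples) are dropped, while bin 4 survives because one sample is above the threshold.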

Perhaps we should consider adding more filters:

  • Remove targets with very high and very low GC content (can be done in the CalculateTargetCoverage step)
  • Remove targets with lots of repeats and anomalously low mappability (can be done in the CalculateTargetCoverage step)
  • In the learning mode, remove a target if it is masked in too many samples (in that case, maximum-likelihood parameter estimation for that target is unreliable)

This should be done only after careful evaluation, i.e., only if certain features make a target prone to bad calls no matter what.
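The three proposed filters could be sketched as below, assuming per-bin GC fraction and mappability annotations are available (for example, computed alongside coverage collection) and that masking is given as a per-sample boolean matrix. The function names and all thresholds (`min_gc`, `max_gc`, `min_mappability`, `max_masked_fraction`) are hypothetical placeholders, not recommendations; as noted above, any real thresholds would need careful evaluation.

```python
from typing import List, Sequence


def annotation_filter(
    gc: Sequence[float],
    mappability: Sequence[float],
    min_gc: float = 0.2,  # placeholder threshold
    max_gc: float = 0.8,  # placeholder threshold
    min_mappability: float = 0.9,  # placeholder threshold
) -> List[int]:
    """Keep bins with moderate GC content and high mappability."""
    if len(gc) != len(mappability):
        raise ValueError("annotations must cover the same bins")
    kept = []
    for b, (g, m) in enumerate(zip(gc, mappability)):
        if not (min_gc <= g <= max_gc):
            continue  # very high or very low GC content
        if m < min_mappability:
            continue  # repetitive / poorly mappable bin
        kept.append(b)
    return kept


def masked_fraction_filter(
    mask: Sequence[Sequence[bool]],
    max_masked_fraction: float = 0.5,  # placeholder threshold
) -> List[int]:
    """Learning mode: drop bins masked in too many samples.

    mask[s][b] is True if bin b is masked in sample s (hypothetical layout).
    """
    n_samples = len(mask)
    n_bins = len(mask[0])
    kept = []
    for b in range(n_bins):
        n_masked = sum(1 for sample in mask if sample[b])
        # Too few unmasked samples leaves too little data to fit per-bin parameters.
        if n_masked / n_samples <= max_masked_fraction:
            kept.append(b)
    return kept
```

Keeping the annotation-based filters separate from the mask-based one reflects that the former depend only on the reference and bins, while the latter depends on the samples used in learning mode.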

@samuelklee changed the title from "ReadCountCollection pre-processing filters" to "Tool for determining bin filters from coverage data." on May 24, 2018
@samuelklee (Contributor)

Some filters are implemented in the ModelSegments CreatePoN code (they were ported directly from GATK CNV). Other filters were implemented by @mbabadi as external Python scripts for GPC2 validation. We should extract these filters and productionize them if possible. Ideally, the tool would take several coverage files (collected over identical bins) and filtering parameters as input, and output a filtered list of bins. Downstream tools would then subset the original coverage files to these bins accordingly.
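A rough sketch of the proposed tool's data flow: read several coverage files, check that they share identical bins, intersect the bins kept by each filter, and return the filtered bin list for downstream subsetting. The tab-separated `CONTIG START END COUNT` format here is a simplified stand-in, not the actual GATK coverage format, and `determine_bins`, `subset`, and `finite_nonnegative` are hypothetical names.

```python
import csv
import io
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Bin = Tuple[str, int, int]  # (contig, start, end)
Filter = Callable[[List[List[float]]], Iterable[int]]


def read_coverage(text: str) -> Tuple[List[Bin], List[float]]:
    """Parse the hypothetical TSV format (one header row, then one row per bin)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip header
    bins, counts = [], []
    for contig, start, end, count in reader:
        bins.append((contig, int(start), int(end)))
        counts.append(float(count))  # float() also accepts "nan" and "inf"
    return bins, counts


def finite_nonnegative(coverage: List[List[float]]) -> List[int]:
    """Example filter: keep bins that are finite and non-negative in every sample."""
    n_bins = len(coverage[0])
    return [b for b in range(n_bins)
            if all(math.isfinite(s[b]) and s[b] >= 0 for s in coverage)]


def determine_bins(texts: Sequence[str], filters: Sequence[Filter]) -> List[Bin]:
    """Return the bins that pass every filter across all coverage files."""
    parsed = [read_coverage(t) for t in texts]
    bins = parsed[0][0]
    for other_bins, _ in parsed[1:]:
        if other_bins != bins:
            raise ValueError("coverage files must be collected over identical bins")
    coverage = [counts for _, counts in parsed]
    kept = set(range(len(bins)))
    for f in filters:
        kept &= set(f(coverage))  # a bin must survive every filter
    return [bins[i] for i in sorted(kept)]


def subset(text: str, kept_bins: Sequence[Bin]) -> List[float]:
    """Downstream step: restrict one coverage file to the filtered bins."""
    bins, counts = read_coverage(text)
    lookup = dict(zip(bins, counts))
    return [lookup[b] for b in kept_bins]
```

Passing filters as plain callables keeps bin selection independent of any particular filter, so the existing PoN filters and the GPC2 script filters could, in principle, be wrapped the same way; downstream tools would then only need the bin list to subset their inputs.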
