You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I ran into a case where the locus was not being successfully genotyped. I tracked it down to /seq_stutter_genotyper.cpp#L570. I found that changing the value of MAX_STUTTER_REPEAT_DEL from -3 to -2 in RepeatStutterInfo.h#L12.
Some of the sites are very noisy by nature, but I wanted to understand better why they were filtered out. I would greatly welcome some advice. I will send you over the raw test data in an email.
Here's the diff where I added a number of logging statements (which you may want to ultimate keep) that explain why genotyping failed at a locus:
The details:
diff --git a/src/seq_stutter_genotyper.cpp b/src/seq_stutter_genotyper.cpp
index 1177615..f163fc4 100644
--- a/src/seq_stutter_genotyper.cpp
+++ b/src/seq_stutter_genotyper.cpp
@@ -566,8 +566,10 @@ bool SeqStutterGenotyper::id_and_align_to_stutter_alleles(std::ostream& logger){
if (block->get_repeat_info() != NULL){
get_stutter_candidate_alleles(i, logger, stutter_seqs[i]);
for (auto allele_iter = stutter_seqs[i].begin(); allele_iter != stutter_seqs[i].end(); allele_iter++)
- if (allele_iter->size() < std::abs(block->get_repeat_info()->max_deletion()))
+ if (allele_iter->size() < std::abs(block->get_repeat_info()->max_deletion())) {
+ logger << "id_and_align_to_stutter_alleles returning false since " << allele_iter->size() << " < " << std::abs(block->get_repeat_info()->max_deletion()) << "\n";
return false;
+ }
added_alleles |= !stutter_seqs[i].empty();
std::sort(stutter_seqs[i].begin(), stutter_seqs[i].end(), orderByLengthAndSequence);
}
@@ -585,8 +587,11 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
// Unsuccessful initialization. May be due to
// 1) Failing to find the corresponding alleles in the VCF (if one has been provided)
// 2) Large deletion extending past STR
- if (!initialized_)
+ if (!initialized_) {
+ logger << "Aborting genotyping of the locus due to long deletion past the STR or missing alleles in the provided VCF (if provided)\n";
return false;
+ }
+
// If the smallest stutter block sequence is smaller than the maximum deletion size, the stutter aligner will fail
// We could extend the stutter block to prevent this, but if this happens, the locus is probably not very high quality
@@ -595,8 +600,10 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
HapBlock* hap_block = haplotype_->get_block(i);
if (hap_block->get_repeat_info() == NULL)
continue;
- if (hap_block->min_size() < std::abs(hap_block->get_repeat_info()->max_deletion()))
+ if (hap_block->min_size() < std::abs(hap_block->get_repeat_info()->max_deletion())) {
+ logger << "Aborting genotyping of the locus as the smallest stutter block sequence is smaller than the maximum deletion size\n";
return false;
+ }
}
// Check if we can assemble the sequences flanking the STR
@@ -626,8 +633,10 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
if (ref_vcf_ == NULL){
// Look for additional alleles in stutter artifacts and align to them (if necessary)
- if (!id_and_align_to_stutter_alleles(logger))
+ if (!id_and_align_to_stutter_alleles(logger)) {
+ logger << "Aborting genotyping of the locus as id_and_align_to_stutter_alleles returned false\n";
return false;
+ }
// Remove alleles with no MAP genotype calls and recompute the posteriors
std::vector< std::vector<int> > unused_indices;
@@ -649,9 +658,12 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
}
}
- if (reassemble_flanks_)
- if (!assemble_flanks(max_flank_haplotypes, min_flank_freq, logger))
+ if (reassemble_flanks_) {
+ if (!assemble_flanks(max_flank_haplotypes, min_flank_freq, logger)) {
+ logger << "Aborting genotyping of the locus as assemble_flanks returned false\n";
return false;
+ }
+ }
return true;
}
The text was updated successfully, but these errors were encountered:
Thanks for pointing out this issue. In short, when HipSTR encountered an STR allele that was shorter than 3 times the repeat motif length, it was previously designed to skip the locus and continue on to the next one. This was primarily because i) I thought this would very rarely occur and ii) there were a few implementation complexities that made it easier to avoid these cases. This behavior was responsible for the loci you reported.
In commit 3e0b4dd, I modified the codebase such that this no longer occurs and that these short alleles are genotyped appropriately
@tfwillems
I ran into a case where the locus was not being successfully genotyped. I tracked it down to /seq_stutter_genotyper.cpp#L570. I found that changing the value of
MAX_STUTTER_REPEAT_DEL
from-3
to-2
inRepeatStutterInfo.h#L12
.Some of the sites are very noisy by nature, but I wanted to understand better why they were filtered out. I would greatly welcome some advice. I will send you over the raw test data in an email.
Here's the diff where I added a number of logging statements (which you may want to ultimate keep) that explain why genotyping failed at a locus:
The details:
The text was updated successfully, but these errors were encountered: