Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genotyping locus failed due to MAX_STUTTER_REPEAT_DEL (no explanation in the log) #45

Closed
nh13 opened this issue Nov 7, 2017 · 2 comments
Labels

Comments

@nh13
Copy link
Contributor

nh13 commented Nov 7, 2017

@tfwillems

I ran into a case where the locus was not being successfully genotyped. I tracked it down to /seq_stutter_genotyper.cpp#L570. I found that changing the value of MAX_STUTTER_REPEAT_DEL from -3 to -2 in RepeatStutterInfo.h#L12.

Some of the sites are very noisy by nature, but I wanted to understand better why they were filtered out. I would greatly welcome some advice. I will send you over the raw test data in an email.

Here's the diff where I added a number of logging statements (which you may want to ultimate keep) that explain why genotyping failed at a locus:

The details:

diff --git a/src/seq_stutter_genotyper.cpp b/src/seq_stutter_genotyper.cpp
index 1177615..f163fc4 100644
--- a/src/seq_stutter_genotyper.cpp
+++ b/src/seq_stutter_genotyper.cpp
@@ -566,8 +566,10 @@ bool SeqStutterGenotyper::id_and_align_to_stutter_alleles(std::ostream& logger){
       if (block->get_repeat_info() != NULL){
        get_stutter_candidate_alleles(i, logger, stutter_seqs[i]);
        for (auto allele_iter = stutter_seqs[i].begin(); allele_iter != stutter_seqs[i].end(); allele_iter++)
-         if (allele_iter->size() < std::abs(block->get_repeat_info()->max_deletion()))
+         if (allele_iter->size() < std::abs(block->get_repeat_info()->max_deletion())) {
+                 logger << "id_and_align_to_stutter_alleles returning false since " << allele_iter->size() << " < " <<  std::abs(block->get_repeat_info()->max_deletion()) << "\n";
            return false;
+         }
        added_alleles |= !stutter_seqs[i].empty();
        std::sort(stutter_seqs[i].begin(), stutter_seqs[i].end(), orderByLengthAndSequence);
       }
@@ -585,8 +587,11 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
   // Unsuccessful initialization. May be due to
   // 1) Failing to find the corresponding alleles in the VCF (if one has been provided)
   // 2) Large deletion extending past STR
-  if (!initialized_)
+  if (!initialized_) {
+       logger << "Aborting genotyping of the locus due to long deletion past the STR or missing alleles in the provided VCF (if provided)\n";
     return false;
+  }
+
 
   // If the smallest stutter block sequence is smaller than the maximum deletion size, the stutter aligner will fail
   // We could extend the stutter block to prevent this, but if this happens, the locus is probably not very high quality
@@ -595,8 +600,10 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
     HapBlock* hap_block = haplotype_->get_block(i);
     if (hap_block->get_repeat_info() == NULL)
       continue;
-    if (hap_block->min_size() < std::abs(hap_block->get_repeat_info()->max_deletion()))
+    if (hap_block->min_size() < std::abs(hap_block->get_repeat_info()->max_deletion())) {
+         logger << "Aborting genotyping of the locus as the smallest stutter block sequence is smaller than the maximum deletion size\n";
       return false;
+       }
   }
 
   // Check if we can assemble the sequences flanking the STR
@@ -626,8 +633,10 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
 
   if (ref_vcf_ == NULL){
     // Look for additional alleles in stutter artifacts and align to them (if necessary)
-    if (!id_and_align_to_stutter_alleles(logger))
+    if (!id_and_align_to_stutter_alleles(logger)) {
+         logger << "Aborting genotyping of the locus as id_and_align_to_stutter_alleles returned false\n";
       return false;
+       }
 
     // Remove alleles with no MAP genotype calls and recompute the posteriors
     std::vector< std::vector<int> > unused_indices;
@@ -649,9 +658,12 @@ bool SeqStutterGenotyper::genotype(int max_flank_haplotypes, double min_flank_fr
     }
   }
 
-  if (reassemble_flanks_)
-    if (!assemble_flanks(max_flank_haplotypes, min_flank_freq, logger))
+  if (reassemble_flanks_) {
+    if (!assemble_flanks(max_flank_haplotypes, min_flank_freq, logger)) {
+         logger << "Aborting genotyping of the locus as assemble_flanks returned false\n";
       return false;
+       }
+  }
 
   return true;
 }
@nh13
Copy link
Contributor Author

nh13 commented Nov 13, 2017

@tfwillems I sent you some test cases for this issue via email, please let me know if you did not receive them. Thanks for any help you could offer.

@tfwillems
Copy link
Owner

Hi @nh13,

Thanks for pointing out this issue. In short, when HipSTR encountered an STR allele that was shorter than 3 times the repeat motif length, it was previously designed to skip the locus and continue on to the next one. This was primarily because i) I thought this would very rarely occur and ii) there were a few implementation complexities that made it easier to avoid these cases. This behavior was responsible for the loci you reported.

In commit 3e0b4dd, I modified the codebase such that this no longer occurs and that these short alleles are genotyped appropriately

Best,
Thomas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants