missing ".with_cigar()" in Aligner? #19
Comments
Hi, Thank you for reporting this! That is an interesting observation. As we don't use DNA CS, could you please test the attached binary to see if your proposed changes solve the problem? Thanks, Wouter
.with_cigar() fixed the problem! Normally I don't have to worry about DCS reads because I filter at 5 kb+, but the throughput of my current dataset required me to lower my threshold. Host read removal also looks better. Thank you for the update 😃
Thanks for confirming, I will push out a new release now.
Is this data you can share with me?
Here are the FASTQ and reference FASTA for DCS:
Hi, The requirement for the contamination filter is that at least 90% of the fragment aligns with the contaminant, to avoid removing reads that merely contain short nonspecific matches by chance. It appears that in your dataset, about 40-60% of the fragment aligns with the DNA-CS. Can you think of a reason for this? Are there long adapters or something like that?
Hello! Thank you for creating chopper for us. However, when I tried to remove DCS reads from my FASTQ files, I noticed that a good portion of the contaminating reads still remain. This is an example of one read BLASTed against the DCS sequence.
(query) bad read: 3,800 bp (90% = 3,420 bp)
(target) DCS: 3,560 bp
Chopper left these reads, so I decided to manually run
minimap2 -ax map-ont DCS.fasta read.fq
to see the PAF results. My "match_len" was 3,510 bp. Please correct me if I'm misinterpreting the filter function, but I assume that because 3,510 bp > 3,420 bp, it should be classified as a contaminant. Alternatively, if I run
minimap2 -x map-ont DCS.fasta read.fq
my "match_len" was 3,268 bp. Because that is not greater than 3,420 bp, the read would be retained. Could chopper be inaccurately reporting the lengths because the Aligner setup in lines 178-184 is missing ".with_cigar()"? lh3/minimap2#158
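
For readers following the arithmetic, here is a minimal Rust sketch of the 90% rule as it is described in this thread. It is not chopper's actual code: the function names `aligned_fraction` and `is_contaminant`, the `min_fraction` parameter, and the use of `>=` for "at least 90%" are all assumptions made for illustration.

```rust
/// Fraction of the read covered by the alignment.
/// Hypothetical helper; not taken from chopper's source.
fn aligned_fraction(match_len: u32, read_len: u32) -> f64 {
    if read_len == 0 {
        // Avoid dividing by zero for an empty read.
        return 0.0;
    }
    f64::from(match_len) / f64::from(read_len)
}

/// Returns true when at least `min_fraction` of the read aligns to the
/// contaminant. The `>=` comparison is an assumption based on the
/// "at least 90%" wording in this thread.
fn is_contaminant(match_len: u32, read_len: u32, min_fraction: f64) -> bool {
    aligned_fraction(match_len, read_len) >= min_fraction
}
```

With the numbers reported above, `is_contaminant(3510, 3800, 0.9)` evaluates to true (about 92% of the read aligns), while `is_contaminant(3268, 3800, 0.9)` evaluates to false (86%), which matches the observation that the read is only flagged when the match length comes from the CIGAR-based alignment.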