Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The total number of pass-filtered reads is too less #90

Open
Moxsf opened this issue Feb 22, 2022 · 11 comments
Open

The total number of pass-filtered reads is too less #90

Moxsf opened this issue Feb 22, 2022 · 11 comments

Comments

@Moxsf
Copy link

Moxsf commented Feb 22, 2022

my command:
demuxlet --sam ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/gex_possorted_bam.bam --group-list ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/filtered_feature_bc_matrix/barcodes.tsv.gz --field GT --vcf ~/Project_kunlun/00_preData/GenoType/VCF_filter/05_WeGene_HUVEC_change_ALDI.lo_hg38.vcf.gz --alpha 0 --alpha 0.5 --out Multi_HUVEC_gex_alpha

and the report:

NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 4564 UMI warnings...
NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 10839629 droplet/cell barcode warnings...
NOTICE [2022/02/22 17:51:34] - Finished reading 1 markers from the VCF file
NOTICE [2022/02/22 17:51:34] - Total number input reads : 310066990
NOTICE [2022/02/22 17:51:34] - Total number valid droplets observed : 15480
NOTICE [2022/02/22 17:51:34] - Total number valid SNPs observed : 1
NOTICE [2022/02/22 17:51:34] - Total number of read-QC-passed reads : 198500170
NOTICE [2022/02/22 17:51:34] - Total number of skipped reads with ignored barcodes : 36629150
NOTICE [2022/02/22 17:51:34] - Total number of non-skipped reads with considered barcodes : 161816433
NOTICE [2022/02/22 17:51:34] - Total number of gapped/noninformative reads : 161816426
NOTICE [2022/02/22 17:51:34] - Total number of base-QC-failed reads : 0
NOTICE [2022/02/22 17:51:34] - Total number of redundant reads : 0
NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads : 7
NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads overlapping with multiple SNPs : 0
NOTICE [2022/02/22 17:51:34] - Starting to prune out cells with too few reads...
NOTICE [2022/02/22 17:51:34] - Finishing pruning out 0 cells with too few reads...
NOTICE [2022/02/22 17:51:34] - Starting to identify best matching individual IDs
NOTICE [2022/02/22 17:51:34] - Identifying best-matching individual..

When I checked the operation report, I found that the number of reads I screened was only 7. Is there any recommendation for screening criteria?
Thanks for your reply.

@wangweikai233
Copy link

my command: demuxlet --sam ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/gex_possorted_bam.bam --group-list ~/Project_kunlun/02_CellRanger/HUVEC_Omics/outs/filtered_feature_bc_matrix/barcodes.tsv.gz --field GT --vcf ~/Project_kunlun/00_preData/GenoType/VCF_filter/05_WeGene_HUVEC_change_ALDI.lo_hg38.vcf.gz --alpha 0 --alpha 0.5 --out Multi_HUVEC_gex_alpha

and the report:

NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 4564 UMI warnings... NOTICE [2022/02/22 17:51:34] - WARNING: Suppressed a total of 10839629 droplet/cell barcode warnings... NOTICE [2022/02/22 17:51:34] - Finished reading 1 markers from the VCF file NOTICE [2022/02/22 17:51:34] - Total number input reads : 310066990 NOTICE [2022/02/22 17:51:34] - Total number valid droplets observed : 15480 NOTICE [2022/02/22 17:51:34] - Total number valid SNPs observed : 1 NOTICE [2022/02/22 17:51:34] - Total number of read-QC-passed reads : 198500170 NOTICE [2022/02/22 17:51:34] - Total number of skipped reads with ignored barcodes : 36629150 NOTICE [2022/02/22 17:51:34] - Total number of non-skipped reads with considered barcodes : 161816433 NOTICE [2022/02/22 17:51:34] - Total number of gapped/noninformative reads : 161816426 NOTICE [2022/02/22 17:51:34] - Total number of base-QC-failed reads : 0 NOTICE [2022/02/22 17:51:34] - Total number of redundant reads : 0 NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads : 7 NOTICE [2022/02/22 17:51:34] - Total number of pass-filtered reads overlapping with multiple SNPs : 0 NOTICE [2022/02/22 17:51:34] - Starting to prune out cells with too few reads... NOTICE [2022/02/22 17:51:34] - Finishing pruning out 0 cells with too few reads... NOTICE [2022/02/22 17:51:34] - Starting to identify best matching individual IDs NOTICE [2022/02/22 17:51:34] - Identifying best-matching individual..

When I checked the operation report, I found that the number of reads I screened was only 7. Is there any recommendation for screening criteria? Thanks for your reply.

Hello, I also encountered this problem. Have you solved it now?

@hyunminkang
Copy link
Contributor

What does samtools flagstat say? Filtering is mostly done by --excl-flag option. The default is 0xF04, which removes duplicates, supplementary, and unmapped reads. It also uses reads with minimum mapping quality of 20. I would check both flags and mapping quality of the reads.

@wangweikai233
Copy link

What does samtools flagstat say? Filtering is mostly done by --excl-flag option. The default is 0xF04, which removes duplicates, supplementary, and unmapped reads. It also uses reads with minimum mapping quality of 20. I would check both flags and mapping quality of the reads.

Thank you for your reply. I found no information in the qual column in our bam file, which I think caused all reads to be filtered.

@wangweikai233
Copy link

Our bam file like this :
FP300000661L1C002R03901629535 16 chr1 89316 1 100M * 0 0 TTACTGTTTTAGTTTGCATTTCTCTGCTTACAAATGGATTACACGCATTTTCATGTGCTGTTGGCTACTTATTCATTCAGAAAACATACTAAGTGCTGGC * NH:i:4 HI:i:1 AS:i:98 nM:i:0 UR:Z:CAAGTAAAAG CB:Z:E7AE9216DB GN:Z:AL627309.1 UB:Z:CAAGTAAAAG RE:A:E DB:Z:CELL4917_N1

The DB is cell barcode, UB is UMI. Our command line is as follows:
demuxlet --sam ./merge_sort.bam --tag-group DB --tag-UMI UB --vcf ./muxseq.5.sample_reheader_sort.vcf --field GT --alpha 0 --alpha 0.5 --out ./result/merge_sort

Do you think there are other issues that might cause demuxlet to fail?

@wangweikai233
Copy link

Our log :

NOTICE [2023/11/21 21:44:13] - Reading 350000000 reads at chrX:154400707 and skipping 14368055
NOTICE [2023/11/21 21:44:23] - Reading 351000000 reads at chrY:20592382 and skipping 14461389
NOTICE [2023/11/21 21:44:23] - Finished reading 4863645 markers from the VCF file
NOTICE [2023/11/21 21:44:23] - Total number input reads : 351034528
NOTICE [2023/11/21 21:44:25] - Total number valid droplets observed : 17164
NOTICE [2023/11/21 21:44:25] - Total number valid SNPs observed     : 4863645
NOTICE [2023/11/21 21:44:25] - Total number of read-QC-passed reads : 336559368
NOTICE [2023/11/21 21:44:25] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2023/11/21 21:44:25] - Total number of non-skipped reads with considered barcodes : 336236415
NOTICE [2023/11/21 21:44:25] - Total number of gapped/noninformative reads : 312184218
NOTICE [2023/11/21 21:44:25] - Total number of base-QC-failed reads : 24052197
NOTICE [2023/11/21 21:44:25] - Total number of redundant reads : 0
NOTICE [2023/11/21 21:44:25] - Total number of pass-filtered reads : 0
NOTICE [2023/11/21 21:44:25] - Total number of pass-filtered reads overlapping with multiple SNPs : 0
NOTICE [2023/11/21 21:44:25] - Starting to prune out cells with too few reads...
NOTICE [2023/11/21 21:44:25] - Finishing pruning out 0 cells with too few reads...
NOTICE [2023/11/21 21:44:26] - Starting to identify best matching individual IDs
NOTICE [2023/11/21 21:44:26] - Processing 10000 markers...
NOTICE [2023/11/21 21:44:26] - Processing 20000 markers...

@hyunminkang
Copy link
Contributor

It looks that, in both examples, the number of QC-pass reads are fine, but the number of reads overlapping SNPs do not. Perhaps the contigs do not match between BAM and VCF, or there might be problems in VCF?

@wangweikai233
Copy link

It looks that, in both examples, the number of QC-pass reads are fine, but the number of reads overlapping SNPs do not. Perhaps the contigs do not match between BAM and VCF, or there might be problems in VCF?

Thank you for your suggestion. We will carefully check the VCF file later.

@Moxsf
Copy link
Author

Moxsf commented Nov 23, 2023

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense)
And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible.
    Apply all or some of the additional filters:
    a. Classified as common in the 1000 genomes project.
    b. Called with high variant confidence (QD > 10.0)
    c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

@wangweikai233
Copy link

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible.
    Apply all or some of the additional filters:
    a. Classified as common in the 1000 genomes project.
    b. Called with high variant confidence (QD > 10.0)
    c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

Thank you very much for your reply. We will try to reinstall or run demuxlet according to your suggestion. I will show the run log later.

Best wishes.

@yuantiaotiao
Copy link

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible.
    Apply all or some of the additional filters:
    a. Classified as common in the 1000 genomes project.
    b. Called with high variant confidence (QD > 10.0)
    c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

Thank you very much for your reply. We will try to reinstall or run demuxlet according to your suggestion. I will show the run log later.

Best wishes.

Have you solved this problem? My bam file is also a C4 platform. Look forward to receiving your reply

@wangweikai233
Copy link

I have no idea about this question, so I get to use other tools: cellSNP + Vireo . (no offense) And I did something to deal with this problem, just for your information:

  1. The chromosome order of the Header and content (such as chr1 or 1) in the VCF file must be the same as in the bam file.
  2. Adhere to Broad Institute’s GATK best practices whenever possible.
    Apply all or some of the additional filters:
    a. Classified as common in the 1000 genomes project.
    b. Called with high variant confidence (QD > 10.0)
    c. Called in every one of your samples. Minor Allele Frequency (MAF 0.01) Minor Allele Counts, if applicable (MAC 1)
  3. Install demuxlet manually instead of using conda.
  4. Using another version of demuxlet: popscle

Thank you very much for your reply. We will try to reinstall or run demuxlet according to your suggestion. I will show the run log later.
Best wishes.

Have you solved this problem? My bam file is also a C4 platform. Look forward to receiving your reply

I tried to re-run demuxlet after checking the above problems, but the result was still not satisfactory. I think it's a format issue with C4. In the end we used cellSNP + Vireo, but I still think demuxlet is a very nice software tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants