Stacks population summary stats limited to the last population #906

Closed
BELKHIR opened this issue Mar 1, 2019 · 6 comments · Fixed by #1056
Labels
bug: core Bug in the main MultiQC code

@BELKHIR

BELKHIR commented Mar 1, 2019

Description of bug:
The summary stats table of the Stacks module lists only the last population when a data set contains more than one population.

MultiQC Error log:

No error log for this behavior.

File that triggers the error:
At line 259 of multiqc/modules/stacks/stacks.py, only one entry is ever created in out_dict. The loop over the analysed populations writes to the same key each time, so each population's values replace the previous ones:

out_dict[s_name] = cdict

This should be

out_dict[content[0]] = cdict

content[0] is the population label
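
To illustrate the overwrite, here is a minimal sketch. It is not the actual MultiQC code: the function names, the `rows` data, and the `cdict` fields are hypothetical stand-ins, and only `out_dict`, `s_name`, `cdict` and `content[0]` come from the module.

```python
# Hypothetical stand-in for the parsing loop in stacks.py; the rows and
# field names below are invented for illustration only.
rows = [
    ["pop_a", "0", "9.4"],
    ["pop_b", "1", "8.2"],
]
s_name = "my_sample"  # one sample name for the whole file


def parse_keyed_by_sample(rows, s_name):
    """Current behaviour: every population is written to the same key."""
    out_dict = {}
    for content in rows:
        cdict = {"private": content[1], "num_indv": content[2]}
        out_dict[s_name] = cdict  # same key on every iteration
    return out_dict


def parse_keyed_by_population(rows):
    """Suggested behaviour: one entry per population label."""
    out_dict = {}
    for content in rows:
        cdict = {"private": content[1], "num_indv": content[2]}
        out_dict[content[0]] = cdict  # content[0] is the population label
    return out_dict


buggy = parse_keyed_by_sample(rows, s_name)
fixed = parse_keyed_by_population(rows)
print(buggy)  # a single entry, holding the values of the last row
print(fixed)  # one entry per population
```

With the key taken from the population label, each row of the file gets its own entry instead of replacing the previous one.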

Best regards

@ewels
Member

ewels commented Mar 1, 2019

https://github.com/ewels/MultiQC/blob/12f519cabd5cda344535b97186c46684722e555c/multiqc/modules/stacks/stacks.py#L243-L261

Thanks @BELKHIR - I think I need to look into this, as I guess that each sample could have different populations (am I right?). Either way, I think that more of the code will need refactoring.

@ewels ewels added the bug: core Bug in the main MultiQC code label Mar 1, 2019
@BELKHIR
Author

BELKHIR commented Mar 1, 2019

Running the Stacks populations program generates only one populations.sumstats_summary.tsv file, regardless of the number of samples (i.e. multiplexed individuals in the sequencing files).
So you can't have more than one populations.sumstats_summary.tsv file in a directory!

However, this file can contain more than one population if you provide a population map file (format: SAMPLE1\tPOP1\n...) with more than one population label.
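
As a rough sketch of that map format, the snippet below groups sample names by population label. The function name and the example contents are hypothetical, and it assumes only the two tab-separated columns described above (any further columns are ignored).

```python
from collections import defaultdict


def read_popmap(text):
    """Group sample names by population label (hypothetical helper)."""
    pops = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sample, pop = line.split("\t")[:2]
        pops[pop].append(sample)
    return dict(pops)


# Invented example contents in the SAMPLE\tPOP format.
popmap_text = "SAMPLE1\tPOP1\nSAMPLE2\tPOP1\nSAMPLE3\tPOP2\n"
populations = read_popmap(popmap_text)
print(populations)
```

Two labels in the map means two population rows in populations.sumstats_summary.tsv, which is the case where the report currently shows only the last one.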

@BELKHIR BELKHIR closed this as completed Mar 1, 2019
@ewels ewels reopened this Mar 1, 2019
@ewels
Member

ewels commented Mar 1, 2019

It was @remiolsen who wrote this module I think. Remi - any thoughts on the above? If we only ever expect to have a single populations file then we can skip the sample name entirely and just loop over the population labels as @BELKHIR suggests.

The fact that the code goes to the effort of parsing the directory name into a sample name makes me suspicious though.

@remiolsen
Collaborator

remiolsen commented Mar 4, 2019

Thanks for spotting this @BELKHIR. The behaviour I intended when parsing sumstats_summary.tsv was to capture each population and add it as a row to the Population summary statistics table in the report. In Stacks v. >= 2.0 (I think) it became mandatory to give the population map .txt file as input; however, I specify a single population almost every time I run Stacks, so I missed this one.

I think I was able to reproduce the error using this fake populations.sumstats_summary.tsv file I created:

# Variant positions
# Pop ID	Private	Num_Indv	Var	StdErr	P	Var	StdErr	Obs_Het	Var	StdErr	Obs_Hom	Var	StdErr	Exp_Het	Var	StdErr	Exp_Hom	Var	StdErr	Pi	Var	StdErr	Fis	Var	StdErr
nfcore_radseq	0	9.37249	18.37673	0.06836	0.75399	0.03125	0.00282	0.38860	0.11271	0.00535	0.61140	0.11271	0.00535	0.30849	0.02930	0.00273	0.69151	0.02930	0.00273	0.36485	0.05975	0.00390	-0.03231	0.16644	0.06836
nfcore_radseq_dup	0	9.37249	18.37673	0.06836	0.75399	0.03125	0.00282	0.38860	0.11271	0.00535	0.61140	0.11271	0.00535	0.30849	0.02930	0.00273	0.69151	0.02930	0.00273	0.36485	0.05975	0.00390	-0.03231	0.16644	0.06836
# All positions (variant and fixed)
# Pop ID	Private	Sites	Variant_Sites	Polymorphic_Sites	%Polymorphic_Loci	Num_Indv	Var	StdErr	P	Var	StdErr	Obs_Het	Var	StdErr	Obs_Hom	Var	StdErr	Exp_Het	Var	StdErr	Exp_Hom	Var	StdErr	Pi	Var	StdErr	Fis	Var	StdErr
nfcore_radseq	0	8716958	3933	3933	0.04512	6.36527	29.57227	0.00184	0.99989	0.00004	0.00000	0.00018	0.00012	0.00000	0.99982	0.00012	0.00000	0.00014	0.00006	0.00000	0.99986	0.00006	0.00000	0.00016	0.00009	0.00000	-0.00001	0.00008	0.00184
nfcore_radseq_dup	0	8716958	3933	3933	0.04512	6.36527	29.57227	0.00184	0.99989	0.00004	0.00000	0.00018	0.00012	0.00000	0.99982	0.00012	0.00000	0.00014	0.00006	0.00000	0.99986	0.00006	0.00000	0.00016	0.00009	0.00000	-0.00001	0.00008	0.00184
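
For reference, a minimal sketch of how a file with this layout could be parsed into one entry per population. This is not the MultiQC implementation: the function names are hypothetical, the sample text is a shortened version of the file above with only a few columns, and prefixing the repeated Var/StdErr columns with the preceding statistic name is my own assumption for keeping the keys unique.

```python
def label_columns(header):
    """Make repeated 'Var'/'StdErr' names unique by prefixing the statistic."""
    labels, stat = [], None
    for name in header:
        if name in ("Var", "StdErr") and stat is not None:
            labels.append(f"{stat}_{name}")
        else:
            stat = name
            labels.append(name)
    return labels


def parse_sumstats_summary(text):
    """Return {section title: {population label: {column: value}}}."""
    sections, current, columns = {}, None, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("# Pop ID"):
            # Column header row; drop the leading 'Pop ID' column.
            columns = label_columns(line[2:].split("\t")[1:])
        elif line.startswith("#"):
            # Any other comment line starts a new section.
            current = line[1:].strip()
            sections[current] = {}
            columns = None
        elif current is not None and columns is not None:
            fields = line.split("\t")
            # Key by population label so rows do not overwrite each other.
            sections[current][fields[0]] = dict(zip(columns, fields[1:]))
    return sections


# Shortened, hypothetical excerpt of the file above (first columns only).
sample_text = "\n".join([
    "# Variant positions",
    "# Pop ID\tPrivate\tNum_Indv\tVar\tStdErr",
    "nfcore_radseq\t0\t9.37249\t18.37673\t0.06836",
    "nfcore_radseq_dup\t0\t9.37249\t18.37673\t0.06836",
    "# All positions (variant and fixed)",
    "# Pop ID\tPrivate\tSites",
    "nfcore_radseq\t0\t8716958",
    "nfcore_radseq_dup\t0\t8716958",
])
parsed = parse_sumstats_summary(sample_text)
for section, pops in parsed.items():
    print(section, sorted(pops))
```

Both populations survive in each section, which is the behaviour the table in the report should reflect.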

@ewels
Member

ewels commented Nov 13, 2019

Hi @remiolsen,

Any chance you could submit a PR to fix this please? 😁

Phil

@remiolsen
Collaborator

@ewels Sure, I can take a look at it.
