Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First implementation of an nT-like likelihood #32

Merged
merged 48 commits into from
Jul 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
ccfa0f3
draft example configs to think about structure
hammannr Jun 22, 2023
afa71a5
draft structure of BlueiceExtendedModel
hammannr Jun 22, 2023
a94c1bc
improve structure
hammannr Jun 22, 2023
7905879
start implementing data generation
hammannr Jun 22, 2023
2a71187
try to organize likelihood parameters
hammannr Jun 22, 2023
6d3415a
take notes
hammannr Jun 23, 2023
025a21c
start implementing ancillary ll term
hammannr Jun 23, 2023
4d05a81
continue working on anc. likelihood
hammannr Jun 29, 2023
2e9eb09
start _get_constraint_terms
hammannr Jun 29, 2023
e759366
improve ancillary likelihood
hammannr Jun 29, 2023
35e065d
finish ancillary likelihood implementation
hammannr Jun 30, 2023
26e49b0
restructure generation of ancillary measurements
hammannr Jun 30, 2023
898e1b6
start likelihood construction
hammannr Jun 30, 2023
20ec3a4
add aux ll
hammannr Jul 4, 2023
d630458
move blueice extended model to top layer
hammannr Jul 4, 2023
a0ea2c6
adapt config for blueice
hammannr Jul 4, 2023
aecc75e
make it run for the first time
hammannr Jul 4, 2023
65aa50d
fix likelihood init bug
hammannr Jul 11, 2023
e9570d4
Merge branch 'master' into nt_likelihood
hammannr Jul 12, 2023
8e5b6d8
undo indenting docstring
hammannr Jul 12, 2023
cdf6ecd
update template path
hammannr Jul 12, 2023
ff936ec
cleanup some outdated things in statistical_model
hammannr Jul 12, 2023
0faf6a3
minor compatibility fixes
hammannr Jul 12, 2023
031ff42
Expectation value implementation
kdund Jul 12, 2023
3143b1f
fix poi and expectation values
hammannr Jul 13, 2023
a45af71
pause some workflows for now
hammannr Jul 13, 2023
2c5ec8b
docstrings and some cleanup
hammannr Jul 13, 2023
ecda2a6
More sensible order of methods
hammannr Jul 13, 2023
b0105e0
more cleanup and small additions
hammannr Jul 13, 2023
8a190c8
fix template folder problem
hammannr Jul 13, 2023
1d5137e
re-enable workflow
hammannr Jul 13, 2023
536d290
this reviewdog is really nitpicky...
hammannr Jul 13, 2023
329980d
enable template folder list
hammannr Jul 13, 2023
2814612
template_folder fix
hammannr Jul 13, 2023
6b16f4f
change livetime names in config
hammannr Jul 13, 2023
1953474
Utils fcn to find file in one of several paths.
kdund Jul 13, 2023
b3614da
Merge remote-tracking branch 'origin/nt_likelihood' into nt_likelihood
kdund Jul 13, 2023
64ceaa2
Removes all hash for parameters not used for each source, and for all…
kdund Jul 13, 2023
0eccaf0
Accepted some of the linting comments
kdund Jul 13, 2023
28c5dd6
fix ptype - type missmatch
hammannr Jul 14, 2023
97d3dfe
The dog convinced me: Let's go with ptype instead.
hammannr Jul 14, 2023
2752b0f
remove my TODOs
hammannr Jul 14, 2023
5fadf96
Separate out efficiency set fcn
kdund Jul 14, 2023
ac602d3
Remove errant code in get_expectation_values in base class.
kdund Jul 14, 2023
214fd18
Remove errant code in get_expectation_values in base class.
kdund Jul 14, 2023
0c41d99
linter mess
kdund Jul 14, 2023
90c2ca1
.type -> .ptype
hammannr Jul 17, 2023
7fd0d13
Merge pull request #37 from XENONnT/nt_likelihood_efficiency
hammannr Jul 17, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ development_scripts/*hdf
__pycache__/
*ipynb
!examples/*.ipynb
debug.py
298 changes: 298 additions & 0 deletions alea/blueice_extended_model.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,298 @@
from pydoc import locate # to lookup likelihood class
from typing import List

from alea.statistical_model import StatisticalModel
from alea.simulators import BlueiceDataGenerator
from alea.utils import adapt_likelihood_config_for_blueice
from alea.parameters import Parameters
import yaml
import numpy as np
import scipy.stats as stats
hammannr marked this conversation as resolved.
Show resolved Hide resolved
from blueice.likelihood import LogAncillaryLikelihood
from blueice.likelihood import LogLikelihoodSum


class BlueiceExtendedModel(StatisticalModel):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""
A statistical model based on blueice likelihoods.

This class extends the `StatisticalModel` class and provides methods
for generating data and computing likelihoods based on blueice.

Args:
parameter_definition (dict): A dictionary defining the model parameters.
likelihood_config (dict): A dictionary defining the likelihood.
"""

def __init__(self, parameter_definition: dict, likelihood_config: dict):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""Initializes the statistical model.

Args:
parameter_definition (dict): A dictionary defining the model parameters.
likelihood_config (dict): A dictionary defining the likelihood.
"""
super().__init__(parameter_definition=parameter_definition)
self._likelihood = self._build_ll_from_config(likelihood_config)
self.likelihood_names = [t["name"] for t in likelihood_config["likelihood_terms"]]
self.likelihood_names.append("ancillary_likelihood")
self.data_generators = self._build_data_generators()

# TODO analysis_space could be inferred from the data (assert that all sources have the same analysis space)
hammannr marked this conversation as resolved.
Show resolved Hide resolved

@classmethod
def from_config(cls, config_file_path: str) -> "BlueiceExtendedModel":
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""Initializes the statistical model from a yaml config file.

Args:
config_file_path (str): Path to the yaml config file.

Returns:
BlueiceExtendedModel: Statistical model.
"""
with open(config_file_path, "r") as f:
config = yaml.safe_load(f)
return cls(**config)

@property
def data(self) -> list:
"""Returns the data of the statistical model."""
return super().data

@data.setter
def data(self, data: list):
"""
Overrides default setter. Will also set the data of the blueice ll.
Data-sets are expected to be in the form of a list of one or more structured arrays-- representing the data-sets of one or more likelihood terms.
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""
# iterate through all likelihood terms and set the science data in the blueice ll
# last entry in data are the generate_values
for d, ll_term in zip(data[:-1], self._likelihood.likelihood_list):
ll_term.set_data(d)

self._data = data

def get_expectation_values(self, **kwargs) -> dict:
"""
Return total expectation values (summed over all likelihood terms with the same name)
given a number of named parameters (kwargs)
"""
ret = dict()
hammannr marked this conversation as resolved.
Show resolved Hide resolved
for ll in self._likelihood.likelihood_list[:-1]: # ancillary likelihood does not contribute

ll_pars = list(ll.rate_parameters.keys()) + list(ll.shape_parameters.keys())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS221 Found line with high Jones Complexity: 15 > 14

ll_pars += ["livetime_days"]
call_args = {k: i for k, i in kwargs.items() if k in ll_pars}

mus = ll(full_output=True, **call_args)[1]
for n, mu in zip(ll.source_name_list, mus):
ret[n] = ret.get(n, 0) + mu
return ret

def _build_ll_from_config(self, likelihood_config: dict) -> "LogLikelihoodSum":
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""
Iterate through all likelihood terms and build blueice ll

Args:
hammannr marked this conversation as resolved.
Show resolved Hide resolved
likelihood_config (dict): A dictionary defining the likelihood.
"""
lls = []

# Iterate through each likelihood term in the configuration
for config in likelihood_config["likelihood_terms"]:
likelihood_object = locate(config["likelihood_type"])
if isinstance(likelihood_config["template_folder"], str):
template_folder_list = [likelihood_config["template_folder"]]
elif isinstance(likelihood_config["template_folder"], list):
template_folder_list = likelihood_config["template_folder"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding the alea example data folder as a default search folder here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean always just add "alea/examples/templates/wimp_templates" to the end of the list? It probably doesn't hurt adding it, though I'm not sure how often this will be used (for the examples I think it is good to write things explicitly so that people know what to change when implementing their own model).

else:
raise ValueError(
"template_folder must be either a string or a list of strings.")

blueice_config = adapt_likelihood_config_for_blueice(
config, template_folder_list)
blueice_config["livetime_days"] = self.parameters[
blueice_config["livetime_parameter"]].nominal_value
for p in self.parameters:
blueice_config[p.name] = blueice_config.get(p.name, p.nominal_value)

# add all parameters to extra_dont_hash for each source unless it is used:
for i, source in enumerate(config["sources"]):
parameters_to_ignore: List[str] = [p.name for p in self.parameters if (p.ptype == "shape")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS465 Found likely bitwise and boolean operation mixup

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (106 > 100 characters)

& (p.name not in source["parameters"])]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
W503 line break before binary operator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

# no efficiency affects PDF:
parameters_to_ignore += [p.name for p in self.parameters if (p.ptype == "efficiency")]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (102 > 100 characters)

parameters_to_ignore += source.get("extra_dont_hash_settings", [])

# ignore all shape parameters known to this model not named specifically in the source:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (103 > 100 characters)

blueice_config["sources"][i]["extra_dont_hash_settings"] = parameters_to_ignore

ll = likelihood_object(blueice_config)

for source in config["sources"]:

# Set rate parameters
rate_parameters = [
p for p in source["parameters"] if self.parameters[p].ptype == "rate"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

if len(rate_parameters) != 1:
raise ValueError(
f"Source {source['name']} must have exactly one rate parameter.")
rate_parameter = rate_parameters[0]
if rate_parameter.endswith("_rate_multiplier"):
rate_parameter = rate_parameter.replace("_rate_multiplier", "")
ll.add_rate_parameter(rate_parameter, log_prior=None)
else:
raise NotImplementedError(
"Only rate multipliers that end on _rate_multiplier are currently supported.")
hammannr marked this conversation as resolved.
Show resolved Hide resolved

# Set shape parameters
shape_parameters = [
p for p in source["parameters"] if self.parameters[p].ptype == "shape"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p

if shape_parameters:
# TODO: Implement setting shape parameters
raise NotImplementedError("Shape parameters are not yet supported.")

# Set efficiency parameters
if source.get("apply_efficiency", False):
self._set_efficiency(source, ll)

ll.prepare()
lls.append(ll)

# Ancillary likelihood
ll = CustomAncillaryLikelihood(self.parameters.with_uncertainty)
lls.append(ll)

return LogLikelihoodSum(lls, likelihood_weights=likelihood_config["likelihood_weights"])

def _build_data_generators(self) -> list:
# last one is AncillaryLikelihood
# IDEA: Also implement data generator for ancillary ll term.
return [BlueiceDataGenerator(ll_term) for ll_term in self._likelihood.likelihood_list[:-1]]

def _ll(self, **generate_values) -> float:
return self._likelihood(**generate_values)

def _generate_data(self, **generate_values) -> list:
# generate_values are already filtered and filled by the nominal values\
# through the generate_data method in the parent class
science_data = self._generate_science_data(**generate_values)
ancillary_keys = self.parameters.with_uncertainty.names
generate_values_anc = {k: v for k, v in generate_values.items() if k in ancillary_keys}
ancillary_measurements = self._generate_ancillary_measurements(
**generate_values_anc)
return science_data + [ancillary_measurements] + [generate_values]

def _generate_science_data(self, **generate_values) -> list:
science_data = [gen.simulate(**generate_values)
for gen in self.data_generators]
return science_data

def _generate_ancillary_measurements(self, **generate_values) -> dict:
ancillary_measurements = {}
anc_ll = self._likelihood.likelihood_list[-1]
ancillary_generators = anc_ll._get_constraint_functions(**generate_values)
for name, gen in ancillary_generators.items():
parameter_meas = gen.rvs()
# correct parameter_meas if out of bounds
param = self.parameters[name]
if not param.value_in_fit_limits(parameter_meas):
if param.fit_limits[0] is not None and parameter_meas < param.fit_limits[0]:
parameter_meas = param.fit_limits[0]
elif param.fit_limits[1] is not None and parameter_meas > param.fit_limits[1]:
parameter_meas = param.fit_limits[1]
ancillary_measurements[name] = parameter_meas

return ancillary_measurements

def _set_efficiency(self, source, ll):
assert "efficiency_name" in source, "Unspecified efficiency_name for source {:s}".format(source["name"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (112 > 100 characters)

efficiency_name = source["efficiency_name"]
assert efficiency_name in source[

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

"parameters"], "The efficiency_name for source {:s} is not in its parameter list".format(source["name"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (116 > 100 characters)

efficiency_parameter = self.parameters[efficiency_name]
assert efficiency_parameter.ptype == "efficiency", "The parameter {:s} must" \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
N400: Found backslash that is used for line breaking

" be an efficiency".format(efficiency_name)
limits = efficiency_parameter.fit_limits
assert 0 <= limits[0], 'Efficiency parameters including {:s} must be' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
N400: Found backslash that is used for line breaking

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS309 Found reversed compare order

' constrained to be nonnegative'.format(efficiency_name)
assert np.isfinite(limits[1]), 'Efficiency parameters including {:s} must be' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
N400: Found backslash that is used for line breaking

' constrained to be finite'.format(efficiency_name)
ll.add_shape_parameter(efficiency_name, anchors=(limits[0], limits[1]))


class CustomAncillaryLikelihood(LogAncillaryLikelihood):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""
Custom ancillary likelihood that can be used to add constraint terms
for parameters of the likelihood.

Args:
parameters (Parameters): Parameters object containing the
parameters to be constrained.
"""

def __init__(self, parameters: Parameters):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""Initialize the CustomAncillaryLikelihood.

Args:
parameters (Parameters): Parameters object containing the
parameters to be constrained.
"""
self.parameters = parameters
# check that there are no None values in the uncertainties dict
assert set(self.parameters.uncertainties.keys()) == set(self.parameters.names)
hammannr marked this conversation as resolved.
Show resolved Hide resolved
parameter_list = self.parameters.names

self.constraint_functions = self._get_constraint_functions()
super().__init__(func=self.ancillary_likelihood_sum,
parameter_list=parameter_list,
config=self.parameters.nominal_values)

@property
def constraint_terms(self) -> dict:
"""
hammannr marked this conversation as resolved.
Show resolved Hide resolved
Dict of all constraint terms (logpdf of constraint functions)
of the ancillary likelihood.
"""
return {name: func.logpdf for name, func in self.constraint_functions.items()}

def set_data(self, d: dict):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""
Set the data of the ancillary likelihood (ancillary measurements).

Args:
hammannr marked this conversation as resolved.
Show resolved Hide resolved
d (dict): Data in this case is a dict of ancillary measurements.
"""
# This results in shifted constraint terms.
assert set(d.keys()) == set(self.parameters.names)
hammannr marked this conversation as resolved.
Show resolved Hide resolved
self.constraint_functions = self._get_constraint_functions(**d)

def ancillary_likelihood_sum(self, evaluate_at: dict) -> float:
"""Return the sum of all constraint terms.

Args:
evaluate_at (dict): Values of the ancillary measurements.

Returns:
float: Sum of all constraint terms.
"""
evaluated_constraint_terms = [
term(evaluate_at[name]) for name, term in self.constraint_terms.items()
]
return np.sum(evaluated_constraint_terms)

def _get_constraint_functions(self, **generate_values) -> dict:
central_values = self.parameters(**generate_values)
constraint_functions = {}
for name, uncertainty in self.parameters.uncertainties.items():
param = self.parameters[name]
if param.relative_uncertainty:
uncertainty *= param.nominal_value
if isinstance(uncertainty, float):
func = stats.norm(central_values[name],
uncertainty)
else:
# TODO: Implement str-type uncertainties
NotImplementedError(
"Only float uncertainties are supported at the moment.")
constraint_functions[name] = func
return constraint_functions
7 changes: 4 additions & 3 deletions alea/examples/gaussian_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,16 +7,17 @@


class GaussianModel(StatisticalModel):
def __init__(self, parameter_definition: Optional[dict or list] = None):
def __init__(self, parameter_definition: Optional[dict or list] = None,
**kwargs):
"""
Initialise a model of a gaussian measurement (hatmu),
where the model has parameters mu and sigma
For illustration, we show how required nominal parameters can be added to the init
sigma is fixed in this example.
"""
if parameter_definition is None:
parameter_definition = ['mu', 'sigma']
super().__init__(parameter_definition=parameter_definition)
parameter_definition = ["mu", "sigma"]
super().__init__(parameter_definition=parameter_definition, **kwargs)

def _ll(self, mu=None, sigma=None):
hat_mu = self.data[0]['hat_mu'][0]
Expand Down
54 changes: 54 additions & 0 deletions alea/examples/unbinned_wimp_running.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Just a placeholder for now to help thinking about the structure.
statistical_model_config: unbinned_wimp_statistical_model.yaml

poi: wimp_rate_multiplier

computation:
discovery_power:
parameters_to_zip: {}
parameters_to_vary:
{poi_expectation: "np.linspace(0, 30, 10)", wimp_mass: [10, 50, 200] }
parameters_in_common:
{
hypotheses: ["true", "null", "free"],
output_filename: "toymc_power_wimp_mass_{wimp_mass:d}_poi_expectation_{poi_expectation:.2f}.hdf5",
n_mc: 5000,
n_batch: 40,
}
toydata_mode: "generate"

threshold:
parameters_to_zip: {}
parameters_to_vary: { wimp_mass: [10, 50, 200] }
parameters_in_common:
{
hypotheses: ["true", "null", "free"],
output_filename: "toymc_power_wimp_mass_{wimp_mass:d}_poi_expectation_{poi_expectation:.2f}.hdf5",
n_mc: 5000,
n_batch: 40,
}
limit_threshold: "thresholds.hdf5"
toydata_mode: "generate"
parameters_as_wildcards: ["poi_expectation", "n_mc", "n_batch"]

sensitivity:
parameters_to_zip: {}
parameters_to_vary: { poi_expectation: [0.], wimp_mass: [10, 50, 200] }
parameters_in_common:
{
hypotheses: ["true", "null", "free"],
output_filename: "toymc_power_wimp_mass_{wimp_mass:d}_poi_expectation_{poi_expectation:.2f}.hdf5",
n_mc: 5000,
n_batch: 40,
compute_confidence_interval: True,
limit_threshold: "thresholds.hdf5",
}
toydata_mode: "generate"

midway_path: null
OSG_path: null
OSG_parameters:
request_memory: "8000Mb"
singularity_container: null

outputfolder: null
Loading