Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First implementation of an nT-like likelihood #32

Merged
merged 48 commits into from
Jul 17, 2023
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
ccfa0f3
draft example configs to think about structure
hammannr Jun 22, 2023
afa71a5
draft structure of BlueiceExtendedModel
hammannr Jun 22, 2023
a94c1bc
improve structure
hammannr Jun 22, 2023
7905879
start implementing data generation
hammannr Jun 22, 2023
2a71187
try to organize likelihood parameters
hammannr Jun 22, 2023
6d3415a
take notes
hammannr Jun 23, 2023
025a21c
start implementing ancillary ll term
hammannr Jun 23, 2023
4d05a81
continue working on anc. likelihood
hammannr Jun 29, 2023
2e9eb09
start _get_constraint_terms
hammannr Jun 29, 2023
e759366
improve ancillary likelihood
hammannr Jun 29, 2023
35e065d
finish ancillary likelihood implementation
hammannr Jun 30, 2023
26e49b0
restructure generation of ancillary measurements
hammannr Jun 30, 2023
898e1b6
start likelihood construction
hammannr Jun 30, 2023
20ec3a4
add aux ll
hammannr Jul 4, 2023
d630458
move blueice extended model to top layer
hammannr Jul 4, 2023
a0ea2c6
adapt config for blueice
hammannr Jul 4, 2023
aecc75e
make it run for the first time
hammannr Jul 4, 2023
65aa50d
fix likelihood init bug
hammannr Jul 11, 2023
e9570d4
Merge branch 'master' into nt_likelihood
hammannr Jul 12, 2023
8e5b6d8
undo indenting docstring
hammannr Jul 12, 2023
cdf6ecd
update template path
hammannr Jul 12, 2023
ff936ec
cleanup some outdated things in statistical_model
hammannr Jul 12, 2023
0faf6a3
minor compatibility fixes
hammannr Jul 12, 2023
031ff42
Expectation value implementation
kdund Jul 12, 2023
3143b1f
fix poi and expectation values
hammannr Jul 13, 2023
a45af71
pause some workflows for now
hammannr Jul 13, 2023
2c5ec8b
docstrings and some cleanup
hammannr Jul 13, 2023
ecda2a6
More sensible order of methods
hammannr Jul 13, 2023
b0105e0
more cleanup and small additions
hammannr Jul 13, 2023
8a190c8
fix template folder problem
hammannr Jul 13, 2023
1d5137e
re-enable workflow
hammannr Jul 13, 2023
536d290
this reviewdog is really nitpicky...
hammannr Jul 13, 2023
329980d
enable template folder list
hammannr Jul 13, 2023
2814612
template_folder fix
hammannr Jul 13, 2023
6b16f4f
change livetime names in config
hammannr Jul 13, 2023
1953474
Utils fcn to find file in one of several paths.
kdund Jul 13, 2023
b3614da
Merge remote-tracking branch 'origin/nt_likelihood' into nt_likelihood
kdund Jul 13, 2023
64ceaa2
Removes all hash for parameters not used for each source, and for all…
kdund Jul 13, 2023
0eccaf0
Accepted some of the linting comments
kdund Jul 13, 2023
28c5dd6
fix ptype - type missmatch
hammannr Jul 14, 2023
97d3dfe
The dog convinced me: Let's go with ptype instead.
hammannr Jul 14, 2023
2752b0f
remove my TODOs
hammannr Jul 14, 2023
5fadf96
Separate out efficiency set fcn
kdund Jul 14, 2023
ac602d3
Remove errant code in get_expectation_values in base class.
kdund Jul 14, 2023
214fd18
Remove errant code in get_expectation_values in base class.
kdund Jul 14, 2023
0c41d99
linter mess
kdund Jul 14, 2023
90c2ca1
.type -> .ptype
hammannr Jul 17, 2023
7fd0d13
Merge pull request #37 from XENONnT/nt_likelihood_efficiency
hammannr Jul 17, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,4 @@ development_scripts/*hdf
__pycache__/
*ipynb
!examples/*.ipynb
debug.py
178 changes: 178 additions & 0 deletions alea/blueice_extended_model.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
from pydoc import locate # to lookup likelihood class
from alea.statistical_model import StatisticalModel
from alea.simulators import BlueiceDataGenerator
from alea.utils import adapt_likelihood_config_for_blueice
import yaml
import numpy as np
import scipy.stats as stats
hammannr marked this conversation as resolved.
Show resolved Hide resolved
from blueice.likelihood import LogAncillaryLikelihood
from blueice.likelihood import LogLikelihoodSum
# from inference_interface import dict_to_structured_array
hammannr marked this conversation as resolved.
Show resolved Hide resolved


class BlueiceExtendedModel(StatisticalModel):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
def __init__(self, parameter_definition: dict, likelihood_terms: dict):
"""
hammannr marked this conversation as resolved.
Show resolved Hide resolved
# TODO write docstring
"""
super().__init__(parameter_definition=parameter_definition)
self._likelihood = self._build_ll_from_config(likelihood_terms)
self.likelihood_names = [c["name"] for c in likelihood_terms]
self.likelihood_names.append("ancillary_likelihood")
self.data_generators = self._build_data_generators()

# TODO analysis_space should be inferred from the data (assert that all sources have the same analysis space)
hammannr marked this conversation as resolved.
Show resolved Hide resolved

@classmethod
def from_config(cls, config_file):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
with open(config_file, "r") as f:
config = yaml.safe_load(f)
return cls(**config)

def _ll(self, **generate_values):
# TODO: Does this make sense?
return self._likelihood(**generate_values)

def _generate_data(self, **generate_values):
# generate_values are already filtered and filled by the nominal values through the generate_data method in the parent class
hammannr marked this conversation as resolved.
Show resolved Hide resolved
science_data = self._generate_science_data(**generate_values)
ancillary_keys = self.parameters.with_uncertainty.names
generate_values_anc = {k: v for k, v in generate_values.items() if k in ancillary_keys}
ancillary_measurements = self._generate_ancillary_measurements(
**generate_values_anc)
# generate_values = dict_to_structured_array(generate_values)
hammannr marked this conversation as resolved.
Show resolved Hide resolved
return science_data + [ancillary_measurements] + [generate_values]

def _generate_science_data(self, **generate_values):
science_data = [gen.simulate(**generate_values)
for gen in self.data_generators]
return science_data

def _generate_ancillary_measurements(self, **generate_values):
ancillary_measurements = {}
anc_ll = self._likelihood.likelihood_list[-1]
ancillary_generators = anc_ll._get_constraint_functions(**generate_values)
for name, gen in ancillary_generators.items():
parameter_meas = gen.rvs()
# correct parameter_meas if out of bounds
param = self.parameters[name]
if not param.value_in_fit_limits(parameter_meas):
if param.fit_limits[0] is not None and parameter_meas < param.fit_limits[0]:
parameter_meas = param.fit_limits[0]
elif param.fit_limits[1] is not None and parameter_meas > param.fit_limits[1]:
parameter_meas = param.fit_limits[1]
ancillary_measurements[name] = parameter_meas
# TODO: Do we need this as a structured array?
# ancillary_measurements = dict_to_structured_array(ancillary_measurements)
hammannr marked this conversation as resolved.
Show resolved Hide resolved

return ancillary_measurements

# TODO: Override uncertainty setter to also set the uncertainty of the ancillary ll term (func_args). Or for now override the uncertainty setter to not work and raise a warning.
hammannr marked this conversation as resolved.
Show resolved Hide resolved

@property
def data(self):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
return super().data

@data.setter
def data(self, data):
"""
Overrides default setter. Will also set the data of the blueice ll.
Data-sets are expected to be in the form of a list of one or more structured arrays-- representing the data-sets of one or more likelihood terms.
hammannr marked this conversation as resolved.
Show resolved Hide resolved
"""
# iterate through all likelihood terms and set the science data in the blueice ll
# last entry in data are the generate_values
for d, ll_term in zip(data[:-1], self._likelihood.likelihood_list):
ll_term.set_data(d)

self._data = data

@property
def nominal_expectation_values(self):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
# TODO
# IDEA also enable a setter that changes the rate parameters?
pass

def get_expectation_values(self, **kwargs):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
# TODO
pass

def _build_ll_from_config(self, likelihood_terms):
# iterate through ll_config and build blueice ll
lls = []
for config in likelihood_terms:
likelihood_object = locate(config["likelihood_type"])
blueice_config = adapt_likelihood_config_for_blueice(config)
blueice_config["livetime_days"] = self.parameters[
blueice_config["livetime_parameter"]].nominal_value
ll = likelihood_object(blueice_config)
# Set rate parameters
for source in config["sources"]:
for param_name in source["parameters"]:
if self.parameters[param_name].type == "rate":
# TODO: Check that only one rate per source is set?
if param_name.endswith("_rate_multiplier"):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
param_name = param_name.replace("_rate_multiplier", "")
hammannr marked this conversation as resolved.
Show resolved Hide resolved
ll.add_rate_parameter(param_name, log_prior=None)
hammannr marked this conversation as resolved.
Show resolved Hide resolved
else:
NotImplementedError
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
# TODO: Set shape parameters

ll.prepare()
lls.append(ll)
# Ancillary likelihood
ll = CustomAncillaryLikelihood(self.parameters.with_uncertainty)
lls.append(ll)

# TODO: Include likelihood_weights
return LogLikelihoodSum(lls, likelihood_weights=None)

def _build_data_generators(self):
# last one is AncillaryLikelihood
# IDEA: Also implement data generator for ancillary ll term.
return [BlueiceDataGenerator(ll_term) for ll_term in self._likelihood.likelihood_list[:-1]]

# Build wrapper to conveniently define a constraint likelihood term


class CustomAncillaryLikelihood(LogAncillaryLikelihood):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
hammannr marked this conversation as resolved.
Show resolved Hide resolved
# TODO: Make sure the functions and terms are properly implemented now.
hammannr marked this conversation as resolved.
Show resolved Hide resolved
def __init__(self, parameters):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
self.parameters = parameters
# check that there are no None values in the uncertainties dict
assert set(self.parameters.uncertainties.keys()) == set(self.parameters.names)
hammannr marked this conversation as resolved.
Show resolved Hide resolved
parameter_list = self.parameters.names

self.constraint_functions = self._get_constraint_functions()
super().__init__(func=self.ancillary_likelihood_sum,
parameter_list=parameter_list,
config=self.parameters.nominal_values)

@property
def constraint_terms(self):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
return {name: func.logpdf for name, func in self.constraint_functions.items()}

def set_data(self, d: dict):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
# data in this case is a set of ancillary measurements.
# This results in shifted constraint terms.
assert set(d.keys()) == set(self.parameters.names)
hammannr marked this conversation as resolved.
Show resolved Hide resolved
self.constraint_functions = self._get_constraint_functions(**d)

def _get_constraint_functions(self, **generate_values) -> dict:
central_values = self.parameters(**generate_values)
constraint_functions = {}
for name, uncertainty in self.parameters.uncertainties.items():
param = self.parameters[name]
if param.relative_uncertainty:
uncertainty *= param.nominal_value
if isinstance(uncertainty, float):
func = stats.norm(central_values[name],
uncertainty)
else:
# TODO: Implement str-type uncertainties
NotImplementedError(
"Only float uncertainties are supported at the moment.")
constraint_functions[name] = func
return constraint_functions

def ancillary_likelihood_sum(self, evaluate_at: dict):
hammannr marked this conversation as resolved.
Show resolved Hide resolved
return np.sum([term(evaluate_at[name]) for name, term in self.constraint_terms.items()])
hammannr marked this conversation as resolved.
Show resolved Hide resolved
7 changes: 4 additions & 3 deletions alea/examples/gaussian_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,16 +7,17 @@


class GaussianModel(StatisticalModel):
def __init__(self, parameter_definition: Optional[dict or list] = None):
def __init__(self, parameter_definition: Optional[dict or list] = None,
**kwargs):
"""
Initialise a model of a gaussian measurement (hatmu),
where the model has parameters mu and sigma
For illustration, we show how required nominal parameters can be added to the init
sigma is fixed in this example.
"""
if parameter_definition is None:
parameter_definition = ['mu', 'sigma']
super().__init__(parameter_definition=parameter_definition)
parameter_definition = ["mu", "sigma"]
super().__init__(parameter_definition=parameter_definition, **kwargs)

def _ll(self, mu=None, sigma=None):
hat_mu = self.data[0]['hat_mu'][0]
Expand Down
54 changes: 54 additions & 0 deletions alea/examples/unbinned_wimp_running.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Just a placeholder for now to help thinking about the structure.
statistical_model_config: unbinned_wimp_statistical_model.yaml

poi: wimp_rate_multiplier

computation:
discovery_power:
parameters_to_zip: {}
parameters_to_vary:
{poi_expectation: "np.linspace(0, 30, 10)", wimp_mass: [10, 50, 200] }
parameters_in_common:
{
hypotheses: ["true", "null", "free"],
output_filename: "toymc_power_wimp_mass_{wimp_mass:d}_poi_expectation_{poi_expectation:.2f}.hdf5",
n_mc: 5000,
n_batch: 40,
}
toydata_mode: "generate"

threshold:
parameters_to_zip: {}
parameters_to_vary: { wimp_mass: [10, 50, 200] }
parameters_in_common:
{
hypotheses: ["true", "null", "free"],
output_filename: "toymc_power_wimp_mass_{wimp_mass:d}_poi_expectation_{poi_expectation:.2f}.hdf5",
n_mc: 5000,
n_batch: 40,
}
limit_threshold: "thresholds.hdf5"
toydata_mode: "generate"
parameters_as_wildcards: ["poi_expectation", "n_mc", "n_batch"]

sensitivity:
parameters_to_zip: {}
parameters_to_vary: { poi_expectation: [0.], wimp_mass: [10, 50, 200] }
parameters_in_common:
{
hypotheses: ["true", "null", "free"],
output_filename: "toymc_power_wimp_mass_{wimp_mass:d}_poi_expectation_{poi_expectation:.2f}.hdf5",
n_mc: 5000,
n_batch: 40,
compute_confidence_interval: True,
limit_threshold: "thresholds.hdf5",
}
toydata_mode: "generate"

midway_path: null
OSG_path: null
OSG_parameters:
request_memory: "8000Mb"
singularity_container: null

outputfolder: null
107 changes: 107 additions & 0 deletions alea/examples/unbinned_wimp_statistical_model.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
parameter_definition:
wimp_mass:
nominal_value: 50
fittable: false
description: WIMP mass in GeV/c^2

livetime_0:
nominal_value: 0.2
fittable: false
description: Livetime of SR0 in years

livetime_1:
nominal_value: 1.0
fittable: false
description: Livetime of SR1 in years

wimp_rate_multiplier:
nominal_value: 1.0
ptype: rate
fittable: true
fit_limits:
- 0
- null

er_rate_multiplier:
nominal_value: 1.0
ptype: rate
uncertainty: 0.2
relative_uncertainty: true
fittable: true
fit_limits:
- 0
- null
fit_guess: 1.0

# er_band_shift:
# nominal_value: 0
# ptype: shape
# uncertainty: scipy.stats.uniform(loc=-1, scale=2)
# relative_uncertainty: false
# fittable: true
# blueice_anchors:
# - -1
# - 0
# - 1
# fit_limits:
# - -1
# - 1
# description: ER band shape parameter (shifts the ER band up and down)


likelihood_terms:
# SR0
- name: sr0
default_source_class: alea.template_source.TemplateSource
likelihood_type: blueice.likelihood.UnbinnedLogLikelihood
analysis_space:
- "cs1": 'np.arange(0, 102, 2)'
- "cs2": 'np.geomspace(100, 100000, 51)'
in_events_per_bin: true
livetime_parameter: livetime_0
slice_args: {}
sources:
- name: er
histname: er_template # TODO: implement a default histname based on the source name
parameters:
- er_rate_multiplier
# - er_band_shift
templatepath: examples/er_template.h5
histogram_scale_factor: 1

- name: wimp
histname: wimp_template
parameters:
- wimp_rate_multiplier
- wimp_mass
templatepath: examples/wimp50gev_template.h5
apply_efficiency: False
efficiency_name: 'signal_eff' # TODO: Check

# SR1
- name: sr1
default_source_class: alea.template_source.TemplateSource
likelihood_type: blueice.likelihood.UnbinnedLogLikelihood
analysis_space:
- "cs1": 'np.arange(0, 102, 2)'
- "cs2": 'np.geomspace(100, 100000, 51)'
in_events_per_bin: true
livetime_parameter: livetime_1
slice_args: {}
sources:
- name: er
histname: er_template
parameters:
- er_rate_multiplier
# - er_band_shift
templatepath: examples/er_template.h5
histogram_scale_factor: 2

- name: wimp
histname: wimp_template
parameters:
- wimp_rate_multiplier
- wimp_mass
templatepath: examples/wimp50gev_template.h5
apply_efficiency: False
efficiency_name: 'wimp_eff' # TODO: Check
19 changes: 19 additions & 0 deletions alea/parameters.py
Original file line number Diff line number Diff line change
Expand Up @@ -213,6 +213,25 @@ def not_fittable(self) -> List[str]:
"""
return [name for name, param in self.parameters.items() if not param.fittable]

@property
def uncertainties(self) -> dict:
"""
hammannr marked this conversation as resolved.
Show resolved Hide resolved
return a dict of name:uncertainty for all parameters with a not-NaN uncertainty.
"""
return {k: i.uncertainty for k, i in self.parameters.items() if i.uncertainty is not None}
hammannr marked this conversation as resolved.
Show resolved Hide resolved

@property
def with_uncertainty(self) -> "Parameters":
"""
Return parameters with a not-NaN uncertainty.
The parameters are the same objects as in the original Parameters object, not a copy.
"""
param_dict = {k: i for k, i in self.parameters.items() if i.uncertainty is not None}
hammannr marked this conversation as resolved.
Show resolved Hide resolved
params = Parameters()
for param in param_dict.values():
params.add_parameter(param)
return params

@property
def nominal_values(self) -> dict:
"""
Expand Down
Loading