
Estimator of signal multiplier based on perturbation theory #147

Merged: 6 commits into main from best_fit_estimator on Mar 18, 2024

Conversation

@zihaoxu98 (Contributor) commented Mar 15, 2024

This PR adds a function to alea.utils that estimates the best-fit signal multiplier given the signal model, the background model, and a data histogram. Unlike iminuit fitting, it needs no initial guess, which removes a common failure mode: a sufficiently bad initial guess can make the fit fail. The function can therefore be used to seed the multiplier when the likelihood has more nuisance parameters. The math is summarized here
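For readers without the linked write-up: an estimator of this kind solves the score equation of the per-bin Poisson likelihood. The following is a minimal sketch of that idea using a Newton iteration (the `newton_multiplier` helper is hypothetical and NOT alea's actual implementation):

```python
# A sketch only: this is NOT alea's implementation, just the idea behind it.
# With per-bin expectation mu_i = b_i + lam * s_i, the Poisson log-likelihood
# sum_i (n_i * log(mu_i) - mu_i) is maximized where the score vanishes:
#     sum_i n_i * s_i / (b_i + lam * s_i) = sum_i s_i
import numpy as np

def newton_multiplier(sig, bkg, data, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton iteration on the score equation (toy stand-in)."""
    sig, bkg, data = (np.asarray(a, float).ravel() for a in (sig, bkg, data))
    lam = lam0
    for _ in range(max_iter):
        mu = bkg + lam * sig
        m = mu > 0
        score = np.sum(data[m] * sig[m] / mu[m]) - sig.sum()
        dscore = -np.sum(data[m] * sig[m] ** 2 / mu[m] ** 2)
        step = score / dscore
        lam = max(lam - step, 0.0)  # keep the multiplier non-negative
        if abs(step) < tol:
            break
    return lam

# Asimov check: with data = bkg + 0.3 * sig exactly, the root is exactly 0.3.
bkg_toy = np.array([5.0, 8.0, 3.0])
sig_toy = np.array([1.0, 2.0, 0.5])
lam_hat = newton_multiplier(sig_toy, bkg_toy, bkg_toy + 0.3 * sig_toy)
```

Because the score is monotonically decreasing and convex in the multiplier, this iteration converges from any positive starting point, which is why no careful initial guess is needed.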

When testing it, it already gives a quite good fit:

[Screenshot (Mar 17, 2024): log-likelihood scan vs. WIMP multiplier, with the estimated best fit marked]

To reproduce it:

```python
import numpy as np
import multihist as mh
import matplotlib.pyplot as plt
import inference_interface as ii

from alea.utils import signal_multiplier_estimator

np.random.seed(1)

@np.errstate(invalid='ignore', divide='ignore')
def loglikelihood(lam, data, bkg, sig):
    # Per-bin Poisson log-likelihood with expectation bkg + lam * sig.
    exp = bkg + lam * sig
    assert np.all(data[exp <= 0] == 0)
    return np.sum(np.where(exp > 0, data * np.log(exp) - exp, 0))

# Load the example templates (paths are local to my machine).
bkg = ii.template_to_multihist('/home/zihaoxu/alea/alea/examples/templates/er_template_0.ii.h5', hist_name='er_template')
sig = ii.template_to_multihist('/home/zihaoxu/alea/alea/examples/templates/wimp50gev_template.ii.h5', hist_name='wimp_template')

# Toy data: background plus a small signal injection (multiplier 0.1).
data = (bkg + sig*1e-1).get_random(size=np.random.poisson(bkg.n))
data = mh.Histdd(*data.T, bins=bkg.bin_edges)

# Scan the log-likelihood and overlay the estimator's best fit.
x = np.logspace(-3, 0.3, 100)
plt.plot(x, [loglikelihood(_x, data.histogram, bkg.histogram, sig.histogram) for _x in x])
best_fit_x = signal_multiplier_estimator(sig.histogram, bkg.histogram, data.histogram)
plt.axvline(best_fit_x, color='r', ls='--')
plt.title(f'Best fit multiplier = {best_fit_x:.1e}')
plt.xscale('log')
plt.ylabel('log likelihood')
plt.xlabel('WIMP multiplier')
plt.show()
```
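Since the template paths above are local, here is a file-free variant on synthetic histograms (assumed shapes only, not the actual templates) that checks the same likelihood peaks at the injected multiplier:

```python
# Synthetic stand-in for the template-based MWE above (no local files needed).
import numpy as np

def loglikelihood(lam, data, bkg, sig):
    # Same per-bin Poisson log-likelihood as in the snippet above.
    exp = bkg + lam * sig
    return np.sum(np.where(exp > 0, data * np.log(exp) - exp, 0.0))

rng = np.random.default_rng(1)
bkg = rng.uniform(2.0, 10.0, size=(20, 20))  # synthetic background template
sig = rng.uniform(0.0, 1.0, size=(20, 20))   # synthetic signal template
data = bkg + 0.1 * sig                       # Asimov data, injected lam = 0.1

x = np.logspace(-3, 0.3, 200)
ll = np.array([loglikelihood(_x, data, bkg, sig) for _x in x])
best = x[np.argmax(ll)]  # grid maximum lands next to the injected 0.1
```

With Asimov data (expected counts used as observed), the likelihood maximum sits exactly at the injected multiplier, so the grid argmax is within one grid step of 0.1.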

@coveralls commented Mar 15, 2024

Pull Request Test Coverage Report for Build 8321909363

Details

  • 22 of 22 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 91.892%

Totals Coverage Status
Change from base Build 8314745889: 0.1%
Covered Lines: 1496
Relevant Lines: 1628

💛 - Coveralls

@zihaoxu98 zihaoxu98 marked this pull request as ready for review March 15, 2024 22:06
@dachengx (Collaborator) left a comment


Thanks @zihaoxu98. Would you please:

  1. Set a random seed (e.g. np.random.seed(1)) in the MWE in the PR description so the plot is reproducible
  2. Add a comparison between the result of signal_multiplier_estimator and the result of StatisticalModel.fit
  3. Add a test of the function
  4. Add a convergence flag to the iteration result, like https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

@zihaoxu98 (Contributor, Author) replied:

  1. Added
  2. See the figure below. Note that the new function is only an estimator: its result can differ from model.fit() when there are nuisance parameters or an ancillary term. Its advantage is robust convergence, so it is meant to seed the fit, NOT to replace it.
    [Screenshot (Mar 17, 2024): log-likelihood scan comparing model.fit and signal_multiplier_estimator]
  3. Added
  4. I think that as long as the data are described by the templates, i.e. no data point falls where the pdf is 0, the iteration always converges. Not sure a flag is needed.
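If a flag is still wanted, one lightweight pattern (a sketch only, with a toy bisection standing in for the real routine; this is NOT signal_multiplier_estimator's actual code) is to return a small result object in the spirit of scipy.optimize.OptimizeResult:

```python
# Sketch of a convergence flag, in the spirit of scipy.optimize.OptimizeResult;
# the estimator here is a toy bisection, NOT alea's actual routine.
import numpy as np
from dataclasses import dataclass

@dataclass
class EstimatorResult:
    x: float       # best-fit multiplier
    success: bool  # did the iteration reach the tolerance?
    nit: int       # iterations used

def estimate_with_flag(sig, bkg, data, tol=1e-8, max_iter=60):
    sig, bkg, data = (np.asarray(a, float).ravel() for a in (sig, bkg, data))

    def score(lam):
        # Derivative of the Poisson log-likelihood w.r.t. the multiplier.
        mu = bkg + lam * sig
        m = mu > 0
        return np.sum(data[m] * sig[m] / mu[m]) - sig.sum()

    lo, hi = 0.0, 1.0
    while score(hi) > 0 and hi < 1e12:  # grow the bracket until it holds the root
        hi *= 2.0
    for nit in range(1, max_iter + 1):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:  # score is decreasing, so the root is to the right
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            return EstimatorResult(0.5 * (lo + hi), True, nit)
    return EstimatorResult(0.5 * (lo + hi), False, max_iter)

# Asimov check: the root of the score is exactly the injected multiplier 0.3.
bkg_toy = np.array([5.0, 8.0, 3.0])
sig_toy = np.array([1.0, 2.0, 0.5])
res = estimate_with_flag(sig_toy, bkg_toy, bkg_toy + 0.3 * sig_toy)
```

Callers can then check `res.success` instead of trusting the iteration count implicitly.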

The code to make the above plot is below. Maybe there's a better way to hack into the templates?

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from copy import deepcopy
import alea
from alea import BlueiceExtendedModel

# Just some plotting settings
import matplotlib as mpl

mpl.rcParams["figure.dpi"] = 200
mpl.rcParams["figure.figsize"] = [4, 3]
mpl.rcParams["font.family"] = "serif"
mpl.rcParams["font.size"] = 9

# initialize
config_path = "unbinned_wimp_statistical_model_simple.yaml"
model = BlueiceExtendedModel.from_config(config_path)

np.random.seed(1)
truth = {
    'wimp_rate_multiplier': 2.0,
    'er_rate_multiplier': 1.0,
    'er_band_shift': 0,
}
data = model.generate_data(**truth)
model.data = data

bkg = model.likelihood_list[0].base_model.get_source('er')
num_bkg = bkg.events_per_day * model.parameters['livetime'].nominal_value
bkg = bkg._pdf_histogram / bkg._pdf_histogram.n * num_bkg

sig = model.likelihood_list[0].base_model.get_source('wimp')
num_sig = sig.events_per_day * model.parameters['livetime'].nominal_value
sig = sig._pdf_histogram / sig._pdf_histogram.n * num_sig

obs = sig.similar_blank_hist()
obs.add(data['science_run']['cs1'], data['science_run']['cs2'])

x = np.geomspace(1e-1, 20, 100)
plt.plot(x, [model.ll(wimp_rate_multiplier=_x) for _x in x])
plt.axvline(model.fit()[0]['wimp_rate_multiplier'], color='k', label='model.fit')
plt.axvline(alea.utils.signal_multiplier_estimator(sig.histogram, bkg.histogram, obs.histogram), color='k', ls='--', label='signal_multiplier_estimator')
plt.xscale('log')
plt.xlabel('wimp_rate_multiplier')
plt.ylabel('log likelihood')
plt.legend()
```

@dachengx dachengx self-requested a review March 18, 2024 06:02
@dachengx (Collaborator) left a comment


Thanks @zihaoxu98 . I added a diagnostic plot to show the convergence.

@dachengx dachengx merged commit d14b41a into main Mar 18, 2024
7 checks passed
@dachengx dachengx deleted the best_fit_estimator branch March 18, 2024 06:03