
Estimator of signal multiplier based on perturbation theory #147

Merged: 6 commits into main from best_fit_estimator on Mar 18, 2024

Conversation

@zihaoxu98 (Contributor) commented Mar 15, 2024

This PR adds a function to alea.utils that estimates the best-fit signal multiplier given the signal model, the background model, and a data histogram. Unlike iminuit fitting, it needs no initial guess, which removes a common failure mode: a sufficiently bad initial guess can make the fit fail. The function can therefore be used to seed the multiplier when the likelihood has more nuisance parameters. The math is summarized here
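For readers without the linked write-up: an estimator of this kind solves the score equation of the per-bin Poisson likelihood. The following is a minimal sketch of that idea using a Newton iteration (the `newton_multiplier` helper is hypothetical and NOT alea's actual implementation):

```python
# A sketch only: this is NOT alea's implementation, just the idea behind it.
# With per-bin expectation mu_i = b_i + lam * s_i, the Poisson log-likelihood
# sum_i (n_i * log(mu_i) - mu_i) is maximized where the score vanishes:
#     sum_i n_i * s_i / (b_i + lam * s_i) = sum_i s_i
import numpy as np

def newton_multiplier(sig, bkg, data, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton iteration on the score equation (toy stand-in)."""
    sig, bkg, data = (np.asarray(a, float).ravel() for a in (sig, bkg, data))
    lam = lam0
    for _ in range(max_iter):
        mu = bkg + lam * sig
        m = mu > 0
        score = np.sum(data[m] * sig[m] / mu[m]) - sig.sum()
        dscore = -np.sum(data[m] * sig[m] ** 2 / mu[m] ** 2)
        step = score / dscore
        lam = max(lam - step, 0.0)  # keep the multiplier non-negative
        if abs(step) < tol:
            break
    return lam

# Asimov check: with data = bkg + 0.3 * sig exactly, the root is exactly 0.3.
bkg_toy = np.array([5.0, 8.0, 3.0])
sig_toy = np.array([1.0, 2.0, 0.5])
lam_hat = newton_multiplier(sig_toy, bkg_toy, bkg_toy + 0.3 * sig_toy)
```

Because the score is monotonically decreasing and convex in the multiplier, this iteration converges from any positive starting point, which is why no careful initial guess is needed.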

When testing it, it already gives a quite good fit:

[Screenshot (Mar 17, 2024): log-likelihood scan vs. WIMP multiplier, with the estimated best fit marked]

To reproduce it:

```python
import numpy as np
import multihist as mh
import matplotlib.pyplot as plt
import inference_interface as ii

from alea.utils import signal_multiplier_estimator

np.random.seed(1)

@np.errstate(invalid='ignore', divide='ignore')
def loglikelihood(lam, data, bkg, sig):
    # Per-bin Poisson log-likelihood with expectation bkg + lam * sig.
    exp = bkg + lam * sig
    assert np.all(data[exp <= 0] == 0)
    return np.sum(np.where(exp > 0, data * np.log(exp) - exp, 0))

# Load the example templates (paths are local to my machine).
bkg = ii.template_to_multihist('/home/zihaoxu/alea/alea/examples/templates/er_template_0.ii.h5', hist_name='er_template')
sig = ii.template_to_multihist('/home/zihaoxu/alea/alea/examples/templates/wimp50gev_template.ii.h5', hist_name='wimp_template')

# Toy data: background plus a small signal injection (multiplier 0.1).
data = (bkg + sig*1e-1).get_random(size=np.random.poisson(bkg.n))
data = mh.Histdd(*data.T, bins=bkg.bin_edges)

# Scan the log-likelihood and overlay the estimator's best fit.
x = np.logspace(-3, 0.3, 100)
plt.plot(x, [loglikelihood(_x, data.histogram, bkg.histogram, sig.histogram) for _x in x])
best_fit_x = signal_multiplier_estimator(sig.histogram, bkg.histogram, data.histogram)
plt.axvline(best_fit_x, color='r', ls='--')
plt.title(f'Best fit multiplier = {best_fit_x:.1e}')
plt.xscale('log')
plt.ylabel('log likelihood')
plt.xlabel('WIMP multiplier')
plt.show()
```
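Since the template paths above are local, here is a file-free variant on synthetic histograms (assumed shapes only, not the actual templates) that checks the same likelihood peaks at the injected multiplier:

```python
# Synthetic stand-in for the template-based MWE above (no local files needed).
import numpy as np

def loglikelihood(lam, data, bkg, sig):
    # Same per-bin Poisson log-likelihood as in the snippet above.
    exp = bkg + lam * sig
    return np.sum(np.where(exp > 0, data * np.log(exp) - exp, 0.0))

rng = np.random.default_rng(1)
bkg = rng.uniform(2.0, 10.0, size=(20, 20))  # synthetic background template
sig = rng.uniform(0.0, 1.0, size=(20, 20))   # synthetic signal template
data = bkg + 0.1 * sig                       # Asimov data, injected lam = 0.1

x = np.logspace(-3, 0.3, 200)
ll = np.array([loglikelihood(_x, data, bkg, sig) for _x in x])
best = x[np.argmax(ll)]  # grid maximum lands next to the injected 0.1
```

With Asimov data (expected counts used as observed), the likelihood maximum sits exactly at the injected multiplier, so the grid argmax is within one grid step of 0.1.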

@coveralls commented Mar 15, 2024

Pull Request Test Coverage Report for Build 8321909363

Details

  • 22 of 22 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 91.892%

Totals Coverage Status
Change from base Build 8314745889: 0.1%
Covered Lines: 1496
Relevant Lines: 1628

💛 - Coveralls

@zihaoxu98 zihaoxu98 marked this pull request as ready for review March 15, 2024 22:06
@dachengx (Collaborator) left a comment


Thanks @zihaoxu98. Would you please:

  1. Set a random seed (e.g. np.random.seed(1)) in the MWE in the PR description so the plot is reproducible
  2. Add a comparison between the result of signal_multiplier_estimator and the result of StatisticalModel.fit
  3. Add a test of the function
  4. Add a convergence flag to the iteration result, like https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

@zihaoxu98 (Contributor, Author) replied:

  1. Added
  2. See the figure below. Note that the new function is only an estimator: its result can differ from model.fit() when there are nuisance parameters or an ancillary term. Its advantage is robust convergence, so it is meant to seed the fit, NOT to replace it.
    [Screenshot (Mar 17, 2024): log-likelihood scan comparing model.fit and signal_multiplier_estimator]
  3. Added
  4. I think that as long as the data are described by the templates, i.e. no data point falls where the pdf is 0, the iteration always converges. Not sure a flag is needed.
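If a flag is still wanted, one lightweight pattern (a sketch only, with a toy bisection standing in for the real routine; this is NOT signal_multiplier_estimator's actual code) is to return a small result object in the spirit of scipy.optimize.OptimizeResult:

```python
# Sketch of a convergence flag, in the spirit of scipy.optimize.OptimizeResult;
# the estimator here is a toy bisection, NOT alea's actual routine.
import numpy as np
from dataclasses import dataclass

@dataclass
class EstimatorResult:
    x: float       # best-fit multiplier
    success: bool  # did the iteration reach the tolerance?
    nit: int       # iterations used

def estimate_with_flag(sig, bkg, data, tol=1e-8, max_iter=60):
    sig, bkg, data = (np.asarray(a, float).ravel() for a in (sig, bkg, data))

    def score(lam):
        # Derivative of the Poisson log-likelihood w.r.t. the multiplier.
        mu = bkg + lam * sig
        m = mu > 0
        return np.sum(data[m] * sig[m] / mu[m]) - sig.sum()

    lo, hi = 0.0, 1.0
    while score(hi) > 0 and hi < 1e12:  # grow the bracket until it holds the root
        hi *= 2.0
    for nit in range(1, max_iter + 1):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:  # score is decreasing, so the root is to the right
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            return EstimatorResult(0.5 * (lo + hi), True, nit)
    return EstimatorResult(0.5 * (lo + hi), False, max_iter)

# Asimov check: the root of the score is exactly the injected multiplier 0.3.
bkg_toy = np.array([5.0, 8.0, 3.0])
sig_toy = np.array([1.0, 2.0, 0.5])
res = estimate_with_flag(sig_toy, bkg_toy, bkg_toy + 0.3 * sig_toy)
```

Callers can then check `res.success` instead of trusting the iteration count implicitly.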

The code to make the above plot is below. Maybe there's a better way to hack into the templates?

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from copy import deepcopy
import alea
from alea import BlueiceExtendedModel

# Just some plotting settings
import matplotlib as mpl

mpl.rcParams["figure.dpi"] = 200
mpl.rcParams["figure.figsize"] = [4, 3]
mpl.rcParams["font.family"] = "serif"
mpl.rcParams["font.size"] = 9

# initialize
config_path = "unbinned_wimp_statistical_model_simple.yaml"
model = BlueiceExtendedModel.from_config(config_path)

np.random.seed(1)
truth = {
    'wimp_rate_multiplier': 2.0,
    'er_rate_multiplier': 1.0,
    'er_band_shift': 0,
}
data = model.generate_data(**truth)
model.data = data

bkg = model.likelihood_list[0].base_model.get_source('er')
num_bkg = bkg.events_per_day * model.parameters['livetime'].nominal_value
bkg = bkg._pdf_histogram / bkg._pdf_histogram.n * num_bkg

sig = model.likelihood_list[0].base_model.get_source('wimp')
num_sig = sig.events_per_day * model.parameters['livetime'].nominal_value
sig = sig._pdf_histogram / sig._pdf_histogram.n * num_sig

obs = sig.similar_blank_hist()
obs.add(data['science_run']['cs1'], data['science_run']['cs2'])

x = np.geomspace(1e-1, 20, 100)
plt.plot(x, [model.ll(wimp_rate_multiplier=_x) for _x in x])
plt.axvline(model.fit()[0]['wimp_rate_multiplier'], color='k', label='model.fit')
plt.axvline(alea.utils.signal_multiplier_estimator(sig.histogram, bkg.histogram, obs.histogram), color='k', ls='--', label='signal_multiplier_estimator')
plt.xscale('log')
plt.xlabel('wimp_rate_multiplier')
plt.ylabel('log likelihood')
plt.legend()
```

@dachengx dachengx self-requested a review March 18, 2024 06:02
@dachengx (Collaborator) left a comment


Thanks @zihaoxu98 . I added a diagnostic plot to show the convergence.

@dachengx dachengx merged commit d14b41a into main Mar 18, 2024
7 checks passed
@dachengx dachengx deleted the best_fit_estimator branch March 18, 2024 06:03