
Add sources and example notebook for 1D CES inference #161

Open · wants to merge 76 commits into main
Conversation

@yuema137 commented May 1, 2024

Note: This PR doesn't affect existing functions in Alea but only adds 1D CES inference.

This is a refactored version of the inference developed for SR0. Featured improvements:

  • Accommodates the common alea format, so existing alea functions can be reused
  • Simplifies the structure: instead of the multiple layers in nton.lower, here we only need to add a specific transformation and source for the CES; the rest is handled by existing base classes in alea that users don't need to touch (in nton.lower, users had to modify all of the layers to add a background or change a transformation)
  • Decouples the transformations (smearing, bias, efficiency) from the sources, so users can feed their self-defined transformations into the framework without touching the sources
  • Enables fitting of the parameters in the transformations: in nton.lower the smearing and energy bias were hard-coded and fixed during the fit; now they can also be fit if needed

ces_source.py:

  • Defines the basic sources needed for 1D CES
  • Validates the transformation parameters and models
  • Users can write extended sources if there's a need, but most of the time the basic ones should be enough.

ces_transformation.py:

  • Defines the basic transformations needed for 1D CES
  • Users can add transformations/models if there's a need; these functions may need to change frequently as data, cuts, and corrections are updated (a toy example of the interface is sketched below).
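For illustration only, here is a minimal sketch of what a user-defined transformation could look like. It assumes nothing beyond the interface visible in this PR (an object exposing apply_transformation(h: Hist1d) -> Hist1d); the constructor arguments, parameter handling, and validation of the actual Transformation class in alea.ces_transformation may well differ.

```python
import numpy as np
from multihist import Hist1d


class ConstantEfficiencyTransformation:
    """Toy user-defined transformation: scale every bin by a flat efficiency.

    Only the apply_transformation(h) interface seen in this PR is assumed;
    the real Transformation class may expect different constructor arguments.
    """

    def __init__(self, efficiency: float):
        self.efficiency = efficiency

    def apply_transformation(self, h: Hist1d) -> Hist1d:
        # Return a new histogram with every bin scaled by the flat efficiency
        return Hist1d.from_histogram(h.histogram * self.efficiency, bin_edges=h.bin_edges)


# Usage on a dummy CES spectrum
h = Hist1d(np.random.uniform(0, 100, size=10_000), bins=np.linspace(0, 100, 101))
h_eff = ConstantEfficiencyTransformation(efficiency=0.9).apply_transformation(h)
```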

xe133_template.ii.h5

An example template (Hist1d)

unblind_ces_simple.yaml:

An example yaml config including:

  • A Hist-based component (Xe133)
  • A Monoenergetic component (test_gaussian)
  • A Flat component (test_flat)

4_ces_inference.ipynb

An example notebook showing how to build a model from the example YAML config and run fits (a rough sketch of the workflow follows).
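Not part of the PR itself, but for orientation, a hedged sketch of what the notebook workflow presumably boils down to. It assumes alea's usual entry points (BlueiceExtendedModel.from_config, generate_data, fit); the exact classes, method names, and parameter names used for the CES model may differ.

```python
from alea.models import BlueiceExtendedModel

# Build the statistical model from the example CES config added in this PR
model = BlueiceExtendedModel.from_config("unblind_ces_simple.yaml")

# Generate a toy dataset and run a fit; the free parameters depend on the config
model.data = model.generate_data()
best_fit, max_log_likelihood = model.fit()
print(best_fit)
```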

Note: the rest of the changes are just auto-formatting from running black . && flake8.

@kdund (Collaborator) commented May 2, 2024

> @hammannr to be specific-- you are saying that the alea/simulators.py reuses/duplicates some source .simulate functionality, right? I agree that it is cleanest with the sources handling interpolation etc. to the extent possible.
>
> I think so, yes. I can check again and try to improve this. But if I don't miss anything, I think the simulate method of the source is currently completely circumvented.

The one problem I might see with that is that we want the source classes to be fully fledged blueice sources as well, so that part kinda has to stay?

@hammannr (Collaborator) commented May 2, 2024

> The one problem I might see with that is that we want the source classes to be fully fledged blueice sources as well, so that part kinda has to stay?

Yes -- my idea is to try to use as much of the source functionality as possible in the simulator. At least right now I don't see the reason (yet) why so much of the functionality of the sources is rewritten in the simulator.

@yuema137 (Author) commented May 2, 2024

Hi @hammannr thanks for the comments! I have done the following tasks:

  • Add docstrings to the public methods
  • Revert the formatting commit
  • Add more explanations to the example notebook

And the following are ongoing, as I need to check whether they affect any existing functions:

  • Use piecewise interpolation for now
  • Enable apply_slice_args

For these two points I'm not entirely sure, so further comments would be helpful:

  • Generalized 1D framework: from my point of view, this is already a generalized framework, since the transformations can be overridden by users and any transformation can be applied to a 1D histogram. So the main change would just be renaming the files and classes. Am I missing anything here?
  • Normalization: the motivation for normalizing to events/year/keV rather than events/year/ton/keV is that some of the backgrounds don't scale with the fiducial mass. In the lower inference we used to calculate rate multipliers inside the inference, which made the code hard to modify, so I think it's better to separate the statistical inference from the rate-multiplier calculation; the rate multipliers should come from external calculations (YAML configs). But if there's a better solution, I'm glad to implement it.

@kdund (Collaborator) left a comment

Hi @yuema137, awesome to see this PR where we get a dedicated 1D source! Some comments added; I agree with Robert that we should remove the CES-specific things to make this useful for any 1D analysis we make :)

Collaborator:

I think here I'd put some pedagogical comments :)

Collaborator:

Also love the 2D likelihood surface :)!

```python
from multihist import Hist1d
from alea.ces_transformation import Transformation

MINIMAL_ENERGY_RESOLUTION = 0.05
```
Collaborator:

I do not think this boundary is required-- for fits, it should be set by the fit boundary in the config, and the bins can be as fine as we'd like and still be valid (other 1D fits may also have a different "natural" scale for the bins).

```python
    h = efficiency_transformation.apply_transformation(h)
if smearing_transformation is not None:
    h = smearing_transformation.apply_transformation(h)
if bias_transformation is not None:
```
Collaborator:

I think the bias makes more sense to apply in true energy (i.e. before smearing) -- what do you think?

@yuema137 (Author), May 3, 2024:

Actually, I think it's better to apply it in the smeared energy, because the bias should in principle be caused by some area-dependent effect (like merging SEs and lone hits). So, for example, if a monoenergetic source produces two S2s of 1.1e4 and 1.2e4, the biases would be slightly different for them even though the true energies are the same.

Collaborator:

In general the bias should be a function of true parameters (I agree if we had a measurement of "true quanta released" we could define our bias in terms of that, and the CES could be considered an approximation of that, but it is only an approximation)
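To make the two orderings discussed in this thread concrete, here is a small numpy-only toy comparison; the bias and resolution curves below are made up for illustration and are not the PR's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)
e_true = np.full(100_000, 41.5)  # keV, toy monoenergetic line


def resolution(e):
    # made-up Gaussian smearing width
    return 0.05 * np.sqrt(e)


def bias(e):
    # made-up multiplicative energy-bias curve
    return 1.0 + 0.01 * np.tanh((e - 40.0) / 20.0)


# Option A (reviewer's suggestion): bias as a function of true energy, applied before smearing
e_biased_true = e_true * bias(e_true)
e_a = rng.normal(e_biased_true, resolution(e_biased_true))

# Option B (the ordering described in this thread for the PR): smear first,
# then apply the bias as a function of the reconstructed (smeared) energy
e_smeared = rng.normal(e_true, resolution(e_true))
e_b = e_smeared * bias(e_smeared)

# The two orderings produce (slightly) different reconstructed spectra
print(e_a.mean(), e_b.mean(), e_a.std(), e_b.std())
```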

```python
        return h

    def _transform_histogram(self, h: Hist1d):
        # Only efficiency is applicable for flat source
```
Collaborator:

Disagree-- you could have an efficiency defined in true energy and then a smearing on top, I think?

@yuema137 (Author), May 3, 2024:

Similar to the bias, the efficiency is applied after the smearing. The smearing in our detector is mainly due to the Poisson fluctuation during the production of n_e and n_gamma, which happens before any detector effects come into play. This is the main reason we set the smearing as the first transformation.

Collaborator:

IMO the efficiency should be the probability to detect an event given a certain initial recoil energy. This is what we use for WIMPs, and it is a well-defined probability (that does include the effects you list here as important inputs!).
I think attempting to apply efficiencies based on reconstructed variables, on the basis that those reconstructed variables approximate (but do not give) the underlying number of quanta that matter for the probability to be detected, is not as well defined.

@yuema137 (Author):

Hi Knut, to clarify: the efficiency here is only projected into CES space. For the S2 threshold and S1 efficiency, we get the CES-projected values during the band fitting. For the cut acceptance, which is data-driven, the values are naturally only available in CES. Defining everything in true energy is possible, but the gain is limited: it would entangle every efficiency study with the band fitting, which is not necessary for lower or other studies performed in CES.

```python
if bins.size <= 1:
    raise ValueError("bins must have at least 2 elements")
bin_centers = 0.5 * (bins[1:] + bins[:-1])
data = stats.norm.pdf(
```
Collaborator:

With coarse binning, I would propose instead defining the data as the difference of the CDF evaluated at the upper and lower edges of each bin.
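A minimal sketch of this suggestion, with placeholder numbers (the line position, resolution, and binning below are toy values, not the PR's): instead of sampling the PDF at the bin centers, take differences of the CDF at the bin edges so each bin holds the integrated probability, which stays exact for arbitrarily coarse bins.

```python
import numpy as np
from scipy import stats

bins = np.linspace(0, 100, 26)   # deliberately coarse binning (toy values, keV)
mu, sigma = 41.5, 2.0            # toy monoenergetic line and resolution

# Approach in the snippet above: PDF at bin centers
# (accurate only when sigma is much larger than the bin width)
bin_centers = 0.5 * (bins[1:] + bins[:-1])
data_pdf = stats.norm.pdf(bin_centers, loc=mu, scale=sigma) * np.diff(bins)

# Reviewer's proposal: integrate the PDF over each bin via CDF differences
data_cdf = np.diff(stats.norm.cdf(bins, loc=mu, scale=sigma))

# With 4 keV bins and sigma = 2 keV, the two differ noticeably
print(np.max(np.abs(data_pdf - data_cdf)))
```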

@coveralls commented Dec 8, 2024

Pull Request Test Coverage Report for Build 12290764765

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 207 (0.0%) changed or added relevant lines in 2 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-8.8%) to 80.862%

Files with missing coverage:

  File | Covered lines | Changed/added lines | %
  alea/ces_transformation.py | 0 | 63 | 0.0%
  alea/ces_source.py | 0 | 144 | 0.0%

Files with coverage reduction:

  File | New missed lines | %
  alea/models/blueice_extended_model.py | 7 | 96.84%

Totals:

  • Change from base Build 12040815089: -8.8%
  • Covered lines: 1707
  • Relevant lines: 2111

💛 - Coveralls
