RooUnfold is a framework for unfolding (AKA "deconvolution" or "unsmearing"). It currently implements six methods:
- iterative ("Bayesian"; as proposed by D'Agostini);
- singular value decomposition (SVD, as proposed by Höcker and Kartvelishvili and implemented in TSVDUnfold);
- bin-by-bin (simple correction factors);
- an interface to the TUnfold method developed by Stefan Schmitt;
- simple inversion of the response matrix without regularisation; and
- iterative dynamically stabilized unfolding (IDS) by Bogdan Malaescu, implemented by Chris Meyer.
It can be used from the ROOT prompt, from C++ (Cling) or Python (PyROOT) scripts, or linked against the ROOT libraries.
There is extensive documentation available online:
- the excellent RooUnfold tutorial by Vince Croft
- auto-generated Doxygen class documentation for the RooUnfold package
- RooUnfold package README
- RooUnfold package release notes
For support in using RooUnfold, please contact the primary author, Tim Adye (or @adye), and/or other package maintainers, @cburgard, @lbrenner, and @pverschu.
RooUnfold was written by Tim Adye, Carsten Burgard, Richard Claridge, Chris Meyer, Kerstin Tackmann, and Fergus Wilson. To cite the RooUnfold package in a publication, you can refer to the RooUnfold GitLab repository (previously this web page) and/or the paper:
Tim Adye, in Proceedings of the PHYSTAT 2011 Workshop on Statistical Issues Related to Discovery Claims in Search Experiments and Unfolding, CERN, Geneva, Switzerland, 17–20 January 2011, edited by H.B. Prosper and L. Lyons, CERN–2011–006, pp. 313–318.
Other documents included in this package:
- Corrected error calculation for iterative Bayesian unfolding (source)
- PHYSTAT 2011 proceedings (arXiv, source)
- DESY 2010 Unfolding Workshop proceedings (Indico, source)
We use unfolding to remove the known effects of measurement resolutions, systematic biases, and detection efficiency to determine the "true" distribution. We parametrise the measurement effects using a response matrix that maps the (binned) true distribution onto the measured one. For 1-dimensional true and measured distribution bins Tj and Mi, the response matrix element Rij gives the fraction of events from bin Tj that end up measured in bin Mi. The response matrix is usually determined using Monte Carlo simulation (training), with the true values coming from the generator output.
The unfolding procedure reconstructs the true Tj distribution from the measured Mi distribution, taking into account the measurement uncertainties due to statistical fluctuations in the finite measured sample (without these uncertainties, the problem could be solved uniquely by inverting the response matrix). RooUnfold provides several algorithms for solving this problem.
The iterative and SVD unfolding algorithms require a regularisation parameter to prevent the statistical fluctuations being interpreted as structure in the true distribution. It is therefore necessary to optimise this parameter for the number of bins and sample size, using Monte Carlo samples of the same size as the data. These samples can also be used to measure the effectiveness of the unfolding and hence provide estimates of the systematic errors that result from the procedure (testing).
Note that for this last step (in particular), it is important to use Monte Carlo samples with truth distributions that are statistically and systematically independent of the sample used in training (such samples would anyway be used in a systematics analysis, eg. using a different generator, or reweighting variations within the a-priori uncertainties in the truth distribution). After all, if the Monte Carlo were a perfect model of the data, we could use the Monte Carlo truth information directly and dispense with unfolding altogether!
The bin-by-bin method assumes no migration of events between bins (eg. resolution is much smaller than the bin size and no systematic shifts). This is of course trivial to implement without resorting to the RooUnfold machinery, but is included in the package to allow simple comparison with the other methods.
To use RooUnfold, we must first supply the response matrix object RooUnfoldResponse
.
It can be constructed like this:
RooUnfoldResponse response (nbins, x_lo, x_hi);
or, if different truth and measured binning is required,
RooUnfoldResponse response (nbins_measured, x_lo_measured, x_hi_measured,
nbins_true, x_lo_true, x_hi_true);
or, if different binning is required,
RooUnfoldResponse response (hist_measured, hist_truth);
In that last case, hist_measured
and hist_truth
are used to specify
the dimensions of the distributions (the histogram contents are not used here), eg.
for 2D or 3D distributions or non-uniform binning.
This RooUnfoldResponse
object is often most easily filled by looping over the training sample
and calling response.Fill(x_measured,x_true)
or,
for events that were not measured due to detection inefficiency,
response.Miss(x_true)
if (measurement_ok)
response.Fill (x_measured, x_true);
else
response.Miss (x_true);
Alternatively, the response matrix can be constructed from a pre-existing TH2D
2-dimensional histogram (with truth and measured distribution TH1D
histograms
for normalisation).
This response
object can be passed directly to the unfolding
object, or written to a ROOT file for use at a later stage (search for
examples/RooUnfoldTest.cxx
's
stage
parameter for an example of how to do this).
To do the unfolding (either to try different regularisation parameters,
for testing, or for real data), create a RooUnfold
object and pass it the
test / measured distribution (as a histogram) and the response
object.
RooUnfoldBayes unfold (&response, hist_measured, iterations);
or
RooUnfoldSvd unfold (&response, hist_measured, kterm);
or
RooUnfoldBinByBin unfold (&response, hist_measured);
hist_measured
is a pointer to a TH1D
(or TH2D
for the 2D case)
histogram of the measured distribution (it should have the same binning as the response matrix).
The classes RooUnfoldBayes
, RooUnfoldSvd
, and RooUnfoldBinByBin
all inherit from RooUnfold
and implement the different algorithms.
The integer iterations
(for RooUnfoldBayes) or kterm
(RooUnfoldSvd) is
the regularisation parameter. (Note that RooUnfoldSvd's kterm
parameter
is also known as tau
in the code. That usage is incompatible with the literature,
so we adopt k here.)
The reconstructed truth distribution (with errors) can be
obtained with the Hreco()
method.
TH1D* hist_reco= (TH1D*) unfold.Hreco();
The result can also be obtained as as a TVectorD
with
full TMatrixD
covariance matrix.
Multi-dimensional distributions can also be unfolded, though this does not work for the SVD method, and the interface is rather clumsy (we hope to improve this).
See the class documentation for details of the
RooUnfold
and RooUnfoldResponse
public methods.
A very simple example of RooUnfold's use is given
in examples/RooUnfoldExample.cxx
.
More complete tests, using different toy MC distributions, are
in examples/RooUnfoldTest.cxx
and examples/RooUnfoldTest2D.cxx
.
The regularisation parameter determines the relative weight placed on the data, compared to the training sample truth. Both RooUnfoldBayes and RooUnfoldSvd take integer regularisation parameters, with small values favouring the MC truth and larger values favouring the data.
For RooUnfoldBayes, the regularisation parameter specifies the number of iterations, starting with the training sample truth (iterations=0). You should choose a small integer greater than 0 (we use 4 in the examples). Since only a few iterations are needed, a reasonable performance can usually be obtained without fine-tuning the parameter.
The optimal regularisation parameter can be selected by finding the largest value up to which the errors remain reasonable (ie. do not become much larger than previous values). This will give the smallest systematic errors (reconstructed distribution least biased by the training truth), without too-large statistical errors. Since the statistical errors grow quite rapidly beyond this point, but the systematic bias changes quite slowly below it, it can be prudent to reduce the regularisation parameter a little below this optimal point.
For RooUnfoldSvd, the unfolding is something
like a Fourier expansion in "result to be obtained" vs "MC truth input".
Low frequencies are assumed to be systematic differences between the training MC
and the data, which should be retained in the output. High frequencies are assumed to
arise from statistical fluctuations in data and unfortunately get
numerically enhanced without proper regularization.
Choosing the regularization parameter, k (kterm
),
effectively determines up to
which frequencies the terms in the expansion are kept. (Actually,
this is not quite true, we don't use a hard cut-off but a smooth one.)
The correct choice of k is of particular importance for the SVD method. A too-small value will bias the unfolding result towards the MC truth input, a too-large value will give a result that is dominated by unphysically enhanced statistical fluctuations. This needs to be tuned for any given distribution, number of bins, and approximate sample size — with k between 2 and the number of bins. (Using k=1 means you get only the training truth input as result without any corrections. You basically regularise away any differences, and only keep the leading term which is, by construction, the MC truth input.)
Höcker and Kartvelishvili's paper (section 7) describes how to choose the optimum value for k.
Make sure that ROOT is set up correctly:
the $ROOTSYS
environment variable
should point to the top-level ROOT directory, $ROOTSYS/bin
should be
in your $PATH
, and $ROOTSYS/lib
should be in
your library path ($LD_LIBRARY_PATH
on most Unix systems).
This can be achieved using ROOT's
thisroot.sh
(or thisroot.csh
) setup script.
Eg. to use the CERN CVMFS ROOT version 6.18/02
on 64-bit CentOS7 from a bash shell:
. /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.18.02/x86_64-centos7-gcc48-opt/bin/thisroot.sh
For further details, consult the ROOT "Getting Started" documentation, or your local system administrator.
Check out RooUnfold
git clone https://gitlab.cern.ch/RooUnfold/RooUnfold.git
cd RooUnfold
Use GNU make
. Just type
make
to build the RooUnfold shared library.
Alternatively, you can use cmake
mkdir build
cd build
cmake ..
make -j4
In an interactive ROOT shell, RooUnfoldExample.cxx
makes a simple test of RooUnfold.
% root
root [0] .L examples/RooUnfoldExample.cxx
root [1] RooUnfoldExample()
You can also use python
% python examples/RooUnfoldExample.py
The example programs can also be run from the shell command line.
More involved tests, allowing different toy MC PDFs to be used for training
and testing, can be found in examples/RooUnfoldTest.cxx
(which uses test class RooUnfoldTestHarness
).
% make bin
% ./RooUnfoldTest
The output appears in RooUnfoldTest.ps
.
You can specify parameters for RooUnfoldTest
(either as an argument to the routine,
or as parameters to the program), eg.
% root
root [2] RooUnfoldTest("method=2 ftestx=3")
or
% ./RooUnfoldTest method=2 ftestx=3
Use RooUnfoldTest -h
or RooUnfoldTest("-h")
to list all the parameters and their defaults.
method
specifies the unfolding algorithm to use:
0 no unfolding (output copied from measured input)
1 Bayes
2 SVD
3 bin-by-bin
4 TUnfold
5 matrix inversion
ftrainx
and ftestx
specify training and test PDFs:
0 flat distribution
1 top-hat distribution over mean ± 3 x width
2 Gaussian
3 Double-sided exponential decay
4 Breit-Wigner
5 Double Breit-Wigner, peaking at mean<tt>-</tt>width and mean<tt>+</tt>width
6 exponential
7 Gaussian resonance on an exponential background
The centre and width scale of these signal PDFs can be specified with the mtrainx
and wtrainx
(and mtestx
and wtestx
) parameters respectively.
A flat background of fraction btrainx
(and btestx
) is added.
Detector effects are modelled with a variable shift
(xbias
in the centre, dropping to 0 near the edges),
a smearing of xsmear
bins,
as well as a variable efficiency (between effxlo
at xlo
and effxhi
at xhi
).
For 2D and 3D examples look at RooUnfoldTest2D
and RooUnfoldTest3D
.
The test programs, examples/RooUnfoldTest.cxx
, examples/RooUnfoldTest2D.cxx
, and examples/RooUnfoldTest3D.cxx
use
RooFit to generate the toy distributions.
(RooFit is not required to use the RooUnfold classes from another program,
eg. examples/RooUnfoldExample.cxx
).
Hand-coded alternatives are provided if ROOT was not build with RooFit enabled
(eg. --enable-roofit
not specified).
This version generates peaked signal events over their full
range, so may have a fewer events within the range than requested.
To disable the use of RooFit, #define NOROOFIT
before loading RooUnfoldTest*.cxx
root [0] #define NOROOFIT 1
root [1] .x examples/RooUnfoldTest.cxx
For the stand-alone case, use
% make bin NOROOFIT=1
to build (this is the default if RooFit is not available).
- RooUnfoldSvd does not work correctly for multi-dimensional distributions (gives a warning).
- Using the histogram under/overflow bins (
UseOverflow()
setting) only works for 1D histograms.
- Tim Adye (RAL, T.J.Adye@rl.ac.uk) is the principal RooUnfold developer and contact.
- Carsten Burgard (NIKHEF) migrated the RooUnfold package to GitLab and added CMake build.
- Richard Claridge (RAL) implemented the TUnfold interface, matrix inversion, and bin-by-bin algorithms as well as the error and parameter analysis frameworks.
- Chris Meyer (Indiana) implemented the IDS algorithm.
- Kerstin Tackmann (LBNL) wrote the SVD algorithm for the unfolding of the hadronic mass spectrum in B→Xulν.
- Fergus Wilson (RAL) wrote an initial implementation of the iterative Bayesian algorithm.
- T. Adye, RooUnfold: unfolding framework and algorithms, presentations to the Oxford and RAL ATLAS Groups, May 2008. This includes a brief introduction to unfolding but the description of RooUnfold is now dated (see below for a more recent description).
- G. Cowan, A Survey of Unfolding Methods for Particle Physics, Proc. Advanced Statistical Techniques in Particle Physics, Durham (2002).
- G. Cowan, Statistical Data Analysis, Oxford University Press (1998), Chapter 11: Unfolding
- R. Barlow, SLUO Lectures on Numerical Methods in HEP (2000), Lecture 9: Unfolding
- T. Adye, Unfolding algorithms and tests using RooUnfold, write-up of a
presentatation at the
PHYSTAT 2011 workshop on unfolding and deconvolution, CERN (January 2011).
Previously presented at the Alliance Workshop on Unfolding and Data Correction, DESY (May 2010). - A. Höcker and V. Kartvelishvili, SVD Approach to Data Unfolding, NIM A 372 (1996) 469.
- BaBar Collaboration, B. Aubert et al., Study of b→ulν Decays on the Recoil of Fully Reconstructed B Mesons and Determination of |Vub|, hep-ex/0408068, contribution to the 32nd International Conference on High Energy Physics (ICHEP'04, Beijing 2004).
- G. D'Agostini, A multidimensional unfolding method based on Bayes theorem, NIM A 362 (1995) 487.
- a. V. Blobel, Unfolding Methods in High Energy Physics Experiments,
DESY 84-118 (1984) and Proc. 8th CERN School of Computing (Aiguablava, Spain, 1984), CERN 85-09.
b. V. Blobel, An unfolding method for high energy physics experiments, Proc. Advanced Statistical Techniques in Particle Physics, Durham (2002). - Alliance Workshop on Unfolding and Data Correction, DESY (May 2010).
- PHYSTAT 2011 workshop on unfolding and deconvolution, CERN (January 2011).
RooUnfold is currently being maintained by Tim Adye, Carsten Burgard, Lydia Brenner, and Pim Verschuuren. The latest version can always be found at the CERN GitLab page.