Goodness of Fit

Talking about goodness is easy; achieving it is difficult.
(Chinese proverb)
A maximum-likelihood fit to data provides estimates for parameters (and an error matrix). Inserting the estimates into the likelihood function yields a distribution that we hope will model the data reasonably well. How well the data are modeled by that distribution is known as goodness-of-fit. In general, extra work is needed to obtain a quantitative measure of the goodness-of-fit, and there is no universally accepted mathematical definition that is valid in all cases.
Data Points with Gaussian Uncertainties

If the data are represented by discrete points with Gaussian uncertainties, the chi-square test can be used to measure how well the fit matches the data.
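As a minimal sketch (the measurements, model values, and uncertainties below are made up for illustration, not taken from any real fit), the statistic is the sum of squared residuals weighted by the stated Gaussian uncertainties:

```python
# Chi-square statistic for data points with Gaussian uncertainties.
# All numbers below are illustrative.

def chi_square(y_obs, y_fit, sigma):
    """Sum over points of ((observed - fitted) / uncertainty)^2."""
    return sum(((y - f) / s) ** 2 for y, f, s in zip(y_obs, y_fit, sigma))

y_obs = [1.1, 2.0, 2.9]    # measured values
y_fit = [1.0, 2.0, 3.0]    # fitted model evaluated at the same points
sigma = [0.1, 0.1, 0.1]    # Gaussian uncertainties on the measurements

chi2 = chi_square(y_obs, y_fit, sigma)
# Compare chi2 with the number of degrees of freedom
# (points minus fitted parameters) to judge the fit.
```

For a good fit, chi2 per degree of freedom should be near 1; a p-value can then be read from the chi-square distribution with the appropriate number of degrees of freedom.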
Poisson-Distributed Data

If the data are represented by (integer) numbers of events in discrete bins, Poisson statistics rule. Pearson's chi-square test and the likelihood-ratio test are two well-established methods of dealing with this case. These tests work best when the expected number of events in a bin (µ) is large: CDF note 5718 shows what happens at small µ.
Pearson's Chi-Square Test:

Here we attach Pearson's name to the chi-square to distinguish it from the case with Gaussian errors (not a universal practice). The following references emphasize the Poisson case:
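A minimal stdlib sketch of the statistic itself, summing (n − µ)²/µ over bins (the counts below are illustrative, not from any real experiment):

```python
# Pearson's chi-square for Poisson-distributed bin contents.
# Observed counts and predictions here are made up for illustration.

def pearson_chi_square(n, mu):
    """Sum over bins of (n_i - mu_i)^2 / mu_i."""
    return sum((ni - mi) ** 2 / mi for ni, mi in zip(n, mu))

n = [98, 105, 110]            # observed events per bin
mu = [100.0, 100.0, 100.0]    # predicted (expected) events per bin

chi2 = pearson_chi_square(n, mu)
```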
Likelihood Ratio of the Poisson Distribution:

The logarithm of the Poisson likelihood can be taken as the sum of contributions of the form

    loglikelihood(µ;n) = n ln µ - µ
where n represents the observed number of events in a bin and µ is the predicted number for that bin (see Statistical Data Analysis §6.10). Considered as a function of µ, the log likelihood is maximized when µ = n, achieving the value

    loglikelihood(n;n) = n ln n - n
Taking the difference, the log likelihood ratio is defined as

    loglikelihoodratio(µ;n) = n ln(µ/n) + n - µ
The quantity

    -2 loglikelihoodratio(µ;n) = 2[(µ - n) + n ln(n/µ)]

asymptotically approaches (n - µ)²/µ (Pearson's chi-square) as µ becomes large, and has been suggested as an alternative to Pearson's chi-square in testing for goodness-of-fit. In this formula, the case n = 0 uses 0 ln(0) = 0.
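The formula above, summed over bins and with the n = 0 convention applied, can be sketched as follows (the counts are illustrative):

```python
from math import log

def minus_two_llr(n, mu):
    """-2 x log likelihood ratio for Poisson bins:
    sum over bins of 2*[(mu_i - n_i) + n_i ln(n_i/mu_i)],
    using the convention 0 ln(0) = 0 for empty bins."""
    total = 0.0
    for ni, mi in zip(n, mu):
        term = mi - ni
        if ni > 0:                 # an empty bin contributes only mu_i
            term += ni * log(ni / mi)
        total += 2.0 * term
    return total

# A perfect prediction gives 0; an empty bin contributes 2*mu_i.
stat = minus_two_llr([98, 0, 110], [100.0, 2.5, 100.0])
```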
This test is mentioned in the Particle Data Group's Statistics (2004) Review in §32.1.2 (the method of maximum likelihood) and §32.2.2 (goodness-of-fit tests).

A more complete discussion is presented in S. Baker and R. D. Cousins, Nucl. Instrum. Methods 221, 437 (1984).
Unbinned Data

Kolmogorov-Smirnov Test:

The Kolmogorov-Smirnov test is designed for the case of a one-dimensional continuous distribution.
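A stdlib sketch of the KS statistic: the largest distance between the empirical CDF of the sample and the model CDF. The sample and the assumed Uniform(0,1) model below are illustrative.

```python
def ks_statistic(data, model_cdf):
    """D = sup over x of |F_empirical(x) - F_model(x)|.
    The supremum is attained at a data point, just before or
    just after the step in the empirical CDF there."""
    xs = sorted(data)
    m = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # the empirical CDF jumps from i/m to (i+1)/m at x; check both sides
        d = max(d, abs(f - i / m), abs((i + 1) / m - f))
    return d

sample = [0.1, 0.35, 0.6, 0.9]            # hypothetical measurements
D = ks_statistic(sample, lambda x: x)     # Uniform(0,1): F(x) = x on [0,1]
```

The significance of D is then read from the Kolmogorov distribution (or obtained by Monte Carlo).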
Bin the Data

After performing an unbinned maximum-likelihood fit, for example, to obtain goodness-of-fit one can simply bin the data and apply the methods designed for Poisson-distributed data. This may be necessary in the case of multiple dimensions, where KS is not available. However, binning the data is not a trivial panacea: on one hand, binning too coarsely destroys valuable information; on the other hand, binning too finely leads to difficulties in the interpretation of the goodness-of-fit statistic, as its distribution deviates further from the asymptotic chi-square distribution. In general, one will need to merge bins, or calculate (often using Monte Carlo methods) the non-asymptotic behavior for the bins with small contents.
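One possible way to merge sparsely populated bins before applying a Poisson test is sketched below; the threshold of 5 expected events is a common rule of thumb, not a rigorous prescription, and the counts are illustrative:

```python
def merge_small_bins(n, mu, min_expected=5.0):
    """Accumulate adjacent bins left to right until the expected
    content reaches min_expected; fold any remainder into the last bin."""
    merged_n, merged_mu = [], []
    acc_n, acc_mu = 0, 0.0
    for ni, mi in zip(n, mu):
        acc_n += ni
        acc_mu += mi
        if acc_mu >= min_expected:
            merged_n.append(acc_n)
            merged_mu.append(acc_mu)
            acc_n, acc_mu = 0, 0.0
    if acc_mu > 0:                      # leftover underfull bins at the end
        if merged_n:
            merged_n[-1] += acc_n
            merged_mu[-1] += acc_mu
        else:
            merged_n.append(acc_n)
            merged_mu.append(acc_mu)
    return merged_n, merged_mu

n = [1, 2, 0, 12, 30, 1]                 # observed counts (illustrative)
mu = [1.5, 2.0, 2.5, 11.0, 28.0, 1.0]    # expected counts (illustrative)
merged_n, merged_mu = merge_small_bins(n, mu)
```

A Poisson goodness-of-fit statistic computed on the merged bins is then closer to its asymptotic chi-square distribution.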
Warnings, Fallacies, Traps, ...

If you think it's simple, then you have misunderstood the problem.
(Bjarne Stroustrup)
- A bad fit does not necessarily produce large errors.

  Goodness-of-fit and parameter estimation are, in general, separate issues. Specifically:

  - Not every test statistic mentioned above is recommended for estimating parameters.

  - Unbinned maximum-likelihood fits, while recommended wholeheartedly for parameter estimation, do not in general provide goodness-of-fit information. CDF note 5639 explains in detail why this is so.

  - The problems in the Poisson tests that show up at small µ (CDF note 5718) are specific to issues of goodness-of-fit: the log likelihood ratio, in particular, is believed to be safe to use for parameter estimation at small µ. In fact, minimizing the log likelihood ratio to obtain parameter estimates is mathematically equivalent to the maximum-likelihood method as normally applied to Poisson data.
- The chi-square test does not check that the uncertainties are Gaussian; it assumes that they are Gaussian. Similarly, the Poisson-related tests assume the data are Poisson-distributed. If these assumptions are significantly violated, the tests may not be valid, and the tests won't necessarily tell you when they are not valid.
- In some cases, having computed the goodness-of-fit statistic, one may need to do Monte Carlo work to calculate its significance (e.g. to estimate the probability that a random data sample selected from the distribution in question will produce a goodness-of-fit statistic this large or larger).
- Rather than quoting a single number representing the overall goodness-of-fit, it may be more useful to divide the data into regions (e.g. signal region, mass-sideband region, negative-lifetime region, ...) and compute separate goodness-of-fit statistics for each region. Otherwise, if one region dominates the overall goodness-of-fit, it may hide problems present in other regions.
- Having the goodness-of-fit come out too good (e.g. you ran 1000 toy Monte Carlo fits and none of them were as good as your fit to the real data) usually means that something is wrong.

- Computing goodness-of-fit does not substitute for plotting the data, and vice versa.
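One of the warnings above notes that minimizing the Poisson log likelihood ratio is mathematically equivalent to the maximum-likelihood method. This can be checked numerically; in the sketch below the counts n_i, the template shape t_i, and the single scale parameter s (so µ_i = s·t_i) are all made-up illustrations:

```python
from math import log

n = [8, 15, 7]         # observed counts per bin (illustrative)
t = [1.0, 2.0, 1.0]    # template shape; the model predicts mu_i = s * t_i

def log_likelihood(s):
    """Sum over bins of n_i ln(mu_i) - mu_i."""
    return sum(ni * log(s * ti) - s * ti for ni, ti in zip(n, t))

def minus_two_llr(s):
    """Sum over bins of 2*[(mu_i - n_i) + n_i ln(n_i/mu_i)]."""
    return 2.0 * sum((s * ti - ni) + ni * log(ni / (s * ti))
                     for ni, ti in zip(n, t))

grid = [k / 100 for k in range(100, 1500)]   # scan s from 1.00 to 14.99
s_ml = max(grid, key=log_likelihood)         # maximum likelihood
s_lr = min(grid, key=minus_two_llr)          # minimum of -2 x llr
# Both land on s = sum(n)/sum(t) = 7.5, the analytic ML estimate.
```

The two objective functions differ only by a constant independent of s, which is why their extrema coincide.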
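The toy Monte Carlo significance calculation mentioned in the warnings above can be sketched as follows; the counts, predictions, number of toys, and random seed are all illustrative assumptions:

```python
# Toy Monte Carlo estimate of the significance of a goodness-of-fit
# statistic (here Pearson's chi-square): generate pseudo-data from the
# fitted predictions and count how often a toy is at least as bad as
# the observed value.
import random
from math import exp

def poisson_sample(mu, rng):
    """Knuth's multiplication method; adequate for the modest means here."""
    limit, k, p = exp(-mu), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def pearson_chi_square(n, mu):
    return sum((ni - mi) ** 2 / mi for ni, mi in zip(n, mu))

mu_fit = [20.0, 35.0, 25.0]   # predictions from the fit (illustrative)
n_obs = [25, 30, 31]          # observed counts (illustrative)
t_obs = pearson_chi_square(n_obs, mu_fit)

rng = random.Random(12345)
n_toys = 10000
worse = sum(
    pearson_chi_square([poisson_sample(m, rng) for m in mu_fit], mu_fit) >= t_obs
    for _ in range(n_toys)
)
p_value = worse / n_toys   # fraction of toys at least as bad as the data
```

This directly estimates the tail probability without relying on the asymptotic chi-square distribution, which is exactly what is needed when the bin contents are small.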
Joel Heinrich
Last modified: April 1, 2004