Setting Limits

1) There are many recipes for calculating limits. In many cases it is not a question of one method being correct and the others wrong. It is crucial, however, to be aware of the properties of one's method, and to state explicitly in the paper which technique was used. Where possible, it is also useful to use a method similar to that of other measurements/limits for the same quantity, so that the different results can be meaningfully compared.

2) It may be desirable for a CDF Physics group to adopt the same procedure for all their limit calculations. We recommend that any such procedure be submitted to the CDF Statistics Committee for approval before it is used.

3) If a Bayesian method is used, it is important to test the sensitivity of the answer to other reasonable priors. For example, in a measurement of the CP-violating parameter sin(2*beta), if a prior that is flat in beta has been used, then another that is flat in sin(2*beta) should probably also be tried. Similarly, for a determination of the neutrino mass squared, priors that are flat in m, m^2, ln(m), etc. all seem reasonable.
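The check in item 3 can be sketched as follows. This is only an illustration under assumed inputs: a single Gaussian measurement x of m^2 with resolution sigma (both values invented here), a grid posterior in the variable m, and two of the priors mentioned above. The function and variable names are hypothetical.

```python
import math

def upper_limit_on_m(x, sigma, prior, cl=0.90, m_max=5.0, n_grid=50001):
    """Bayesian upper limit on m from a Gaussian measurement x of m^2.

    prior(m) is the prior density written in terms of m (illustrative model).
    """
    step = m_max / (n_grid - 1)
    ms = [i * step for i in range(n_grid)]
    # Unnormalised posterior in m: Gaussian likelihood times prior.
    post = [math.exp(-0.5 * ((x - m * m) / sigma) ** 2) * prior(m) for m in ms]
    total = sum(post)
    running = 0.0
    for m, p in zip(ms, post):
        running += p
        if running >= cl * total:
            return m
    return m_max

def flat_in_m(m):
    return 1.0

def flat_in_m2(m):
    # A prior uniform in m^2 becomes d(m^2)/dm = 2m when written in terms of m.
    return 2.0 * m

# Assumed measurement: x = 0.5 with sigma = 1.0 (arbitrary units of mass squared).
limit_flat_m = upper_limit_on_m(0.5, 1.0, flat_in_m)
limit_flat_m2 = upper_limit_on_m(0.5, 1.0, flat_in_m2)
print("90% upper limit on m, prior flat in m:  ", limit_flat_m)
print("90% upper limit on m, prior flat in m^2:", limit_flat_m2)
```

The spread between the two limits is a direct measure of the prior dependence that should be reported.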

4) Improper priors (i.e. those which have divergent integrals) can give problems, especially in several dimensions. An example is the calculation of a cross-section upper limit in the presence of an uncertainty on the detector efficiency. If the cross-section prior is taken as constant and a Gaussian truncated at zero is used for the acceptance, the integral of the posterior probability diverges (see notes by John and Luc). Such divergences may not be apparent if calculations are performed numerically, so it is essential to check for them; analytic calculations can help detect these effects. Proper priors do not cause such problems.
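The divergence in item 4 can be made visible numerically by varying the upper cut-off on the cross-section. The sketch below assumes a deliberately simple model that is not taken from the notes cited above: n = 0 observed events, unit luminosity, no background, a flat cross-section prior and a Gaussian efficiency prior truncated at zero. For n = 0 the integral over the cross-section can be done analytically, which keeps the example short.

```python
import numpy as np

def posterior_norm(sigma_max, eff0=1.0, d_eff=0.5, n_grid=20001):
    """Integral of the unnormalised posterior over 0 < sigma < sigma_max, eff > 0.

    Assumed model: n ~ Poisson(sigma * eff) with n = 0 observed,
    flat prior in sigma, Gaussian(eff0, d_eff) prior on eff truncated at zero.
    """
    # Log-spaced efficiency grid, so that the region near eff = 0 is resolved.
    eff = np.logspace(-12, np.log10(eff0 + 8 * d_eff), n_grid)
    gauss = np.exp(-0.5 * ((eff - eff0) / d_eff) ** 2)
    # Analytic integral of exp(-sigma * eff) over sigma from 0 to sigma_max.
    inner = -np.expm1(-sigma_max * eff) / eff

    def trapezoid(y):
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(eff)))

    # Divide by the prior normalisation so the efficiency prior integrates to 1.
    return trapezoid(gauss * inner) / trapezoid(gauss)

# Raise the cut-off by a factor of 10 each time.
norms = [posterior_norm(10.0 ** k) for k in range(3, 8)]
steps = np.diff(norms)
print("posterior normalisation:", norms)
print("increase per decade:    ", steps)
```

In this sketch the normalisation grows by a nearly constant amount for every decade added to the cut-off, i.e. logarithmically, so it never converges. A calculation with one fixed cut-off would return a finite limit and hide the problem.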

5) In a Bayesian approach, it is useful to quote the frequentist coverage of the procedure. This is the fraction of MC repetitions of the experiment in which the estimated interval for the parameter includes the `true' value used in the simulation. Similarly a Frequentist should quote the sensitivity of the experiment (see appendix) and/or the Bayesian credibility of the interval obtained, especially when this is anomalously small because the observed number of events is smaller than that expected from background alone.
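A minimal sketch of the coverage check in item 5 follows. It assumes a Poisson counting experiment with known background b and a flat prior on the signal s; the true signal, the background and the number of MC repetitions are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

def bayes_upper_limit(n, b, cl=0.90, s_max=50.0, n_grid=20001):
    """Flat-prior Bayesian upper limit on s for n ~ Poisson(s + b), with b > 0."""
    s = np.linspace(0.0, s_max, n_grid)
    log_post = n * np.log(s + b) - (s + b)
    post = np.exp(log_post - log_post.max())
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    # First grid point at which the posterior probability reaches cl.
    return s[np.searchsorted(cdf, cl)]

def coverage(s_true, b, cl=0.90, n_trials=20000, seed=1):
    """Fraction of MC experiments whose upper limit is at or above s_true."""
    rng = np.random.default_rng(seed)
    n_obs = rng.poisson(s_true + b, size=n_trials)
    # The limit depends only on n, so compute it once per distinct n.
    limits = {int(n): bayes_upper_limit(int(n), b, cl) for n in np.unique(n_obs)}
    per_trial = np.array([limits[int(n)] for n in n_obs])
    return float(np.mean(per_trial >= s_true))

cov = coverage(s_true=3.0, b=3.0)
print("frequentist coverage of the 90% credible upper limit:", cov)
```

Repeating the calculation over a range of s_true values gives the coverage as a function of the true parameter, which is the quantity worth quoting.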

6) It is important to decide on the details of the technique to be used BEFORE the data is analysed. This is inevitable if a blind analysis (see Paul Harrison's talk at Durham) is being performed. It is also necessary to specify all the details if a frequentist coverage calculation is to be performed.

7) In a Publication, it is possible to use a Frequentist method for extracting the result of an experiment; and a Bayesian one in any discussion of the implications of the experiment in the Conclusions.

8) In both Bayesian and standard Frequentist methods, there is freedom in the choice among the many ranges with the same credibility or coverage. It is thus important to specify which was chosen. Furthermore there can be ambiguity about whether a Frequentist upper limit resulted because the method (a) was chosen so as always to yield an upper limit; or (b) usually gave 2-sided intervals, but the lower limit happened to be zero. For the same data, these limits would (as expected) differ.

9) To provide more information than just the limit, it is recommended to give the likelihood function. However it should be realised that (a) this can be non-trivial when there are several parameters; (b) the incorporation of systematic errors in the likelihood is usually complicated; and (c) a Frequentist calculation requires more than just the likelihood function. [The Neyman construction requires the pdf for the data for all values of the parameter(s). The likelihood involves the same function of the same variables, but only for the observed data values.]

10) Feldman and Cousins is a variant of the Frequentist method that aims (a) to avoid empty intervals, when the observed result is smaller than expected [however, see Punzi and Giunti]; (b) to provide a unified method of extracting two-sided intervals when the sought-for effect is clear, and upper limits otherwise, thereby avoiding the problem of `flip-flopping'; and (c) to reduce the coupling between the confidence interval size and the goodness of fit. Users of the Feldman-Cousins approach should also be aware that, because of (a), both upper and lower limits tend to be larger than in the conventional Frequentist approaches.
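For a Poisson counting experiment with known background, the likelihood-ratio ordering behind item 10 can be sketched as below. This is a bare grid scan written for illustration, not a reference implementation: the grid step, the signal range and the cut-off n_max are assumptions, and none of the refinements applied when producing published tables are included.

```python
import math

def poisson_pmf(n, mean):
    if mean <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def fc_interval(n_obs, b, cl=0.90, mu_max=15.0, step=0.01, n_max=60):
    """Unified interval for signal mu, given n_obs events and known background b."""
    ns = range(n_max + 1)
    # Probability of each n at its best-fit physical signal, mu_best = max(0, n - b).
    best = [poisson_pmf(n, max(0.0, n - b) + b) for n in ns]
    accepted_mus = []
    for i in range(int(round(mu_max / step)) + 1):
        mu = i * step
        probs = [poisson_pmf(n, mu + b) for n in ns]
        # Rank the possible outcomes n by likelihood ratio, highest first.
        order = sorted(ns, key=lambda n: probs[n] / best[n], reverse=True)
        total = 0.0
        for n in order:
            # Add outcomes to the acceptance region until it holds probability cl.
            total += probs[n]
            if n == n_obs:
                accepted_mus.append(mu)  # n_obs lies in the acceptance region of mu
                break
            if total >= cl:
                break
    return min(accepted_mus), max(accepted_mus)

lo_b0, hi_b0 = fc_interval(0, 0.0)
lo_b3, hi_b3 = fc_interval(0, 3.0)
print("n = 0, b = 0:", lo_b0, hi_b0)
print("n = 0, b = 3:", lo_b3, hi_b3)
```

With zero observed events the interval is non-empty for both background values, which is aim (a). The same scan returns a two-sided interval once n_obs is well above b, with no separate decision by the user.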

11) The CL_s method [See Alex Read's talk in CERN CLW Proceedings CERN 2000-005] used in the CERN Higgs search is designed to protect against excluding the Higgs when an experiment has no sensitivity to whether or not the Higgs is produced; this could happen because of a statistical fluctuation. The method is useful in comparing two precisely specified hypotheses e.g. SM without a light Higgs, and SM with Higgs of mass m_H (but m_H could be varied). Because of the way CL_s is constructed, it is conservative, in that it overcovers.
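For a single-bin counting experiment, the ratio used in item 11 can be written down in a few lines. The sketch assumes the event count itself is the test statistic, so that CL_{s+b} and CL_b are Poisson cumulative probabilities and CL_s = CL_{s+b} / CL_b. The numbers are invented to show a downward fluctuation.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with mean > 0."""
    return sum(math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
               for k in range(n_obs + 1))

def cls(n_obs, s, b):
    """Return (CL_sb, CL_b, CL_s) for a counting experiment with known b > 0."""
    cl_sb = poisson_cdf(n_obs, s + b)
    cl_b = poisson_cdf(n_obs, b)
    return cl_sb, cl_b, cl_sb / cl_b

# Assumed example: 0 events seen, 3 expected from background, small signal s = 0.5.
cl_sb, cl_b, cl_s = cls(n_obs=0, s=0.5, b=3.0)
print("CL_sb =", cl_sb, " CL_b =", cl_b, " CL_s =", cl_s)
```

Here CL_{s+b} is below 0.05, so a test based on CL_{s+b} alone would exclude a signal to which the experiment has little sensitivity, whereas CL_s stays well above 0.05 and the signal is not excluded.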

12) It is usually quite complicated to incorporate the effect of systematics in a fully frequentist manner. This requires performing the Neyman construction using as parameters the physical one(s) and also the nuisance one(s); deriving the confidence region for the observed measurement values; and projecting this onto the axes for the physics parameters. In the first stage, the increase in the dimensionality of the parameter space complicates the ordering algorithm for constructing the confidence belt. However, if this method is possible, that is what we would recommend for frequentist calculations.

13) There are several methods that approximate the full frequentist approach for nuisance parameters. The main ones are profile methods (see note by Giovanni), or simply using the likelihood contours in all the parameters, and then projecting onto the physics axes.
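As an illustration of the profile approach in item 13, the sketch below treats an assumed `on/off' counting problem: n_on ~ Poisson(s + b) in the signal region and n_off ~ Poisson(tau * b) in a sideband, with b the nuisance parameter. For each s the likelihood is maximised over b analytically, and the interval is the set of s with -2 ln(lambda) below a threshold. The threshold 2.71 corresponds to an approximate 90% two-sided interval only in the asymptotic limit. The model, the numbers and the names are assumptions, not taken from the note cited above.

```python
import math

def log_like(s, b, n_on, n_off, tau):
    """Log-likelihood up to constants; requires b > 0."""
    return n_on * math.log(s + b) - (s + b) + n_off * math.log(tau * b) - tau * b

def profiled_b(s, n_on, n_off, tau):
    """Value of b maximising the likelihood at fixed s (root of a quadratic)."""
    a = 1.0 + tau
    c = n_on + n_off - a * s
    return (c + math.sqrt(c * c + 4.0 * a * n_off * s)) / (2.0 * a)

def q(s, n_on, n_off, tau):
    """-2 ln of the profile likelihood ratio, with the signal constrained to s >= 0."""
    s_hat = max(0.0, n_on - n_off / tau)
    best = log_like(s_hat, profiled_b(s_hat, n_on, n_off, tau), n_on, n_off, tau)
    here = log_like(s, profiled_b(s, n_on, n_off, tau), n_on, n_off, tau)
    return -2.0 * (here - best)

def profile_interval(n_on, n_off, tau, threshold=2.71, s_max=50.0, step=0.001):
    grid = (i * step for i in range(int(round(s_max / step)) + 1))
    inside = [s for s in grid if q(s, n_on, n_off, tau) <= threshold]
    return min(inside), max(inside)

# Assumed data: 12 events on-source, 15 in a sideband three times larger (n_off > 0).
low, high = profile_interval(n_on=12, n_off=15, tau=3.0)
print("approximate 90% profile-likelihood interval for s:", low, high)
```

The coverage of such an interval is only approximate, so it is worth checking by MC for the values of s and b that are relevant to the analysis.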

14) It is generally easier to incorporate systematics by Bayesian methods. The nuisance parameters are eliminated by integrating over them ('marginalisation').
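The marginalisation in item 14 can be sketched on a grid. The example assumes n ~ Poisson(s + b), a Gaussian prior on the background b truncated at zero, and a flat prior on the signal s restricted to a finite range so that it is proper. All numerical values and names are hypothetical.

```python
import numpy as np

def marginal_upper_limit(n_obs, b0, d_b, cl=0.90, s_max=30.0, n_s=3001, n_b=801):
    """Bayesian upper limit on s after integrating out the background b."""
    s = np.linspace(0.0, s_max, n_s)[:, None]                    # column: signal
    b = np.linspace(max(1e-9, b0 - 6 * d_b), b0 + 6 * d_b, n_b)[None, :]  # row: b > 0
    log_like = n_obs * np.log(s + b) - (s + b)
    prior_b = np.exp(-0.5 * ((b - b0) / d_b) ** 2)                # truncated Gaussian
    joint = np.exp(log_like - log_like.max()) * prior_b           # flat prior in s
    post_s = joint.sum(axis=1)                                    # marginalise over b
    cdf = np.cumsum(post_s) / post_s.sum()
    return float(s[np.searchsorted(cdf, cl), 0])

limit_narrow = marginal_upper_limit(n_obs=5, b0=3.0, d_b=0.1)
limit_wide = marginal_upper_limit(n_obs=5, b0=3.0, d_b=1.0)
print("90% limit, background known to 0.1:", limit_narrow)
print("90% limit, background known to 1.0:", limit_wide)
```

Comparing the two results shows how much the background uncertainty contributes to the limit. For n_obs = 0 the likelihood factorises in s and b, so in this model the limit does not depend on the background prior at all.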

15) There are different views about the importance of using the same (Bayesian or Frequentist) technique for the main extraction of the limit and the incorporation of systematics. If we insisted on the same technique, this would result in almost all limits being calculated in a Bayesian way. However, since the mixed method almost always overcovers (Bob Cousins, private communication), it can be regarded as a procedure which has reasonable frequentist properties.

RECOMMENDED READING

Bob Cousins `Why isn't every Physicist a Bayesian?' This gives some clear examples of how to calculate limits in simple cases using Bayesian, Frequentist and Likelihood techniques.

APPENDIX: SENSITIVITY

The sensitivity of an experiment is a measure of the result that the experiment is expected to give for the limit on a parameter in the absence of signal. It is defined as the median limit expected from repetitions of the experiment. (The median is better than the mean, because it is invariant with respect to transformations of the parameter.) It describes the accuracy of the experiment, and is independent of the actual data. The latter could fluctuate upwards (or there may even be a small signal present) and give a poor limit; or it could fluctuate downwards, below what is expected from background, and hence give a better than expected limit. The sensitivity is impervious to such effects. It is especially useful when the actual data fluctuates downwards, and the confidence range is either very small or even empty. The sensitivity then gives a better way of assessing the capability of the experiment.

To compare two experiments, their relative sensitivities are more useful than their actual limits. The ensemble of MC `experiments' gives the expected limit distribution in the absence of signal. The median provides a one-parameter view of this distribution, but clearly does not convey the complete picture. Giving other percentiles may also be useful. A positive feature of the Feldman-Cousins method is that the decision of whether to quote an upper limit or a two-sided range is removed from the Physicist, and decided by the method itself. This however weakens the sensitivity calculation, in which only the upper limits (whether they are of one-sided or of two-sided intervals) are considered. When two (or more) physics parameters are being estimated, the upper limit confidence region will be a curve [e.g. in sin^2(2*theta) versus delta(m^2) in a search for neutrino oscillations], which will vary for each MC repetition of the experiment. The median limit is then not defined.
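The median expected limit defined in this appendix can be estimated with a short MC loop. The sketch assumes a counting experiment with known background b and, purely for illustration, uses the flat-prior Bayesian upper limit as the limit-setting procedure; any other procedure could be substituted. The background value, the number of repetitions and the names are assumptions.

```python
import math
import numpy as np

def poisson_cdf(n, mean):
    """P(k <= n) for a Poisson distribution with mean > 0."""
    return sum(math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
               for k in range(n + 1))

def upper_limit(n_obs, b, cl=0.90):
    """Flat-prior Bayesian upper limit on s with known b > 0, found by bisection."""
    target = (1.0 - cl) * poisson_cdf(n_obs, b)
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_limits(b, n_trials=10001, seed=7):
    """Limits from background-only MC repetitions of the experiment."""
    rng = np.random.default_rng(seed)
    n_bkg = rng.poisson(b, size=n_trials)
    cache = {int(n): upper_limit(int(n), b) for n in np.unique(n_bkg)}
    return np.array([cache[int(n)] for n in n_bkg])

limits = expected_limits(b=3.0)
sensitivity = float(np.median(limits))
band = np.percentile(limits, [16, 84])
print("sensitivity (median expected 90% limit):", sensitivity)
print("16th and 84th percentiles:", band)
```

Quoting the percentiles alongside the median gives a fuller picture of the expected limit distribution than the median alone.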



Louis Lyons

Last modified: May 2003