Selection:

Separating Signal and Background - Optimally


High-energy physicists, whether making a precise measurement of a known quantity or searching for hypothetical new particles, select from the recorded event sample those events whose characteristics resemble the desired signal, while rejecting as many non-signal events as possible.  This section briefly reviews some of the main methods used in this process, offers some general recommendations and references for deeper study, and decodes some of the jargon for novices.

 

Cut methods

Historically, physicists have used simple "rectangular" cuts to select events of interest.  Rectangular here means that one demands that certain measured quantities in the event lie in well-defined ranges which do not vary with other quantities in the event.  Students are taught at the beginning of their careers to study the distribution of these quantities before placing cuts so as to understand the effect of the cuts.  This is good advice, as one's preconceived notions of the ranges and distribution of typical event quantities can differ markedly from those in the actual data.
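
As a minimal sketch of what a rectangular cut looks like in practice, the Python fragment below histograms a quantity first and then applies fixed ranges to two quantities.  The event quantities ("et", "eta"), the toy distributions and the cut values are all invented for illustration and do not come from any real analysis.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical toy events with two made-up quantities:
# a transverse energy "et" (GeV) and a pseudorapidity "eta".
n_events = 10_000
et = rng.exponential(scale=20.0, size=n_events)
eta = rng.normal(loc=0.0, scale=1.5, size=n_events)

# Study the distribution before choosing where to cut.
et_counts, et_edges = np.histogram(et, bins=20, range=(0.0, 100.0))

# Rectangular cuts: each quantity must lie in a fixed range that
# does not depend on any other quantity in the event.
ET_MIN = 25.0   # illustrative value only
ETA_MAX = 1.0   # illustrative value only
passes = (et > ET_MIN) & (np.abs(eta) < ETA_MAX)

efficiency = passes.mean()
print(f"{passes.sum()} of {n_events} events pass ({efficiency:.1%})")
```

The boolean mask "passes" can then be used to select the surviving events for the next stage of the selection.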

Usually event selection proceeds in stages.  In the first, the event is required to have the "topology" of interest: it must have the requisite number of various final state "objects" such as electrons, jets, etc. meeting certain angular and energy criteria.  These objects are usually identified in the primary reconstruction phase, using (you guessed it) "object identification cuts". Parts of both the object identification and topology selection can in fact occur in the online trigger. Then, typically one applies "kinematic cuts" to refine the selection and reject background events for the final analysis.

This now-classic approach to high-energy physics analysis has the advantage of being readily grasped and easily described.  Its failings are that a) choosing rectangular cuts can be arbitrary, and the criteria for setting the cut values inexact, and b) the final criteria most probably do not make optimal use of the information available in the event.  Even a very simple analysis in CDF, for example, can apply tens of cuts.  Viewed as a multivariate problem, the space of all possible cuts on all event quantities (even restricting attention to those on which cuts are placed) is clearly enormous.  There can also be hidden correlations among the quantities, some of which can be found by making two-dimensional distribution plots of pairs of variables.

If one chooses to perform a "simple" cut based analysis, it is very important to keep a few rules in mind:

 

Likelihood/probability density estimation

Given the drawbacks of cut-based methods, one may wish to seek from the beginning an inherently multivariate approach to separating signal from background, taking advantage of as much of the existing information as possible.  Though likelihood-based methods vary, all attempt to quantify how likely (or relatively likely) it is that a particular event is from the desired signal source given an array of event quantities.  Typically one calculates for each event a single quantity which is then either used as a single-variable discriminant or as the variable in which one performs a spectral fit to signal and background contributions.

The most straightforward approach in these methods is to first define, and estimate, the probability density in the n-dimensional space of event quantities.  This can be calculated numerically using a histogram-type approach, simply counting the number of events which populate each of the n-dimensional "bins" in the input space.  As is the case with the cut methods described above, the choice of input variables is limited only by the imagination and cleverness of the physicist.  With a multivariate, presumably automated approach, in addition, there is little incentive to limit the total number of variables if they might add useful information.
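
The histogram-type estimate described above might be sketched as follows, here for n = 2 using NumPy's histogramdd.  The simulated "signal" sample, the binning and the variable ranges are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical simulated signal sample: 50,000 events, n = 2 quantities.
signal = rng.normal(loc=[1.0, 0.5], scale=[0.5, 1.0], size=(50_000, 2))

# Ten equal-width bins per dimension (illustrative choice).
edges = [np.linspace(-3.0, 5.0, 11), np.linspace(-4.0, 5.0, 11)]

# Count the events populating each n-dimensional bin.
counts, _ = np.histogramdd(signal, bins=edges)

# Normalize the counts so the estimate integrates to one over the binned range.
bin_volume = np.prod([e[1] - e[0] for e in edges])
density = counts / (counts.sum() * bin_volume)

# With k bins per dimension there are k**n bins in total, so the
# fraction of empty bins grows quickly as variables are added.
empty_fraction = (counts == 0).mean()
print(f"{counts.size} bins, {empty_fraction:.0%} of them empty")
```

Note that events falling outside the binned range are simply dropped in this sketch; a real analysis would have to decide how to treat them.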

For some problems one may wish to simply use as the discriminant/fit variable a (log) likelihood proportional to the probability density for the event having come from the signal source.  However, the fact that an event is unlikely to be from the signal source does not obviously make it more likely to be from the background sources!  Furthermore, very interesting events with rare kinematics might have quite small likelihood.  Thus, one may benefit from using a likelihood ratio (LR), which compares the likelihood for an event being signal with the likelihood of being background, again employing the probability density technique.  
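
A binned likelihood ratio of this kind could be sketched as below.  The helper names (binned_density, bin_indices, likelihood_ratio), the toy Gaussian samples and the binning are hypothetical; the choice to return NaN where the background estimate is zero is just one way of flagging bins in which the ratio is undefined.

```python
import numpy as np

def binned_density(sample, edges):
    """Histogram estimate of the probability density of a sample."""
    counts, _ = np.histogramdd(sample, bins=edges)
    volume = np.prod([e[1] - e[0] for e in edges])
    return counts / (counts.sum() * volume)

def bin_indices(events, edges):
    """Bin index of each event in each dimension (out-of-range events are clipped)."""
    idx = []
    for d, e in enumerate(edges):
        i = np.searchsorted(e, events[:, d], side="right") - 1
        idx.append(np.clip(i, 0, len(e) - 2))
    return tuple(idx)

def likelihood_ratio(events, sig_density, bkg_density, edges):
    """Signal density divided by background density in each event's bin."""
    idx = bin_indices(events, edges)
    p_sig, p_bkg = sig_density[idx], bkg_density[idx]
    ratio = np.full(len(events), np.nan)   # NaN where the background estimate is zero
    np.divide(p_sig, p_bkg, out=ratio, where=p_bkg > 0)
    return ratio

# Toy simulated samples (invented for illustration).
rng = np.random.default_rng(seed=3)
edges = [np.linspace(-4.0, 6.0, 11), np.linspace(-4.0, 6.0, 11)]
sig_mc = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100_000, 2))
bkg_mc = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100_000, 2))
sig_density = binned_density(sig_mc, edges)
bkg_density = binned_density(bkg_mc, edges)

# One signal-like and one background-like event.
events = np.array([[2.5, 2.5], [-0.5, -0.5]])
lr = likelihood_ratio(events, sig_density, bkg_density, edges)
print(lr)
```

The resulting value per event can serve either as a single-variable discriminant or as the variable for a spectral fit.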

The clear problem with the LR, however, is that for regions of the n-dimensional input space which are sparsely populated, the LR is difficult to estimate accurately, numerically, without inordinate amounts of computer time.  Worse, the LR in these low-probability bins may be the same as or similar to the LR in the high-probability regions in the input space!  

[Still under construction - not sure what to conclude/recommend here!]

 

Neural networks

In the past decade neural network methods have become one of the more favored techniques for separating signal from background samples. In the computer science literature many variations of the neural network approach exist, but in high energy physics the main type used is called a "feed-forward multilayer perceptron", which is "trained" using a "backpropagation" algorithm. Despite the intimidating (or skepticism-arousing) names and jargon, a feed-forward network can be thought of as a single-valued function of an array (or vector) of input values. The function (the net) has many parameters, called weights and thresholds, the values of which determine the output for a given input vector (event quantities). Usually the output ranges from 0 to 1 continuously. "Training the network" is in fact a function minimization procedure; backpropagation is in essence a gradient descent, starting with random weights and thresholds. The aim is to reduce the "error function" which is essentially a chi-square-like quantity, the sum of the squared deviations of the neural network output from the desired output for signal (usually 1) and background (usually 0).  Then the trained network with its optimised weights and thresholds is used with real events, and the net output for each event is the statistic on which a decision about signal selection is made.
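
To make the description above concrete, here is a minimal sketch of a feed-forward network with one hidden layer, trained by plain gradient descent on a squared-error function.  It is not JETNET or any particular package; the layer size, learning rate, number of epochs and toy samples are arbitrary assumptions, and the thresholds are written as additive bias terms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Return hidden-node activations and the net output (between 0 and 1)."""
    w1, b1, w2, b2 = params
    hidden = sigmoid(x @ w1 + b1)
    return hidden, sigmoid(hidden @ w2 + b2)

def train(x, target, n_hidden=4, rate=0.5, n_epochs=2000, seed=0):
    """Minimize the mean squared error by gradient descent (backpropagation)."""
    rng = np.random.default_rng(seed)
    # Start from random weights and thresholds (biases).
    w1 = rng.normal(scale=0.5, size=(x.shape[1], n_hidden))
    b1 = rng.normal(scale=0.5, size=n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = rng.normal(scale=0.5)
    errors = []
    for _ in range(n_epochs):
        hidden, out = forward(x, (w1, b1, w2, b2))
        diff = out - target
        errors.append(0.5 * np.mean(diff**2))
        # Chain rule: propagate the error from the output back to the hidden layer.
        delta_out = diff * out * (1.0 - out) / len(x)
        delta_hid = np.outer(delta_out, w2) * hidden * (1.0 - hidden)
        # Step each parameter against its gradient.
        w2 -= rate * (hidden.T @ delta_out)
        b2 -= rate * delta_out.sum()
        w1 -= rate * (x.T @ delta_hid)
        b1 -= rate * delta_hid.sum(axis=0)
    return (w1, b1, w2, b2), errors

# Toy training samples (invented): desired output 1 for signal, 0 for background.
rng = np.random.default_rng(seed=4)
sig = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(1000, 2))
bkg = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(1000, 2))
x = np.vstack([sig, bkg])
target = np.concatenate([np.ones(len(sig)), np.zeros(len(bkg))])

params, errors = train(x, target)
_, sig_out = forward(sig, params)
_, bkg_out = forward(bkg, params)
print(f"error: {errors[0]:.4f} -> {errors[-1]:.4f}")
print(f"mean output: signal {sig_out.mean():.2f}, background {bkg_out.mean():.2f}")
```

In real use the trained net would be checked on an independent sample before its output is used to select events.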

It is in fact rather simple to write one's own neural net software, but many such packages exist. Perhaps the best known and most widely used is JETNET. In CDF, the OSU group has written a handy ROOT interface to this package. The JETNET website also has many references to neural network papers, which are an excellent introduction to the subject.

There are many common questions which arise in conjunction with using neural network techniques. "What is the network actually doing?" "Can I tell what the individual network nodes are selecting?" "Does the network approach give me some new, hidden source of systematic error?" "Do I actually benefit from using a neural net rather than cuts?" A few answers follow:

The above having been said, there are a number of tricks and tips to follow when using neural networks, collected here:

 

A Note on Optimization

 In designing an analysis, and in determining the selection method and cuts, it is important to consider the ultimate goal and optimize accordingly.  One should take care in choosing the figure of merit: contrary to popular opinion, if one is searching for a new particle and the expected signal rate is in the low-statistics Poisson range, then cuts which maximize S/sqrt(B) do not result in the most sensitive limit in the absence of a signal!  

For any analysis it is important to model the expected limit or signal significance (without using the actual data, of course!) and optimize the final cut(s)  or fit method to obtain the best possible expected result.  In the presence of large systematic errors, though, there can be difficult-to-estimate tradeoffs...
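
As a rough illustration of modelling the expected limit, the sketch below compares two hypothetical cut choices using both S/sqrt(B) and an expected 95% upper limit for a counting experiment.  Everything numerical here is invented (signal yield, efficiencies, backgrounds), systematic errors are ignored, and the limit is computed with one particular convention, a Bayesian upper limit with a flat prior on the signal mean; other limit-setting methods will give different numbers.

```python
import math

def poisson_pmf(n, mu):
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def poisson_cdf(n, mu):
    return sum(poisson_pmf(k, mu) for k in range(n + 1))

def upper_limit(n_obs, b, cl=0.95):
    """Flat-prior Bayesian upper limit on the signal mean s, solving
    cdf(n_obs; s + b) / cdf(n_obs; b) = 1 - cl by bisection."""
    norm = poisson_cdf(n_obs, b)
    lo, hi = 0.0, 10.0 + 2.0 * n_obs
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) / norm > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_limit(b, cl=0.95):
    """Average the limit over background-only outcomes, n ~ Poisson(b)."""
    n_max = int(b + 10.0 * math.sqrt(b) + 20.0)
    return sum(poisson_pmf(n, b) * upper_limit(n, b, cl) for n in range(n_max + 1))

# Two hypothetical selections: (signal efficiency, expected background).
S_BEFORE_CUTS = 5.0   # invented signal yield before the final cuts
selections = {"loose": (0.8, 4.0), "tight": (0.3, 0.3)}

results = {}
for name, (eff, b) in selections.items():
    s_over_sqrt_b = S_BEFORE_CUTS * eff / math.sqrt(b)
    # Limit on the signal yield before cuts: smaller means more sensitive.
    limit = expected_limit(b) / eff
    results[name] = (s_over_sqrt_b, limit)
    print(f"{name}: S/sqrt(B) = {s_over_sqrt_b:.2f}, expected limit = {limit:.2f}")
```

With these made-up numbers the tight selection has the larger S/sqrt(B), while the loose selection gives the smaller (better) expected limit, which is the kind of disagreement the text warns about.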


John Conway
Last modified: February 2002