Search for Electroweak Single-Top-Quark Production using Neural Networks with 955 pb⁻¹ of CDF II data

Matthias Bühler, Jan Lück, Thomas Müller, Svenja Richter, Wolfgang Wagner

Universität Karlsruhe

 


Abstract
Results
Common Event Selection
Fit to b tag Neural Network
Common Neural Network Input Variables
Templates for Combined Search
Templates for Separate Search
Common Systematic Uncertainties
Common Likelihood Function
Expected Sensitivity and Significance for Combined Search
Expected Sensitivity and Significance for Separate Search
Binned Likelihood Fit to Data for Combined Search
Binned Likelihood Fit to Data for Separate Search
Public Note (CDF Note 8677) .pdf
 


 

Abstract

We report on a search for electroweak single-top-quark production with CDF II data corresponding to 955 pb⁻¹ of integrated luminosity. We apply neural networks to construct discriminants that distinguish between single-top and background events. Two analyses are performed, assuming a top quark mass of 175 GeV/c². In the first one we combine t- and s-channel events into one single-top signal under the assumption that the ratio of the two processes is given by the standard model (SM). Using ensemble tests, we determine that, with a probability of 50%, we expect to see a single-top signal that is larger than a 2.6σ fluctuation of the background (p-value of 0.5%). A binned likelihood fit to the data yields no evidence for single-top production. The observed p-value is 54.6%, which indicates that the data are compatible with the background-only hypothesis. A combined single-top cross section above 2.6 pb is excluded at the 95% confidence level.

In the second analysis we separate the two single-top production modes. A binned likelihood fit to a two-dimensional distribution of two neural network outputs yields most probable values for the cross sections of 0.2 (+1.1, -0.2) pb for the t-channel and 0.7 (+1.5, -0.7) pb for the s-channel. The separate search analysis features an expected p-value of 0.4% (2.7σ). The observed p-value, i.e. the probability for the data to be due to a background fluctuation only, is found to be 21.9%.

 

Results
Combined s- and t-channel Search
Posterior probability density for the combined search.

Separate s- and t-channel Search
The likelihood fit estimate for the t- and s-channel cross section measurement. The contours of the 1σ uncertainty and the 95% C.L. are valid for both channels simultaneously. The error bars represent the 1σ uncertainty and the 95% C.L. of the given channel without assumptions on the other channel.
 

 

For the combined search a binned likelihood fit to the data yields no evidence for single-top production. For the separate search the observed t- and s-channel cross sections are 0.2 (+1.1, -0.2) pb and 0.7 (+1.5, -0.7) pb, respectively.

 

 

Summary of expected and observed upper limits at the 95% confidence level for the combined and the separate search.

 

Common Event Selection

The CDF event selection exploits the kinematic features of the signal final state, which contains a top quark, a bottom quark, and possibly additional light quark jets. To reduce multijet backgrounds, the W originating from the top quark is required to have decayed leptonically. We therefore demand a single high-energy electron or muon (ET(e) > 20 GeV or PT(μ) > 20 GeV/c) and large missing transverse energy from the undetected neutrino (MET > 25 GeV).

The backgrounds belong to the following categories: Wbb, Wcc, Wc, mistags (light quarks misidentified as heavy flavor jets), top pair production (tt events in which one lepton or two jets are lost due to detector acceptance), non-W (QCD multijet events where a jet is erroneously identified as a lepton), Z→ll, and diboson production (WW, WZ, and ZZ). We remove a large fraction of the backgrounds by demanding that exactly two jets with ET > 15 GeV and |η| < 2.8 be present in the event. At least one of these two jets has to be tagged as a b quark jet using displaced vertex information from the silicon vertex detector (SVX). The non-W content of the selected electron dataset is further reduced by several requirements on the angle between the MET vector and the transverse momentum vector of the jets. The numbers of expected and observed events are listed in the tables below.
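The kinematic requirements above can be sketched as a simple event filter. This is only an illustration: the `Jet` and `Event` containers and the function `passes_selection` are hypothetical names introduced here, not part of any CDF software, and the sketch leaves out the non-W angular requirements, whose exact values are not given in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float        # transverse energy in GeV
    eta: float       # pseudorapidity
    b_tagged: bool   # True if tagged by the secondary vertex tagger


@dataclass
class Event:
    lepton_flavor: str      # "e" or "mu" (hypothetical encoding)
    lepton_et_or_pt: float  # ET in GeV for electrons, PT in GeV/c for muons
    met: float              # missing transverse energy in GeV
    jets: List[Jet]


def passes_selection(event: Event) -> bool:
    """Apply the lepton, MET, jet-multiplicity and b-tag requirements."""
    if event.lepton_flavor not in ("e", "mu"):
        return False
    if event.lepton_et_or_pt <= 20.0:
        return False
    if event.met <= 25.0:
        return False
    # Count only jets with ET > 15 GeV and |eta| < 2.8.
    good_jets = [j for j in event.jets if j.et > 15.0 and abs(j.eta) < 2.8]
    if len(good_jets) != 2:
        return False
    # At least one of the two selected jets must carry a b tag.
    return any(j.b_tagged for j in good_jets)
```

The function assumes that the event has already been required to contain a single identified lepton; it only checks the thresholds quoted in the text.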

 

Fit to b tag Neural Network

To cross-check the background estimate, we perform a fit to the output of a neural network b tagger. The network tagger is applied to jets that are already tagged by the secondary vertex tagger. For double-tagged events, the leading b jet (highest ET) is included in this distribution. The network output is quite characteristic, not only for b jets, but also for charm and light jets. The tagger thereby allows us to determine the flavor composition of our data sample.

We create templates of the neural network output distributions for b, c and light jets using simulated events. These templates are fitted to the W+jets data output distributions in the 1-, 2- and 3-jet bins. The results of the fits are displayed in the figures: the upper plot shows the fit to the 1-jet bin, the middle one the fit to the 2-jet bin and the lower one the fit to the 3-jet bin. The b templates are displayed in red, the charm ones in blue and the light ones in green. The sum of the fitted templates is shown in black with a yellow error band. The black points are the data distribution. In all three cases, the fitted distributions describe the data very well.

 

 

Common Neural Network Input Variables
Using neural networks, 26 kinematic or event shape variables are combined into a powerful discriminant. One of the variables is the output of a neural net b tagger. The neural net b tagger gives an additional handle to reduce the large background components that contain no real b quarks, namely mistags and charm backgrounds. Together they amount to about 50% of the W+2 jets data sample, even after imposing the requirement that one jet is identified by the secondary vertex tagger of CDF.

MC distributions:

 the neural network output of the b tagger for the first b tagged jet

 

data - MC comparison:

 the neural network output of the b tagger for the first b tagged jet

 

MC distributions:

 the reconstructed top mass

 

data - MC comparison:

 the reconstructed top mass

 

MC distributions:

 the charge of the lepton times pseudorapidity of the leading light jet

 

data - MC comparison:

 the charge of the lepton times pseudorapidity of the leading light jet

 

MC distributions:

the invariant mass of the two leading jets

 

data - MC comparison:

 the invariant mass of the two leading jets

 

MC distributions:

 the cosine of the angle Θ_l,q, where Θ_l,q is reconstructed by determining the angle between the tight lepton and the beam axis in the top rest frame

data - MC comparison:

 the cosine of the angle Θ_l,q, where Θ_l,q is reconstructed by determining the angle between the tight lepton and the beam axis in the top rest frame

 

Templates for Combined Search
In principle, it would be possible to create separate templates for each of the expected physical processes. Since it is difficult for a likelihood fit to distinguish between distributions that are not very distinct, it is more practical to combine similar shapes into one template. A comparison of the neural network outputs for the simulated non-top backgrounds shows that some of the distributions are very similar. Therefore, three non-top templates are created. Wbb and WZ build the so-called b-like background template. The c-like template consists of Wcc, Wc, WW, and mistagged light events. The non-W background is combined with Z→ee, Z→μμ, Z→ττ, and ZZ. Together with the tt template and the single-top template, we get five templates for the combined search, which are displayed below.

MC distributions:

 s- and t-channel are combined at SM ratio to the single-top template

MC distributions:

 The five templates for the combined search: single-top, tt, charm-like, bottom-like and non-W

 

Templates for Separate Search
For the separate search we use two independent neural networks, one trained for the s-channel and the other for the t-channel, which make it possible to search for both channels simultaneously. The templates for signal and background processes are created in the same way as in the combined search, except that they are two-dimensional, covering both network outputs simultaneously. The non-top backgrounds are combined as in the combined search. Together with tt and the two single-top signal templates, we use six templates, which are shown below.

MC distributions:

 2D template of the t-channel signal

 

MC distributions:

 2D template of the s-channel signal

 

MC distributions:

 2D template of the tt background

 

MC distributions:

 2D template of the b-like background

 

MC distributions:

 2D template of the c-like background

MC distributions:

 2D template of the non-W background

 

Common Systematic Uncertainties
Systematic uncertainties can cause a shift in the event detection efficiency for events of different physics processes, but can also cause a change in the shape of the template distributions. The rate uncertainties are summarized in the table below. Ten sources of systematic shape uncertainties are considered: the jet energy scale (JES), initial state gluon radiation (ISR), final state gluon radiation (FSR), parton distribution functions (PDFs), the neural net b tagger, the factorization and renormalization scale for W + heavy flavor processes, the modeling of the W + heavy flavor samples, the modeling of mistag events, the flavor composition, and the modeling of non-W events. The shape uncertainties are determined by altering the respective effects within their uncertainties. In this way two shifted distributions, one plus and one minus, are obtained for the first five sources (see three examples below). For the last five sources one alternative model is considered, so only one systematic shape is obtained for these effects.

Table of systematic rate uncertainties

 

single-top MC distribution:

 the systematic shape uncertainty due to the JES

 

c-like MC distribution:

 the systematic shape uncertainty due to the neural net b tagger

tt MC distribution:

 the systematic shape uncertainty due to the ISR

 

Common Likelihood Function
The likelihood function consists of Poisson terms for the individual bins of the fitted histograms, Gaussian constraints to the background rates, and Gaussian constraints to the strengths of systematic effects.

Systematic uncertainties are included as factors modifying the expectation value μ_k of events in a certain bin k.

The index j runs over the different physics processes that occur in the likelihood function. The cross section of process j is σ_j. In the likelihood function we use the parameter β_j, which is the cross section normalized to its standard model prediction. The event detection efficiency of process j is named ν_j. The normalized content of bin k of the template histogram for process j is α_jk. We consider five effects which cause systematic uncertainties in acceptance. Ten sources of uncertainties in the template shape are taken into account. The sources of systematic uncertainties are indexed with i. The relative acceptance uncertainties due to these sources are named ε_ji. The relative uncertainties in the bin content of bin k of the template histograms are called κ_jik. The variation in strength of a systematic effect i is measured with the variable δ_i.
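The structure described above (Poisson terms per bin, Gaussian constraints on the background rates and on the strengths of the systematic effects) can be sketched as a negative log-likelihood. This is a simplified illustration under stated assumptions, not the formula used in the analysis: the function names are hypothetical, each systematic effect is treated as symmetric (a single ε and κ per source rather than separate plus and minus shifts), `nu[j]` is taken to be the expected standard model yield of process j, and `beta_width[j]` stands for the relative rate uncertainty of process j. Python index 0 is taken to be the single-top signal, which is left unconstrained.

```python
import math


def expected_bin_content(k, beta, nu, alpha, delta, eps, kappa):
    """Expected events mu_k in bin k, summed over processes j.

    alpha[j][k] is the normalized template, eps[j][i] the relative
    acceptance shift and kappa[j][i][k] the relative shape shift of
    process j for systematic source i at strength delta[i] = 1.
    """
    mu_k = 0.0
    for j in range(len(beta)):
        rate_factor = 1.0
        shape_factor = 1.0
        for i, d in enumerate(delta):
            rate_factor *= 1.0 + d * eps[j][i]
            shape_factor *= 1.0 + d * kappa[j][i][k]
        mu_k += beta[j] * nu[j] * rate_factor * alpha[j][k] * shape_factor
    return mu_k


def neg_log_likelihood(n, beta, nu, alpha, delta, eps, kappa, beta_width):
    """-ln L for observed bin contents n; requires mu_k > 0 in every bin."""
    nll = 0.0
    # Poisson term for every bin of the fitted histogram.
    for k, n_k in enumerate(n):
        mu_k = expected_bin_content(k, beta, nu, alpha, delta, eps, kappa)
        nll -= n_k * math.log(mu_k) - mu_k - math.lgamma(n_k + 1)
    # Gaussian constraints on the background rates (signal index 0 is free).
    for j in range(1, len(beta)):
        nll += 0.5 * ((beta[j] - 1.0) / beta_width[j]) ** 2
    # Unit Gaussian constraints on the strengths of the systematic effects.
    for d in delta:
        nll += 0.5 * d ** 2
    return nll
```

A fit would minimize this function with respect to `beta` and `delta`; the minimizer itself is omitted here.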

 

Expected Sensitivity and Significance for Combined Search
We use ensemble tests to compute the sensitivity of our analysis. An ensemble test consists of a set of pseudo experiments. For each pseudo experiment we first determine the number of events N_j of each process by drawing a random number from a Poisson distribution with mean μ_j. In a second step we draw random numbers from the template distributions of the neural network output. We perform two ensemble tests: one with single-top events included at the predicted standard model rate (below, left-hand side) and one without any single-top events (below, right-hand side). The main results of the ensemble tests are shown below, where the most probable values for the rates of the different processes, i.e. the central values obtained from the likelihood fit for each pseudo experiment, are given in units of the expected rates. We define the RMS of the single-top distribution as the expected uncertainty for a potential measurement of the cross section. We find a value of 45%. This figure includes all systematic uncertainties.
Results of ensemble tests: distribution of expected single-top measurements. Left: single-top events are included in the pseudo experiments at the expected standard model rate. Right: pseudo experiments are done without single-top events

 

Distribution of expected upper limits. Left: single-top events are included at the expected standard model rate. Right: the pseudo experiments include no single-top events. We define the median of the distribution as the expected upper limit.
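The two-step generation of a pseudo experiment described above (a Poisson draw for the number of events of each process, followed by draws from the corresponding template) can be sketched as follows. The function names and the dictionary layout are assumptions made for this illustration. The sketch does not vary the systematic effects from one pseudo experiment to the next, and the Poisson sampler is a textbook multiplication method that is only suitable for modest means.

```python
import math
import random


def draw_poisson(mean, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def generate_pseudo_experiment(expected_yields, templates, rng):
    """Return a histogram of neural network outputs for one pseudo experiment.

    expected_yields maps a process name to its mean number of events;
    templates maps the same name to a list of bin weights.
    """
    n_bins = len(next(iter(templates.values())))
    histogram = [0] * n_bins
    for process, mean in expected_yields.items():
        # Step 1: number of events of this process.
        n_events = draw_poisson(mean, rng)
        if n_events == 0:
            continue
        # Step 2: distribute the events according to the template shape.
        bins = rng.choices(range(n_bins), weights=templates[process], k=n_events)
        for b in bins:
            histogram[b] += 1
    return histogram
```

Setting the single-top yield to zero gives the background-only ensemble; leaving it at its predicted value gives the ensemble with single-top at the standard model rate.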

 

To compute the significance of a potentially observed signal, we perform a hypothesis test. Two hypotheses are considered. The first one, H0, assumes that the single-top cross section is zero (β_1 = 0) and is called the null hypothesis. The second hypothesis, H1, assumes that the single-top production cross section is the one predicted by the standard model (β_1 = 1). The objective of our analysis is to observe single-top production, that is, to reject the null hypothesis. The hypothesis test is based on the Q-value, Q = -2 (ln L_red(β_1 = 1) - ln L_red(β_1 = 0)), where L_red(β_1 = 1) is the value of the reduced likelihood function at the standard model prediction and L_red(β_1 = 0) is the value of the reduced likelihood function for a single-top cross section of zero. Using the two ensemble tests, the distribution of Q-values is determined for the case with single-top included at the standard model rate, q1, and for the case of zero single-top cross section, q0. The two Q-value distributions are shown below. To quantify how compatible an observation is with the null hypothesis, we define the p-value, often also named 1-CLb. To quantify the sensitivity of our analysis we define the expected p-value p_exp = p(Q1_med), where Q1_med is the median of the Q-value distribution q1 for the hypothesis H1. The meaning of p_exp is the following: under the assumption that H1 is correct, one expects to observe a p-value of p_exp or smaller with a probability of 50%. We find p_exp = 0.5%, including all systematic uncertainties. In other words, assuming the predicted single-top cross section, we expect, with a probability of 50%, to see an excess over the background at least as large as a 2.6σ background fluctuation.
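The hypothesis test above can be sketched with a few small functions. The names are hypothetical, and two conventions are assumed here rather than taken from the text: the p-value is computed as the fraction of background-only pseudo experiments whose Q-value is at least as signal-like (that is, at most as large) as the given one, which follows from the sign of Q, and p-values are converted to Gaussian standard deviations with a one-sided tail.

```python
from statistics import NormalDist, median


def q_value(ln_l_signal, ln_l_background):
    """Q = -2 (ln L_red(beta_1 = 1) - ln L_red(beta_1 = 0))."""
    return -2.0 * (ln_l_signal - ln_l_background)


def p_value(q_observed, q0_values):
    """Fraction of background-only Q-values at or below q_observed."""
    n_signal_like = sum(1 for q in q0_values if q <= q_observed)
    return n_signal_like / len(q0_values)


def expected_p_value(q1_values, q0_values):
    """p-value evaluated at the median of the signal-plus-background Q-values."""
    return p_value(median(q1_values), q0_values)


def significance(p):
    """One-sided Gaussian significance for a p-value with 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)
```

With this convention, `significance(0.005)` is about 2.6 and `significance(0.004)` is about 2.7, in line with the expected p-values quoted for the two searches.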

Distributions of Q-values for two ensemble tests, one with single-top events present at the expected standard model rate, one without any single-top events

 

Expected Sensitivity and Significance for Separate Search
As for the combined search, we perform two sets of ensemble tests for the separate search: one with single-top events included at the predicted standard model rate (see below at the left hand side), one without any single-top events (below right hand side). The main results of the ensemble tests are shown below, where the most probable values for the rates of the different processes, i.e. the central values obtained from the s- and t-channel likelihood fit for each pseudo experiment, are given in units of the expected rates. 
Pseudo experiment distribution with (left-hand side) and without (right-hand side) single-top at SM rate: t-channel (first row) and s-channel (second row) cross section measurement normalized to the SM prediction and the 2D distribution (third row). The expected t- and s-channel upper limits at the 95% confidence level are 3.8 pb and 2.9 pb, respectively.

 

To obtain the significance of a potentially observed signal, we calculate the Q-value, as described above for the combined search. The two Q-value distributions (with and without SM single-top) are shown below. We find an expected p-value of 0.4%, including all systematic uncertainties. Assuming the predicted single-top cross section, we expect, with a probability of 50%, to see an excess over the background at least as large as a 2.7σ background fluctuation.

Distributions of the Q-values for ensemble test with and without single-top present at SM rate.

 

Binned Likelihood Fit to Data for Combined Search
In the signal region we expect 45.6±7.5 events, while we observe 31 events in data. The data are displayed below.
Data distribution of the neural network output in the signal region.

 

The likelihood fit to the entire NN output region yields a rate of zero single-top events.

Fit result versus data distribution. Left: in the entire NN output domain. Right: only the signal region. The normalization is the same in both histograms. Since the single-top contribution is zero, it is omitted in these histograms.

 

The resulting upper limit on the combined cross section is 2.6 pb at the 95% confidence level. The posterior probability density is shown below.

Posterior probability density for the combined search using a neural network.
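One common way to read an upper limit off a posterior probability density is to find the cross section below which 95% of the posterior lies. The sketch below illustrates that procedure on a tabulated posterior. It assumes that the limit is defined this way, which this summary does not spell out, and the function name and arguments are hypothetical.

```python
def upper_limit(cross_sections, posterior, cl=0.95):
    """Cross section below which a fraction cl of the posterior lies.

    cross_sections must be increasing; posterior holds the (possibly
    unnormalized) density at those points.
    """
    # Cumulative integral of the posterior with the trapezoidal rule.
    cumulative = [0.0]
    for i in range(1, len(cross_sections)):
        width = cross_sections[i] - cross_sections[i - 1]
        area = 0.5 * width * (posterior[i] + posterior[i - 1])
        cumulative.append(cumulative[-1] + area)
    total = cumulative[-1]
    if total <= 0.0:
        raise ValueError("posterior integrates to zero")
    target = cl * total
    # Find the interval containing the target and interpolate linearly.
    for i in range(1, len(cumulative)):
        if cumulative[i] >= target:
            step = cumulative[i] - cumulative[i - 1]
            frac = (target - cumulative[i - 1]) / step
            return cross_sections[i - 1] + frac * (
                cross_sections[i] - cross_sections[i - 1]
            )
    return cross_sections[-1]
```

For a flat density tabulated from 0 to 10 pb, for example, the function returns 9.5 pb.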

 

The observed Q-value is 9.13, which yields a p-value of 54.6%. This means that the observed data are compatible with a background fluctuation. In the figure below we compare the observed Q-value to the expectation. The corresponding CLsb value is 0.64%, which is the probability to observe this little single-top or less under the assumption of the predicted single-top cross section.

Comparison of the observed Q-value to the expected distribution, (1) if single-top is present at the standard model rate, (2) if single-top is entirely absent.

 

Binned Likelihood Fit to Data for Separate Search
As described above for the combined search, we apply a maximum likelihood fit to the 2D network output. The only difference in the likelihood function is its generalization to two dimensions. The resulting likelihood fit estimate of the cross sections, 0.2 (+1.1, -0.2) pb for the t-channel and 0.7 (+1.5, -0.7) pb for the s-channel, is shown below. At the 95% confidence level the resulting upper limits on the t- and s-channel cross sections are 2.6 pb and 3.7 pb, respectively.
The likelihood fit estimate for the t- and s-channel cross section measurement. The contours of the 1σ uncertainty and the 95% C.L. are valid for both channels simultaneously. The error bars represent the 1σ uncertainty and the 95% C.L. of the given channel without assumptions on the other channel.

 

The observed Q-value of 2.94 yields a p-value of 21.9%. The figure below compares the observed Q-value to the expectation.

Comparison of the observed Q-value for the 2D likelihood to the expected distribution, (1) if single-top is present at the standard model rate, (2) if single-top is entirely absent.

 

Our single-top results were approved (blessed) by CDF on Thursday 12/14/2006 and Thursday 01/18/2007.