Search for Electroweak Single Top-Quark Production using Neural Networks with 2.2 fb⁻¹ of CDF II data

Thorsten Chwalek, Dominic Hirschbühl, Jan Lück, Thomas Müller, Adonis Papaikonomou, Thomas Peiffer, Manuel Renz, Svenja Richter, Irja Schall, Jeannine Wagner-Kuhr, Wolfgang Wagner

KIT, Universität Karlsruhe

 


Abstract
Results
Event Selection
Neural Network Input Variables
Templates for Combined Search
Templates for Separate Search
Systematic Uncertainties
Expected Significance for Combined Search
Binned Likelihood Fit to Data for Combined Search
Binned Likelihood Fit to Data for Separate Search
Observed Significance for Combined Search
Variables in the high-output region of the Combined Search
Public Conference Note (pdf)
 


 

Abstract

We report on a search for electroweak single top-quark production with CDF II data corresponding to 2.2 fb⁻¹ of integrated luminosity. We apply neural networks to construct discriminants that distinguish between single top-quark and background events. Two analyses are performed, assuming a top-quark mass of 175 GeV/c².

In the first one, we combine s- and t-channel events into one single top-quark signal under the assumption that the ratio of the two processes is given by the standard model (SM). The expected significance, assuming the SM cross section, is 4.4σ (p-value of 0.00000529). A binned likelihood fit to the data measures a single top-quark production cross section of $2.0^{+0.9}_{-0.8}$ pb. The observed p-value is 0.00060790, which corresponds to a significance of 3.2σ.

In the second analysis, we separate the two single top-quark production modes, s-channel and t-channel. A binned likelihood fit performed simultaneously to one two-dimensional and three one-dimensional distributions of neural network outputs yields most probable cross sections of $1.6^{+0.8}_{-0.9}$ pb for the s-channel and $0.8^{+0.7}_{-0.6}$ pb for the t-channel production mode.

 

Results
Combined s- and t-channel Search
The sum of the NN outputs of all four channels. Background and signal templates are normalized to the SM prediction.

For the combined search the observed single top-quark cross section is $2.0^{+0.9}_{-0.8}$ pb.

Separate s- and t-channel Search
The likelihood fit estimate for the simultaneous s- and t-channel cross section measurement. The contours of the 1σ, 2σ, and 3σ uncertainties are valid for both channels simultaneously. The error bars represent the 1σ, 2σ, and 3σ uncertainties of the given channel without any assumptions on the other channel.

For the separate search the observed single top-quark cross sections are $1.6^{+0.8}_{-0.9}$ pb for the s-channel and $0.8^{+0.7}_{-0.6}$ pb for the t-channel.

 

 

 

Event Selection

The CDF event selection exploits the kinematic features of the signal final state, which contains a top quark, a bottom quark, and possibly additional light-quark jets. To reduce multijet backgrounds, the W boson originating from the top-quark decay is required to decay leptonically. We therefore demand a single high-energy electron or muon (ET(e) > 20 GeV or PT(μ) > 20 GeV/c) and large missing transverse energy from the undetected neutrino (MET > 25 GeV).

The backgrounds belong to the following categories: Wbb, Wcc, Wc, mistags (light-quark jets misidentified as heavy-flavor jets), top-pair production (tt̄ events in which one lepton or two jets are lost due to detector acceptance), non-W (QCD multijet events in which a jet is erroneously identified as a lepton), Z→ll, and diboson production (WW, WZ, and ZZ). We remove a large fraction of the backgrounds by demanding that exactly two jets with ET > 20 GeV and |η| < 2.8 be present in the event. At least one of these two jets has to be tagged as a b-quark jet using displaced-vertex information from the silicon vertex detector (SVX). The non-W content of the selected electron dataset is further reduced by requirements on the MET, the MET significance, the transverse W-boson mass, and several angles between the MET vector, the lepton vector, and the jet vectors. The numbers of expected and observed events are listed in the tables below.
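The kinematic requirements above can be summarized in a short sketch. The `Jet` and `Event` classes and the `passes_selection` function are hypothetical names introduced only for illustration. The non-W rejection requirements are not implemented, because their thresholds are not given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float        # transverse energy in GeV
    eta: float       # pseudorapidity
    b_tagged: bool   # True if tagged by the secondary-vertex tagger


@dataclass
class Event:
    lepton_flavor: str      # "e" or "mu" (exactly one lepton is assumed)
    lepton_et_or_pt: float  # ET in GeV for electrons, PT in GeV/c for muons
    met: float              # missing transverse energy in GeV
    jets: List[Jet]


def passes_selection(event: Event, n_jets: int = 2) -> bool:
    """Apply the lepton, MET, jet-multiplicity, and b-tag requirements."""
    if event.lepton_flavor not in ("e", "mu"):
        return False
    if event.lepton_et_or_pt <= 20.0:
        return False
    if event.met <= 25.0:
        return False
    # Count only jets inside the quoted ET and |eta| acceptance.
    good_jets = [j for j in event.jets if j.et > 20.0 and abs(j.eta) < 2.8]
    if len(good_jets) != n_jets:
        return False
    # At least one of the selected jets must carry a b tag.
    return any(j.b_tagged for j in good_jets)


# Hypothetical example event, for illustration only.
example = Event("mu", 35.0, 40.0, [Jet(60.0, 0.5, True), Jet(30.0, -1.9, False)])
print(passes_selection(example))
```

The `n_jets` parameter defaults to two, as in the text; passing `n_jets=3` would mimic the three-jet channels, assuming the same jet thresholds apply there.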

 

Neural Network Input Variables
Using neural networks, kinematic and event-shape variables are combined into a powerful discriminant. In the combined search we use four different networks: one for 2jet1tag events, one for 2jet2tag events, one for 3jet1tag events, and one for 3jet2tag events. For the separate search we include an additional network in the 2jet1tag category to build a 2D discriminant. This improves the a priori sensitivity for the s-channel by about 15%.
One of the variables is the output of the KIT flavor separator. The KIT flavor separator gives an additional handle to reduce the large background components that contain no real b quarks, namely mistags and charm backgrounds. Together they amount to about 50% of the W+2 jets data sample, even after requiring that one jet is identified by the secondary vertex tagger of CDF. The following plots show the 14 variables for the 2jet1tag channel. The plots in the third column show the variables in the "zero-tag" sample (as a cross-check).

Please find the plots for the variables of the other channels here: 2jet2tag, 3jet1tag, 3jet2tag.



Each variable is shown as MC distributions, as a data-MC comparison, and as a data-MC comparison in the zero-tag sample (the KIT flavor separator output is shown only in the first two forms):
the mass of the reconstructed top quark
the neural network output of the KIT flavor separator for the b-tagged jet
the invariant mass of the two jets
the product of the lepton charge and the pseudorapidity of the light-quark jet
the transverse mass of the reconstructed top quark
the cosine of the polar angle between the tight lepton and the light-quark jet in the top-quark rest frame
the transverse energy of the light-quark jet
the cosine of the polar angle between the charged lepton in the W-boson rest frame and the direction of the W boson
the pseudorapidity of the reconstructed W boson
the transverse mass of the reconstructed W boson
the sum of the pseudorapidities of the two jets
the transverse momentum of the charged lepton
the scalar sum of transverse energies
the cosine of the angle between the charged lepton in the W-boson rest frame and the W-boson momentum in the top-quark rest frame

 

Templates for Combined Search
We use four different neural networks, one for the 2jet1tag channel, one for the 2jet2tag channel, one for the 3jet1tag channel, and one for the 3jet2tag channel. Since this is a combined search, we have one fit template for single top-quark events, which is the combination of the template for s-channel and the template for t-channel single top-quark production according to the ratio of the cross-sections predicted by the SM.

Fit templates of the 2jet1tag channel. Fit templates of the 2jet2tag channel.

Fit templates of the 3jet1tag channel. Fit templates of the 3jet2tag channel.
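As an illustration of how a single signal template can be built from the s- and t-channel templates, the following minimal sketch mixes two unit-area shapes with fixed weights. The function names and all numbers are hypothetical. It also assumes that the weights are the expected event yields in a given channel, which follow from the SM cross sections together with the acceptances; those yields are not quoted here.

```python
def normalize(hist):
    """Scale a histogram (list of bin contents) to unit area."""
    total = sum(hist)
    return [x / total for x in hist]


def combine_signal_templates(s_shape, t_shape, n_s_expected, n_t_expected):
    """Mix unit-area s- and t-channel shapes in proportion to expected yields."""
    s_unit, t_unit = normalize(s_shape), normalize(t_shape)
    f_s = n_s_expected / (n_s_expected + n_t_expected)
    return [f_s * s + (1.0 - f_s) * t for s, t in zip(s_unit, t_unit)]


# Hypothetical shapes and yields, for illustration only.
s_shape = [4.0, 3.0, 2.0, 1.0]
t_shape = [1.0, 2.0, 3.0, 4.0]
combined = combine_signal_templates(s_shape, t_shape,
                                    n_s_expected=1.0, n_t_expected=2.0)
print(combined)
```

Because the mixing fraction is fixed, the fit then has a single signal normalization to determine instead of two.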

 

Templates for Separate Search
For the separate search we use five neural networks. In the most sensitive channel, 2jet1tag, two independent neural networks are combined into a 2D discriminant: one network is trained for the s-channel and the other for the t-channel. This provides the following 2D templates, which are used to search for both channels simultaneously:

2D template of s-channel single top-quark production in the 2jet1tag channel 2D template of t-channel single top-quark production in the 2jet1tag channel

2D template of top pair production in the 2jet1tag channel 2D template of Wbb+Wcc production in the 2jet1tag channel
2D template of Wc production in the 2jet1tag channel 2D template of Wqq production in the 2jet1tag channel

2D template of Diboson production in the 2jet1tag channel 2D template of Z+jets production in the 2jet1tag channel

2D template of QCD multijet production in the 2jet1tag channel

The 2D neural network outputs are unrolled bin by bin to obtain one-dimensional templates, which are fitted to the data simultaneously with the templates of the networks in the three remaining channels:

Fit templates of the 2jet1tag channel. Fit templates of the 2jet2tag channel.

Fit templates of the 3jet1tag channel. Fit templates of the 3jet2tag channel.
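The bin-by-bin unrolling of a 2D template into a one-dimensional one can be sketched as follows. The row-major bin ordering and the function names are assumptions made for illustration; the ordering actually used in the analysis is not stated here. Any fixed ordering works, as long as it is applied identically to data and to every template.

```python
def unroll(template_2d):
    """Flatten a 2D histogram (list of rows) into a 1D list, row by row."""
    return [content for row in template_2d for content in row]


def unrolled_index(i_row, i_col, n_cols):
    """Position of 2D bin (i_row, i_col) in the unrolled histogram."""
    return i_row * n_cols + i_col


# Hypothetical 2x3 template, for illustration only.
template = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
flat = unroll(template)
print(flat)
```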

 

Systematic Uncertainties
Systematic uncertainties can shift the event detection efficiency for events of different physics processes (rate uncertainties), and can also change the shape of the template distributions (shape uncertainties). The rate uncertainties for the four different channels are summarized in the tables. Below you find three examples of systematic shape uncertainties in the 2jet1tag channel: jet energy scale (JES) for the single top-quark template, factorization and renormalization scale (Q²) for Wbb events, and modeling uncertainty on the KIT flavor separator output (KIT opt.).

Systematic rate uncertainties for the 2jet1tag channel. Systematic rate uncertainties for the 2jet2tag channel.
Systematic rate uncertainties for the 3jet1tag channel. Systematic rate uncertainties for the 3jet2tag channel.



The JES systematic uncertainty for the four different channels.


Systematic shape uncertainties in the 2jet1tag channel: jet energy scale (JES) for the single top-quark template.
Systematic shape uncertainties in the 2jet1tag channel: factorization and renormalization scale (Q²) for Wbb events.
Systematic shape uncertainties in the 2jet1tag channel: modeling uncertainty on the KIT flavor separator output (KIT opt.).

 

Expected Significance for Combined Search
To compute the significance of a potentially observed signal, we perform a hypothesis test with two hypotheses. The null hypothesis, $H_0$, assumes that the single-top cross section is zero ($\beta_1 = 0$). The second hypothesis, $H_1$, assumes that the single-top production cross section is the one predicted by the standard model ($\beta_1 = 1$). The objective of our analysis is to observe single-top production, that is, to reject the null hypothesis. The hypothesis test is based on the Q-value, $Q = -2\,[\ln L_{\mathrm{red}}(\beta_1 = 1) - \ln L_{\mathrm{red}}(\beta_1 = 0)]$, where $L_{\mathrm{red}}(\beta_1 = 1)$ is the value of the reduced likelihood function at the standard model prediction and $L_{\mathrm{red}}(\beta_1 = 0)$ is the value of the reduced likelihood function for a single-top cross section of zero. Using two ensemble tests, the distribution of Q-values is determined for the case with single-top included at the standard model rate, $q_1$, and for the case of zero single-top cross section, $q_0$. The two Q-value distributions are shown below.
To quantify how compatible a given Q-value is with the null hypothesis, we define the p-value, often also called 1-CLb: the probability under $H_0$ of obtaining a Q-value at least as signal-like as the given one. To quantify the sensitivity of our analysis, we define the expected p-value $p_{\mathrm{exp}} = p(Q_1^{\mathrm{med}})$, where $Q_1^{\mathrm{med}}$ is the median of the Q-value distribution $q_1$ for the hypothesis $H_1$. The meaning of $p_{\mathrm{exp}}$ is the following: under the assumption that $H_1$ is correct, one expects to observe a p-value of $p_{\mathrm{exp}}$ or smaller with a probability of 50%. We find $p_{\mathrm{exp}} = 0.00000529$, including all systematic uncertainties. In other words, assuming the predicted single-top cross section, we expect with a probability of 50% to see enough single-top events that the observed excess over the background corresponds to at least a 4.4σ background fluctuation.


Distributions of Q-values for two ensemble tests, one with single-top events present at the expected standard model rate, one without any single-top events. The expected significance under the assumption of a SM cross-section is determined to be 4.4 σ.
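The ensemble-test procedure described in this section can be illustrated with a strongly simplified toy. The sketch below assumes independent Poisson bins and uses a plain likelihood without nuisance parameters, so it does not reproduce the reduced likelihood or the systematic uncertainties of the analysis. The templates, the number of pseudo-experiments, and all function names are hypothetical.

```python
import math
import random
from statistics import median


def q_value(observed, signal, background):
    """Q = -2 [ln L(beta=1) - ln L(beta=0)] for independent Poisson bins."""
    q = 0.0
    for n, s, b in zip(observed, signal, background):
        q += -2.0 * (n * math.log((b + s) / b) - s)
    return q


def poisson(rng, mean):
    """Draw a Poisson-distributed integer (Knuth's method, small means only)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def ensemble(rng, signal, background, beta, n_toys):
    """Q-values of pseudo-experiments generated with signal strength beta."""
    qs = []
    for _ in range(n_toys):
        toy = [poisson(rng, b + beta * s) for s, b in zip(signal, background)]
        qs.append(q_value(toy, signal, background))
    return qs


# Hypothetical three-bin templates (expected event counts per bin).
signal = [1.0, 2.0, 4.0]
background = [50.0, 20.0, 5.0]

rng = random.Random(1)
q0 = ensemble(rng, signal, background, beta=0.0, n_toys=2000)
q1 = ensemble(rng, signal, background, beta=1.0, n_toys=2000)

# Expected p-value: fraction of background-only toys that look at least as
# signal-like (Q at least as small) as the median signal-plus-background toy.
q1_med = median(q1)
p_exp = sum(q <= q1_med for q in q0) / len(q0)
print(q1_med, p_exp)
```

With this sign convention, signal-like outcomes have smaller Q, so the p-value is the lower tail of the $q_0$ distribution. Resolving p-values as small as the one quoted above would require far more pseudo-experiments than in this toy.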

 

Binned Likelihood Fit to Data for Combined Search
Finally, the templates for all four networks are fitted simultaneously to the observed distributions using a binned likelihood function. The fit yields a single top-quark cross section of $2.0^{+0.9}_{-0.8}$ pb. Below you find the distributions of observed data and MC normalized to the SM prediction (left-hand side) and MC normalized to the simultaneously fitted values (right-hand side) for all four networks and for the sum.

NN Output for the 2jet1tag channel. The background and signal templates are normalized to the SM prediction. NN Output for the 2jet1tag channel. The background and signal templates are normalized to the simultaneously fitted values.

NN Output for the 2jet2tag channel. The background and signal templates are normalized to the SM prediction. NN Output for the 2jet2tag channel. The background and signal templates are normalized to the simultaneously fitted values.

NN Output for the 3jet1tag channel. The background and signal templates are normalized to the SM prediction. NN Output for the 3jet1tag channel. The background and signal templates are normalized to the simultaneously fitted values.

NN Output for the 3jet2tag channel. The background and signal templates are normalized to the SM prediction. NN Output for the 3jet2tag channel. The background and signal templates are normalized to the simultaneously fitted values.

The sum of the NN Outputs of the four different channels. The background and signal templates are normalized to the SM prediction. The sum of the NN Outputs of the four different channels. The background and signal templates are normalized to the simultaneously fitted values.



Summary of the results for the four different channels and the final result of the simultaneous fit in all channels.
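The idea of a binned likelihood fit with asymmetric uncertainties can be sketched in a few lines. This is a simplified illustration, not the fit used in the analysis: it assumes independent Poisson bins, fixes the background normalization, ignores systematic uncertainties, and fits only a signal strength `beta` relative to the SM prediction. All names and numbers are hypothetical.

```python
import math


def nll(beta, observed, signal, background):
    """Negative log-likelihood for Poisson bins, up to a beta-independent constant."""
    total = 0.0
    for n, s, b in zip(observed, signal, background):
        mu = b + beta * s
        total += mu - n * math.log(mu)
    return total


def fit_beta(observed, signal, background, lo=0.0, hi=5.0, steps=5001):
    """Grid scan: best-fit beta and the interval where the NLL rises by 0.5."""
    betas = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    values = [nll(b, observed, signal, background) for b in betas]
    i_min = min(range(steps), key=values.__getitem__)
    inside = [b for b, v in zip(betas, values) if v - values[i_min] <= 0.5]
    return betas[i_min], min(inside), max(inside)


# Hypothetical templates; the "data" equal the expectation for beta = 1.
signal = [1.0, 2.0, 4.0]
background = [50.0, 20.0, 5.0]
observed = [b + s for s, b in zip(signal, background)]

best, lower, upper = fit_beta(observed, signal, background)
print(best, best - lower, upper - best)
```

Multiplying the fitted `beta` and its interval by the SM cross section would give a cross section with asymmetric uncertainties. The likelihood is generally not symmetric around its maximum, which is why the lower and upper uncertainties can differ.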

 

Binned Likelihood Fit to Data for Separate Search
The templates are fitted simultaneously in all four channels to the observed distributions using a binned likelihood function. The fit yields single top-quark cross sections of $1.6^{+0.8}_{-0.9}$ pb for the s-channel and $0.8^{+0.7}_{-0.6}$ pb for the t-channel production mode. Below you find the resulting likelihood as a function of the s- and t-channel cross sections.



The likelihood fit estimate for the simultaneous s- and t-channel cross section measurement. The contours of the 1σ, 2σ, and 3σ uncertainties are valid for both channels simultaneously. The error bars represent the 1σ, 2σ, and 3σ uncertainties of the given channel without any assumptions on the other channel.

 

Observed Significance for Combined Search


The observed Q-value (indicated by the arrow) yields a p-value of 0.00060790, which corresponds to an observed significance of 3.2σ.
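The correspondence between the quoted p-values and significances can be checked with a short calculation. The sketch assumes the usual one-sided Gaussian convention, in which the significance is the number of standard deviations whose upper-tail probability equals the p-value; the function name is chosen for illustration.

```python
from statistics import NormalDist


def significance(p_value):
    """One-sided Gaussian significance (in units of sigma) for a p-value."""
    return -NormalDist().inv_cdf(p_value)


print(round(significance(0.00000529), 1))  # expected p-value: 4.4
print(round(significance(0.00060790), 1))  # observed p-value: 3.2
```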

 

Variables in the high-output region of the Combined Search
The invariant mass of the reconstructed top-quark in the high-output region (NN Output > 0.4). The invariant mass of the reconstructed top-quark in the high-output region (NN Output > 0.8).

The output of the KIT flavor separator in the high-output region (NN Output > 0.4). The output of the KIT flavor separator in the high-output region (NN Output > 0.8).

The product of the lepton-charge and the pseudo-rapidity of the light-quark jet in the high-output region (NN Output > 0.4). The product of the lepton-charge and the pseudo-rapidity of the light-quark jet in the high-output region (NN Output > 0.8).

 

Our single top-quark results were approved (blessed) by CDF on Tuesday 2/26/2008, Thursday 3/6/2008, and on Thursday 5/8/2008.