Measurement of Single Top Quark Production in L = 2.1 fb⁻¹ of CDF Run II Data in the Missing Transverse Energy (MET) plus Jets Signature


Artur Apresyan, Daniela Bortoletto, Fabrizio Margaroli and Karolos Potamianos (Purdue University)

Abstract

Top quarks are produced mostly in pairs at the Tevatron through the strong force. Production of a single top quark is also allowed through electroweak processes, with a cross section about half that of pair production. In addition, its less distinctive signature makes it much harder to observe. Until now, Tevatron experiments have searched only in events where one high-energy electron or muon has been identified, in order to suppress the huge QCD background and achieve a reasonable signal-to-background ratio. Here we look for the first time at events where no electron or muon has been identified, or where tau leptons decay hadronically and are reconstructed as calorimeter jets. Multivariate analysis techniques are used to discriminate the single top signal from the dominant backgrounds, and we use a likelihood profile of this discriminant to measure the production cross section of single top events, with an expected sensitivity of 1.4σ. In the first 2.1 fb⁻¹ of data recorded by the CDF II detector, we measure a cross section of σ_{s+t}^{obs} = 4.9 +2.5/-2.2 pb with an observed significance of 2.1σ. We also measure the Vtb element of the CKM matrix: |Vtb| = 1.24 +0.34/-0.29 ± 0.07 (theory).


+Introduction

We have analyzed electroweak single top production. The Standard Model predicts that the top quark decays to a W boson and a b quark almost 100% of the time, and the W subsequently decays hadronically or leptonically. We are interested in events where the W decays leptonically but the electron or muon escapes detection, or where the tau is reconstructed as a jet. The final state of interest consists of two b-quark jets, no identified leptons and large missing transverse energy (MET, from the W decay).

Many Standard Model processes can produce this final state: single top (our signal), top pair production, W/Z + jets and diboson production. These backgrounds are modelled using PYTHIA Monte Carlo simulation. In addition, QCD multijet production can mimic this signature through severely mismeasured jets, which give the event an apparently large MET. Most of the background processes considered in this analysis produce real, large MET, e.g. from W/Z decays to neutrinos or to muons, which escape detection in the calorimeter.

Since the QCD heavy-flavour production cross section is orders of magnitude higher than that of the signal, it constitutes the largest background in this search. Additionally, light-flavour jets can be falsely identified as b-jets (commonly referred to as "mistags"). For a search for the Standard Model Higgs boson with the same signature, we developed a technique which allows us to estimate both of these backgrounds in a unified manner directly from data collected by CDF.

To get a better estimate of the true MET of the event, we calculate the track missing transverse momentum, MPT, defined as the negative vector sum of the transverse momenta (PT) of the charged-particle tracks. For events with true MET, MPT is highly correlated with the calorimeter MET, while for QCD events with mismeasured jets it is not. MPT thus provides an additional handle to separate mismeasurements from events with real MET.
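The MPT definition above can be illustrated with a minimal Python sketch. The track representation as (PT, φ) pairs, the function names and the example event are assumptions made for illustration only; they are not taken from the analysis code.

```python
import math

def missing_pt(tracks):
    """Return (magnitude, phi) of the negative vector sum of track PT.

    tracks: iterable of (pt, phi) pairs, pt in GeV/c and phi in radians
    (a hypothetical input format).
    """
    sum_px = sum(pt * math.cos(phi) for pt, phi in tracks)
    sum_py = sum(pt * math.sin(phi) for pt, phi in tracks)
    mpx, mpy = -sum_px, -sum_py  # negative vector sum
    return math.hypot(mpx, mpy), math.atan2(mpy, mpx)

def delta_phi(phi1, phi2):
    """Azimuthal separation folded into [0, pi]."""
    dphi = abs(phi1 - phi2) % (2 * math.pi)
    return 2 * math.pi - dphi if dphi > math.pi else dphi

# Hypothetical event: two charged tracks recoiling against something unseen.
tracks = [(30.0, 0.0), (20.0, math.pi / 2)]
mpt, mpt_phi = missing_pt(tracks)
```

Comparing `mpt_phi` with the azimuth of the calorimeter MET through `delta_phi` is one simple way to quantify the correlation mentioned in the text: a small separation is expected for real MET, while no preferred value is expected for mismeasured jets.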

We further increase the signal acceptance by accepting events which contain a third jet with ET > 15 GeV. At NLO, t-channel single top production gives rise to events with three jets. The third jet may also come from hard radiation off the final-state quarks, or from hadronic tau decays in W -> τ ν.

Because no leptons are identified in the final state, this channel is unusual in that backgrounds are many orders of magnitude larger than the signal even after requiring the final-state topology of MET plus b-jets. It is thus necessary to develop an event selection which reduces the backgrounds to a more manageable size before trying to build a discriminant to measure the single top cross section. We have studied the dynamical properties of these events and implemented a multivariate technique (an artificial neural network, NN) with the goal of removing as much of the dominant QCD multijet background as possible while separating it from the signal. The Signal Region is defined by placing a cut on this NN output. This approach is better than traditional rectangular cuts, as it allows us to keep ~91% of the signal while rejecting ~65% of the backgrounds.

We then again use a machine learning technique to discriminate the signal from the surviving backgrounds, and finally scan its output distribution to measure the production cross section of the single top processes in the missing transverse energy plus jets final state. Because this sample is statistically independent of the one used so far by CDF, it provides independent measurements of σsingle top and Vtb which can be regarded as a consistency check; moreover, these measurements can be combined with the existing measurements to increase the precision of the determination of these two quantities.

We define several control regions to check our modeling of the data, check the performance of data-driven QCD multijet background modeling and validate the Monte Carlo-based background simulations.

The data were collected with the CDF II detector at the Tevatron collider at Fermilab.


+Event Selection NN Output

We train an artificial neural network (NN), a multilayer perceptron (MLP) fed with 15 kinematic variables, to separate the signal from the main background, QCD multijet production; we refer to this network as the QCDNN. We check the QCDNN with data in two control regions: one QCD-rich and one containing mainly Electroweak/Top processes. We then split the pre-selection region in two: events with NN output > -0.1 form the signal region, while events with NN output < -0.1 form a signal-like, QCD-rich control region.
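The split of the pre-selection region can be sketched as follows. Only the threshold of -0.1 comes from the text; the event format (dictionaries with `nn` and `weight` keys), the toy events and the choice to place events exactly at the threshold in the control region are assumptions made for this illustration.

```python
NN_CUT = -0.1  # threshold on the event-selection NN output quoted in the text

def split_regions(events, cut=NN_CUT):
    """Split events into (signal region, signal-like QCD-rich control region).

    Assumption: an event sitting exactly at the cut goes to the control region.
    """
    signal_region = [e for e in events if e["nn"] > cut]
    control_region = [e for e in events if e["nn"] <= cut]
    return signal_region, control_region

def fraction_passing(events, cut=NN_CUT):
    """Weighted fraction of events with NN output above the cut."""
    total = sum(e["weight"] for e in events)
    passed = sum(e["weight"] for e in events if e["nn"] > cut)
    return passed / total if total > 0 else 0.0

# Toy samples, invented for illustration only.
toy_signal = [{"nn": x, "weight": 1.0} for x in (0.8, 0.5, 0.2, -0.3)]
toy_qcd = [{"nn": x, "weight": 1.0} for x in (-0.9, -0.6, -0.2, 0.1)]

signal_efficiency = fraction_passing(toy_signal)
qcd_rejection = 1.0 - fraction_passing(toy_qcd)
```

Evaluating `fraction_passing` on simulated signal and on the background model is how figures of merit such as a signal efficiency and a background rejection are typically obtained for a given cut.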

Performance of our QCDNN, separating the QCD multijet background from the signal region.

We use two different algorithms to identify jets originating from b-quarks: "Secondary Vertex" (SecVTX) and "Jet Probability" (JetProb). We define three exclusive tagging categories: one containing events with exactly one tight tag (Excl. SecVTX), one containing events with one tight-tagged and one JetProb-tagged jet (SecVTX + JetProb), and one containing events with two tight tags (SecVTX + SecVTX). The last plot in each row shows the three tagging categories together.
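One way to make the three categories mutually exclusive is sketched below. The jet representation (boolean `secvtx` and `jetprob` flags) and the order of precedence between the categories are assumptions for illustration; the text only states that the categories are exclusive.

```python
def tag_category(jets):
    """Assign an event to one tagging category, or None if it has no tight tag.

    jets: list of dicts with boolean 'secvtx' and 'jetprob' flags
    (a hypothetical format). Assumed precedence: double tight tag first,
    then tight + JetProb, then single tight tag.
    """
    n_secvtx = sum(1 for j in jets if j["secvtx"])
    # JetProb tags only count on jets that are not already tight-tagged.
    n_jetprob_only = sum(1 for j in jets if j["jetprob"] and not j["secvtx"])
    if n_secvtx >= 2:
        return "SecVTX + SecVTX"
    if n_secvtx == 1 and n_jetprob_only >= 1:
        return "SecVTX + JetProb"
    if n_secvtx == 1:
        return "Excl. SecVTX"
    return None

# Toy events, invented for illustration.
single = [{"secvtx": True, "jetprob": True}, {"secvtx": False, "jetprob": False}]
mixed = [{"secvtx": True, "jetprob": False}, {"secvtx": False, "jetprob": True}]
double = [{"secvtx": True, "jetprob": True}, {"secvtx": True, "jetprob": False}]
```

Because each event returns exactly one label, the three samples do not overlap and can be treated as separate inputs to a combined fit.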

QCD-rich Control Region

Event Selection NN Output in a QCD-rich control region. This serves as a check of our data-driven model for QCD multijet background estimation.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Electroweak/Top Control Region

Event Selection NN Output in an Electroweak/Top control region. This serves as a check of our modeling of electroweak processes.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Pre-selection Region

Event Selection NN Output in the Pre-selection region.
Events with NN output > -0.1 form the signal region, while events with NN output < -0.1 form a signal-like, QCD-rich control region.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Shape Comparison in Event Selection Region

Shape comparison of the Event Selection NN Output in the Pre-selection region for all the processes involved.
Events with NN output > -0.1 form the signal region, while events with NN output < -0.1 form a signal-like, QCD-rich control region.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

+Discriminant NN Input Variables

We then feed the following 11 kinematic variables to a new MLP, which aims at further separating the signal from the backgrounds. The figures below show the distributions of these variables for events in the Signal Region. The data are also shown and are in good agreement with the predictions.


+Final NN Discriminant Output

The 11 variables shown above are fed to an MLP trained on events with at least one tight tag (SecVTX). We check our final NN discriminant in three control regions. We then use a likelihood profile of this discriminant to measure the production cross section of single top events. For this, we use three templates, one for each of the three tagging categories.

QCD-rich Control Region

Final NN Discriminant Output in a QCD-rich control region. This serves as a check of our data-driven model for QCD multijet background estimation.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Electroweak/Top Control Region

Final NN Discriminant Output in an Electroweak/Top control region. This serves as a check of our modeling of electroweak processes.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Signal-like, QCD-rich Control Region

Final NN Discriminant Output in a signal-like, QCD-rich control region. This serves as an additional check of our data-driven model for QCD multijet background estimation and is the region from which we extract the normalization of the QCD multijet production.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Signal Region

Final NN Discriminant Output in the signal region, with the binning used for the likelihood profile.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

Shape Comparison in Signal Region

Shape comparison of the Final NN Discriminant Output for each process in the signal region, with the binning used for the likelihood profile.
Panels: Excl. SecVTX | SecVTX + JetProb | SecVTX + SecVTX

+Events in Signal Region

Number of events in the signal region.

+Systematic Uncertainties

Systematic uncertainties are split into normalization uncertainties and shape uncertainties. A normalization uncertainty reflects changes to the event yield due to the systematic effect, while a shape uncertainty reflects changes to the template histograms. Either or both of these effects can be included, depending on the source of the systematic uncertainty.
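The distinction between the two kinds of uncertainty can be sketched on a toy template histogram. The linear bin-by-bin interpolation used for the shape variation is a common generic choice and is assumed here for illustration; it is not stated to be the method used in this analysis, and all numbers are invented.

```python
def apply_normalization(template, rel_uncertainty, n_sigma):
    """Scale every bin by the same factor: the yield changes, the shape does not."""
    scale = 1.0 + rel_uncertainty * n_sigma
    return [content * scale for content in template]

def apply_shape(nominal, up, down, n_sigma):
    """Move each bin linearly towards the +1 sigma or -1 sigma template.

    The result is renormalized to the nominal yield so that this function
    changes the shape only (an assumption of this sketch).
    """
    shifted = up if n_sigma >= 0 else down
    frac = abs(n_sigma)
    morphed = [max(n + frac * (s - n), 0.0) for n, s in zip(nominal, shifted)]
    total_nominal, total_morphed = sum(nominal), sum(morphed)
    if total_morphed <= 0:
        return morphed
    return [m * total_nominal / total_morphed for m in morphed]

# Toy templates, invented for illustration.
nominal = [10.0, 20.0, 30.0, 40.0]
shape_up = [12.0, 22.0, 28.0, 38.0]
shape_down = [8.0, 18.0, 32.0, 42.0]

scaled = apply_normalization(nominal, 0.10, 1.0)    # +1 sigma of a 10% rate effect
reshaped = apply_shape(nominal, shape_up, shape_down, 1.0)
```

A source that affects both the yield and the template would apply both functions in sequence.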

The table below summarizes the systematic uncertainties and their effects.

Summary of all the systematic uncertainties, with their effects.

+Results

We apply our analysis to 2.1 fb⁻¹ of CDF Run II data and measure the single top production cross section for the first time in this channel (events where the lepton from the W decay is either not identified or reconstructed as a jet).

Linearity check

We validate our fitting machinery with a linearity check: we vary the input cross section and verify that the fitted output varies in the same way, which it does.

Linearity check. The varied outcomes (dots) are on the input = output line (black), which confirms our machinery is working properly.
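The idea of a linearity check can be sketched with toy pseudo-experiments. Everything below is an assumption made for illustration: the single-channel templates, the number of pseudo-experiments, the use of a signal-strength parameter `mu` (fitted cross section in units of the nominal one) and the simple Poisson sampler. Systematic uncertainties are ignored.

```python
import math
import random

# Toy templates (expected events per bin), invented for illustration.
SIGNAL = [5.0, 10.0, 20.0, 40.0]
BACKGROUND = [100.0, 60.0, 30.0, 10.0]

def poisson(rng, mean):
    """Knuth's multiplication algorithm; adequate for the modest means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def fit_mu(data, lo=0.0, hi=10.0):
    """Maximum-likelihood signal strength from the binned Poisson likelihood.

    The negative log-likelihood is convex in mu, so its derivative is
    increasing and the minimum can be found by bisection.
    """
    def slope(mu):
        return sum(s * (1.0 - n / (mu * s + b))
                   for n, s, b in zip(data, SIGNAL, BACKGROUND))
    if slope(lo) >= 0:
        return lo
    if slope(hi) <= 0:
        return hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def linearity_check(injected, n_pseudo=200, seed=1):
    """Return (injected mu, mean fitted mu) for each injected value."""
    rng = random.Random(seed)
    results = []
    for mu_in in injected:
        fits = []
        for _ in range(n_pseudo):
            data = [poisson(rng, mu_in * s + b)
                    for s, b in zip(SIGNAL, BACKGROUND)]
            fits.append(fit_mu(data))
        results.append((mu_in, sum(fits) / n_pseudo))
    return results

results = linearity_check([0.5, 1.0, 1.5, 2.0])
```

If the machinery is unbiased, the mean fitted value in each pair of `results` should lie close to the injected value, which corresponds to points on the input = output line.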

Cross-section measurement

The result of the binned maximum likelihood fit is shown below. All sources of systematic uncertainty (normalization and shape) except the top mass variation are included.

σ_{s+t} = 4.9 +2.5/-2.2 pb
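A binned maximum likelihood fit that combines several tagging categories can be sketched as a scan of the negative log-likelihood. The templates, the Asimov-style toy data (data set equal to the expectation), the grid and the signal-strength parameter `mu` are assumptions made for illustration, and systematic uncertainties are left out of this sketch.

```python
import math

def nll(mu, channels):
    """Poisson negative log-likelihood (up to a constant), summed over
    channels and bins. channels: list of (data, signal, background) lists."""
    total = 0.0
    for data, signal, background in channels:
        for n, s, b in zip(data, signal, background):
            expected = mu * s + b
            total += expected - n * math.log(expected)
    return total

def scan(channels, mu_max=5.0, steps=500):
    """Scan mu on a grid; return the best-fit value and the interval in
    which the NLL lies within 0.5 of its minimum (68% interval)."""
    grid = [mu_max * i / steps for i in range(steps + 1)]
    values = [nll(mu, channels) for mu in grid]
    best = min(range(len(grid)), key=values.__getitem__)
    inside = [mu for mu, v in zip(grid, values) if v - values[best] <= 0.5]
    return grid[best], min(inside), max(inside)

# Toy templates for three categories, invented for illustration.
templates = [
    ([2.0, 4.0, 8.0, 16.0], [200.0, 120.0, 60.0, 20.0]),
    ([0.5, 1.0, 2.0, 4.0], [20.0, 12.0, 6.0, 3.0]),
    ([0.5, 1.0, 2.0, 5.0], [15.0, 9.0, 5.0, 2.0]),
]
# Toy data equal to the signal-plus-background expectation at mu = 1.
channels = [([s + b for s, b in zip(sig, bkg)], sig, bkg)
            for sig, bkg in templates]

mu_hat, mu_low, mu_high = scan(channels)
```

Multiplying the fitted `mu` and its interval by the cross section assumed when building the signal templates would give a cross section with asymmetric uncertainties, which is the form of the result quoted above.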

Distribution of the outcomes of the pseudo-experiments and observed cross-section combining the three tagging categories.

Observed cross-section measurements for the combination as well as for the three tagging categories taken alone. The grey band represents the theoretical prediction by Z. Sullivan et al., PRD 70, 114012 (2004).

Hypothesis test

We determine the significance of this result using a likelihood ratio technique. We generate pseudo-experiments under the background-only hypothesis (B), using the likelihood ratio as the test statistic. The p-value is the probability that the background-only hypothesis (B) fluctuates to give a result at least as signal-like as the one observed in data. We estimate the expected p-value by taking the median of the test-hypothesis (S+B) distribution as the 'observed' value (dashed arrow). All sources of systematic uncertainty (normalization and shape) are included.

Expected p-value: 0.0785 (1.4σ)
Observed p-value: 0.0160 (2.1σ)
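The correspondence between the quoted p-values and significances can be reproduced with a short sketch, assuming the usual one-sided Gaussian convention. The helper for counting pseudo-experiments assumes that larger values of the test statistic are more signal-like; that sign convention and the function names are illustrative assumptions.

```python
from statistics import NormalDist

def p_to_sigma(p_value):
    """One-sided Gaussian significance corresponding to a p-value."""
    return NormalDist().inv_cdf(1.0 - p_value)

def p_value_from_pseudo_experiments(null_statistics, observed):
    """Fraction of background-only pseudo-experiments that are at least as
    signal-like as the observed value (larger = more signal-like, assumed)."""
    n_extreme = sum(1 for q in null_statistics if q >= observed)
    return n_extreme / len(null_statistics)

expected_sigma = p_to_sigma(0.0785)  # expected p-value quoted above
observed_sigma = p_to_sigma(0.0160)  # observed p-value quoted above
```

Rounded to one decimal place, `expected_sigma` and `observed_sigma` agree with the 1.4σ and 2.1σ quoted above.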

Test statistics for expected cross-section measurement.

Measurement of the Vtb element of the CKM matrix.

|Vtb| = 1.24 +0.34/-0.29 ± 0.07 (theory)

Measurement of the Vtb element of the CKM matrix.

These results were blessed on December 18, 2008 and January 8, 2009. Created by Karolos Potamianos. Last updated on January 12, 2009.