Search for Single Top Quark Production using Boosted Decision Trees in 2.2 fb⁻¹ of CDF Data
¹Fermilab, ²IFCA (CSIC-UC), ³UCLA

- Abstract -
- Event Selection -
- Input Variables -
- Cross-Checks -
- Systematics -
- Results -
- High Scored Regions -

 Abstract:
 We present a search for electroweak single top quark production using 2.2 fb⁻¹ of CDF II data collected between February 2002 and August 2007 at the Tevatron in proton-antiproton collisions at a center-of-mass energy of 1.96 TeV. The analysis employs a multivariate technique based on Boosted Decision Trees (BDT)[ref], whose output is used to build a discriminant variable that we fit to the data using a binned likelihood approach. We search for a combined single top s- and t-channel signal and measure a cross section of $1.9^{+0.8}_{-0.7}$ pb, assuming a top quark mass of 175 GeV/c². The probability that the observed excess originated from a background fluctuation (p-value) is 0.0028 (2.8σ), and the expected (median) p-value in pseudo-experiments is 0.0000022 (4.6σ).

 Event Selection:
 This analysis uses events in which the W boson decays leptonically. We require a single, well-isolated, high-transverse-energy lepton, large missing transverse energy (from the neutrino), and exactly two or three high-transverse-energy jets. Of these jets, we require at least one to be identified as originating from a b quark by secondary vertex tagging. The secondary vertex tag identifies tracks associated with the jet that originate from a vertex displaced from the primary vertex. For low values of missing transverse energy, we further require that the missing transverse energy and the jets not be collinear. This requirement removes a large fraction of the non-W background while retaining most of the signal. Our major backgrounds are: W + heavy flavor jets (Wbb-bar, Wcc-bar, and Wc+jet); mistags, which are W + light quark/gluon events mistakenly tagged as b-jets due to detector resolution effects; non-W events, which are mostly multijet events in which a jet is mistakenly identified as a lepton and jets are mismeasured, producing a false missing transverse energy signature; and top pair production events in which one lepton or two jets are lost due to detector acceptance.
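 The counting logic of the selection above can be sketched as follows. This is a minimal illustration, not the analysis code: the `Jet` and `Event` containers and the function name are hypothetical, the text does not quote numerical thresholds (so they are left as caller-supplied arguments), and the collinearity requirement at low missing transverse energy is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float        # transverse energy in GeV
    b_tagged: bool   # secondary-vertex tag decision


@dataclass
class Event:
    lepton_ets: List[float]  # ET of isolated lepton candidates, in GeV
    missing_et: float        # missing transverse energy, in GeV
    jets: List[Jet]


def passes_selection(event, min_lepton_et, min_missing_et, min_jet_et):
    """Lepton + missing ET + 2 or 3 jets with at least one b tag.

    All thresholds are supplied by the caller; none are taken from the text.
    """
    leptons = [et for et in event.lepton_ets if et > min_lepton_et]
    if len(leptons) != 1:
        return False
    if event.missing_et <= min_missing_et:
        return False
    jets = [jet for jet in event.jets if jet.et > min_jet_et]
    if len(jets) not in (2, 3):
        return False
    return any(jet.b_tagged for jet in jets)


# Placeholder thresholds for illustration only, not the analysis cuts.
candidate = Event(lepton_ets=[30.0], missing_et=40.0,
                  jets=[Jet(50.0, True), Jet(25.0, False)])
print(passes_selection(candidate, 20.0, 25.0, 20.0))
```

 Leaving the thresholds as arguments keeps the sketch independent of any particular set of cut values.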

 Input Variables:
 The variables used to train the BDTs in the 2-jet channels are listed and plotted below (left: W+2 jets, 1 tag; middle: W+2 jets, 2 tags; right: W+2 jets, zero tags):

• $E_T$ and η of both jets
• $p_T$ and η of the lepton
• missing transverse energy in the event, $\slashed{E}_T$
• scalar sum of the transverse energies, $H_T = \sum_{\mathrm{jets}} E_T + p_T + \slashed{E}_T$
• invariant mass of the di-jet system, $m_{j1j2}$
• η and transverse mass of the W boson, $m_T(W)$
• mass of the reconstructed top, $m_{l\nu b}$
• invariant mass of the lepton, neutrino and both jets, $m_{l\nu j1j2}$
• charge of the lepton times the η of the b-quark jet, $Q \times \eta$
• KIT NN flavor separator
• Δφ between the jets and the $\slashed{E}_T$
• Δφ between the jets and the lepton
• Δφ between the lepton and the $\slashed{E}_T$
• cosine of the angle between the lepton and the jets

 The variables for the W+3 jets samples and for the W+4 jets sample with at least one tag are listed separately.
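 A few of the simpler input variables can be computed directly from the reconstructed objects. The sketch below assumes the standard definition of the transverse mass, m_T = sqrt(2 p_T E_T^miss (1 − cos Δφ)), which the text does not spell out, and takes the p_T term in H_T to be the lepton p_T. All function and argument names are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def transverse_mass(lepton_pt, lepton_phi, missing_et, missing_et_phi):
    """Transverse mass of the lepton + missing-ET system (assumed definition)."""
    dphi = delta_phi(lepton_phi, missing_et_phi)
    return math.sqrt(2.0 * lepton_pt * missing_et * (1.0 - math.cos(dphi)))


def scalar_sum_ht(jet_ets, lepton_pt, missing_et):
    """H_T: sum of jet E_T plus lepton p_T plus missing E_T."""
    return sum(jet_ets) + lepton_pt + missing_et


def q_times_eta(lepton_charge, b_jet_eta):
    """Lepton charge times the pseudorapidity of the b-quark jet."""
    return lepton_charge * b_jet_eta


# Illustrative numbers only (GeV and radians), not taken from data.
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
print(scalar_sum_ht([60.0, 35.0], 40.0, 40.0))
print(q_times_eta(-1, 1.2))
```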

 Cross-Checks:
 In addition to validating the input variables of the BDT, we validate the output of the four trained BDTs in several control regions. We evaluate the BDT outputs in the untagged W + 2 jets (plots on the left) and W + 3 jets (plots in the middle) samples, high-statistics control samples with very little single-top content (<0.5%). We also evaluate the BDT outputs in the tagged lepton + 4 jets sample (plots on the right), which should agree well with tt-bar Monte Carlo. In all control samples, the data agree well with the Monte Carlo prediction.

 Distribution of the output of the BDT trained for W+2jets-1tag evaluated in three control regions.
 Distribution of the output of the BDT trained for W+2jets-2tag evaluated in three control regions.
 Distribution of the output of the BDT trained for W+3jets-1tag evaluated in three control regions.
 Distribution of the output of the BDT trained for W+3jets-2tag evaluated in three control regions.

 Systematics:

Each source of systematic uncertainty can possess a normalization uncertainty and a shape uncertainty. The normalization uncertainty includes changes to the event yield due to the systematic effect, and the shape uncertainty includes changes to the template histograms. Both of these effects are included in the likelihood function as shown above.
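One common way to propagate a rate and a shape uncertainty to a template is sketched below. It assumes a single nuisance parameter `delta` measured in standard deviations, a symmetric fractional rate uncertainty, and bin-by-bin linear interpolation between the nominal and the ±1 SD templates. The text does not specify the interpolation scheme used in the analysis, so this is an illustration only, and all names are hypothetical.

```python
def morph_template(nominal, up, down, delta):
    """Bin-by-bin linear interpolation toward the +1 SD or -1 SD template."""
    shifted = up if delta >= 0 else down
    weight = abs(delta)
    # Clamp at zero so a large shift cannot produce a negative bin content.
    return [max(n + weight * (s - n), 0.0) for n, s in zip(nominal, shifted)]


def scale_yield(nominal_yield, fractional_uncertainty, delta):
    """Symmetric rate shift: yield * (1 + delta * fractional uncertainty)."""
    return nominal_yield * (1.0 + delta * fractional_uncertainty)


# Made-up two-bin templates for illustration.
nominal = [10.0, 20.0]
plus_one_sd = [12.0, 18.0]
minus_one_sd = [9.0, 21.0]

print(morph_template(nominal, plus_one_sd, minus_one_sd, -0.5))
print(scale_yield(100.0, 0.16, 1.0))  # a 16% rate uncertainty shifted by +1 SD
```

Setting `delta` to zero recovers the nominal prediction, which is the property a fit relies on when it floats the nuisance parameter around its central value.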

Listed below are systematic uncertainties estimated from various Monte Carlo samples.

• The jet energy scale systematic is obtained by changing the jet energy scale by 1 standard deviation (SD) and recalculating the event yield and the discriminant template histograms. This affects both normalization and shape.
• We increase or decrease the amount of initial state radiation in the Monte Carlo to assign a systematic from this effect.
• We increase or decrease the amount of final state radiation in the Monte Carlo to assign a systematic from this effect.
• We vary the eigenvectors in the CTEQ parton distribution function tables to determine the uncertainty from this effect. We also include the effect of using different versions of CTEQ and of using MRST with different values of ΛQCD.
• We include a systematic error to account for the modeling of the single top sample (MadEvent).
• We include an uncertainty on the event detection efficiency due to the scale factors that we apply to our Monte Carlo samples (mainly b-tagging and lepton ID scale factors).
• We include a 6% uncertainty on our measured luminosity.
• We include a systematic which accounts for systematic variation of the neural network b tagger output.
• We use an alternative mistag model and take the difference from the default model as a systematic uncertainty.
• We use an alternate model to model our non-W background. We also assign a systematic effect to the flavor composition of the background, which is necessary to include for the neural-net b tagger to run.
• We vary the factorization and renormalization scale (Q²) in the Monte Carlo samples that have been created with the ALPGEN Monte Carlo program.

| Systematic uncertainty | Rate | Shape |
|---|---|---|
| Jet energy scale | 0...16% | X |
| Initial state radiation | 0...11% | X |
| Final state radiation | 0...15% | X |
| Parton distribution functions | 2...3% | X |
| Monte Carlo generator | 1...5% | |
| Event detection efficiency | 0...9% | |
| Luminosity | 6.0% | |
| Neural-net b tagger | N/A | X |
| Mistag model | N/A | X |
| Non-W model | N/A | X |
| Q² scale in Alpgen MC | N/A | X |
| Monte Carlo mismodeling | N/A | X |
| W+bottom normalization | 30% | |
| W+charm normalization | 30% | |
| Mistag normalization | 17...29% | |
| tt-bar normalization | 23% | |

Systematic uncertainties. The numbers here are given for the combined single-top channel. Jet energy scale and neural network b tagger systematics are applied to all processes (not shown here).

 Results:

1. Cross Section Measurement
The result of the binned maximum likelihood fit is shown below. All sources of systematic uncertainties (normalization and shape) are included in the result.

Results from full dataset (all W+2/3 jets candidate events):

### $\sigma_{\text{single top}} = 1.9^{+0.8}_{-0.7}$ pb

BDT output distribution for signal and background processes in the four signal channels. All templates are normalized to the prediction.
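
The idea of a binned maximum likelihood fit can be illustrated with a stripped-down sketch: a single free parameter `beta` scales the predicted signal template, the background is held fixed, and each bin is treated as an independent Poisson count. Systematic uncertainties, which the real fit includes, are left out here. The templates and function names are hypothetical, and every background bin is assumed to be strictly positive.

```python
import math


def negative_log_likelihood(beta, data, signal, background):
    """-ln L for independent Poisson bins (beta-independent terms dropped)."""
    total = 0.0
    for n, s, b in zip(data, signal, background):
        mu = beta * s + b
        total += mu - n * math.log(mu)
    return total


def nll_slope(beta, data, signal, background):
    """Derivative of -ln L with respect to beta."""
    return sum(s - n * s / (beta * s + b)
               for n, s, b in zip(data, signal, background))


def fit_signal_strength(data, signal, background, lo=0.0, hi=10.0, tol=1e-9):
    """Minimize -ln L over beta in [lo, hi] by bisection on its slope.

    -ln L is convex in beta, so the slope crosses zero at most once.
    """
    if nll_slope(lo, data, signal, background) >= 0.0:
        return lo  # best fit sits at the lower boundary
    if nll_slope(hi, data, signal, background) <= 0.0:
        return hi  # best fit sits at the upper boundary
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nll_slope(mid, data, signal, background) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Made-up three-bin templates; data built as 1.5 * signal + background.
signal = [2.0, 4.0, 6.0]
background = [50.0, 20.0, 5.0]
data = [53.0, 26.0, 14.0]
print(fit_signal_strength(data, signal, background))
```

In this picture the fitted `beta` multiplies the predicted signal cross section to give the measured one.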

2. Hypothesis Test:

We have calculated the signal significance of this result using a standard likelihood ratio technique [4]. In this approach, pseudo-experiments are generated from background-only events. The likelihood ratio is used as the test statistic. We then calculate the p-value, which is the probability for the background-only hypothesis (b) to fluctuate to the observed result in data. We estimate the expected p-value by taking the median of the test hypothesis (signal + background) distribution as the 'observed' value (dashed red line).
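
The pseudo-experiment procedure described above can be sketched with a toy model. The version below assumes independent Poisson bins, uses ln Q = ln[L(s+b)/L(b)] as the test statistic (larger values are more signal-like), and ignores systematic uncertainties. The three-bin templates, the "observed" counts, and all names are hypothetical and serve only to make the example runnable.

```python
import math
import random
import statistics


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def log_likelihood_ratio(counts, signal, background):
    """ln Q = ln[L(s+b) / L(b)] for independent Poisson bins."""
    return sum(n * math.log1p(s / b) - s
               for n, s, b in zip(counts, signal, background))


def toy_statistics(means, signal, background, n_toys, rng):
    """Test statistic for n_toys pseudo-experiments drawn from `means`."""
    return [
        log_likelihood_ratio([sample_poisson(rng, m) for m in means],
                             signal, background)
        for _ in range(n_toys)
    ]


def p_value(q_reference, q_background_toys):
    """Fraction of background-only toys at least as signal-like as q_reference."""
    n_extreme = sum(1 for q in q_background_toys if q >= q_reference)
    return n_extreme / len(q_background_toys)


# Made-up templates and counts for illustration.
signal = [2.0, 4.0, 6.0]
background = [50.0, 20.0, 5.0]
rng = random.Random(12345)

q_b = toy_statistics(background, signal, background, 5000, rng)
q_sb = toy_statistics([s + b for s, b in zip(signal, background)],
                      signal, background, 5000, rng)

observed = [52, 24, 11]
observed_p = p_value(log_likelihood_ratio(observed, signal, background), q_b)
expected_p = p_value(statistics.median(q_sb), q_b)
print(observed_p, expected_p)
```

The smallest nonzero p-value such a scan can resolve is one over the number of background-only toys, so a p-value of order 10⁻⁶ requires many more pseudo-experiments than the 5000 used in this toy.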

Expected p-value: 0.0000022 (4.6σ)
Observed p-value: 0.0028 (2.8σ)
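
The σ values quoted alongside the p-values follow from the tail of a unit Gaussian. The short sketch below assumes the one-sided convention, in which the p-value equals the area above z standard deviations. The function name is hypothetical.

```python
from statistics import NormalDist


def significance(p_value):
    """One-sided Gaussian significance, in standard deviations, for a p-value."""
    return -NormalDist().inv_cdf(p_value)


for label, p in [("observed", 0.0028), ("expected", 0.0000022)]:
    print(f"{label}: p = {p:g} -> {significance(p):.1f} sigma")
```

Under this convention the two p-values correspond to about 2.8 and 4.6 standard deviations, consistent with the numbers quoted above.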

 High Scored Regions:
 Enriched regions. Top: BDT > 0.25; Bottom: BDT > 0.6
