Search for SM Higgs boson production in association with ttbar using a final state with no lepton

Hyun Su Lee1, Wesley Ketchum1, and Young-Kee Kim1,2
1 The University of Chicago, and 2 Fermilab
Contact the authors

Final results

Public Note



We describe a search for Higgs boson production in association with a ttbar pair in ppbar collisions at sqrt(s) = 1.96 TeV at the Fermilab Tevatron collider, using data collected with the CDF II detector. To avoid overlap with the lepton channel analysis, we explicitly exclude events that contain a high pT lepton. We consider two scenarios: all hadronic decay of the ttbar pair, and lepton+jets decay in which the electron or muon escapes detection. The Higgs boson is assumed to decay to bbbar, but we do not explicitly exclude other decay modes. We remove the dominant QCD multijet background using a neural network trained on ttH signal against pretag data, taking advantage of the different kinematics of the signal. We select only signal-like events with a high neural network output score. A second neural network then discriminates the ttbar background from the signal. The final discriminants are built by multiplying the outputs of the two neural networks and are used to set 95% confidence level upper limits on the Higgs boson production cross section, since we observe no excess attributable to a Higgs boson signal.

Event selection:

The sample of events used in this measurement is a subset of the events selected by a CDF II online selection (trigger), which identifies and records events with at least four jets of transverse energy ET > 15 GeV and a summed ET of these jets greater than 175 GeV. After the trigger selection, events undergo offline reconstruction, in which jets are reconstructed with the JETCLU cone algorithm using a cone radius of ΔR = √(Δφ² + Δη²) = 0.4. We define a tight jet as one with ET > 15 GeV and |η| < 2.0. To avoid overlap with the lepton channel analysis, we reject events with high pT electrons or muons.

We then select events as follows. We categorize the sample by the presence of large missing transverse energy (MET) using the MET significance, defined as MET/√(ΣET): events with MET significance greater than 2 enter the MET+jets channel, and events with MET significance less than 2 enter the all jets channel. The MET+jets channel has additional selection requirements, in which NNQCD is the output of a neural network trained to separate signal from QCD multijet production, explained later. The all jets channel has its own requirements, in which NNQCD1 is the output of a neural network trained to separate signal from QCD multijet production and NNQCD2 is the output of a second-stage neural network trained to separate signal from QCD multijet production using only events with NNQCD1 > 0.9. The jet-multiplicity requirements reflect the final state considered in each channel: six jets in the MET+jets channel and eight jets in the all jets channel. The neural network (NN) output of each sample is discussed later.
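The jet definition and channel assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, energies are taken to be in GeV, and an event with MET significance exactly equal to 2 (not specified in the text) is placed in the all jets channel.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(dphi^2 + deta^2), with dphi wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(dphi, eta1 - eta2)

def is_tight_jet(et, eta):
    """Tight jet definition from the text: ET > 15 GeV and |eta| < 2.0."""
    return et > 15.0 and abs(eta) < 2.0

def met_significance(met, sum_et):
    """MET significance = MET / sqrt(sum ET)."""
    return met / math.sqrt(sum_et)

def channel(met, sum_et):
    """Assign the analysis channel from the MET significance.

    Assumption: a value of exactly 2 goes to the all jets channel.
    """
    return "MET+jets" if met_significance(met, sum_et) > 2.0 else "all jets"
```

For example, an event with MET = 60 GeV and ΣET = 400 GeV has a MET significance of 3 and would fall in the MET+jets channel in this sketch.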

The NN-based b tagger uses track information to identify jets originating from b quarks. We require at least two tagged jets per event. In both channels, we separate the sample by the number of b-tagged jets into events with exactly two b-tagged jets (2-tag) and events with three or more b-tagged jets (3-tag).

Background events with b tags can arise not only from ttbar but also from QCD multijet production and from electroweak production of W bosons in association with heavy flavor jets. Because we do not require a charged lepton, QCD multijet production is the dominant background. To improve the signal-to-background ratio in this analysis, we must suppress this background. A NN is trained to distinguish the kinematic and topological characteristics of SM ttH events from data without a b-tag requirement, which are dominated by QCD multijet production. The training is done separately for the MET+jets and all jets channels, with a different selection of input variables in each. The validation of the input variables (nine inputs for MET+Jets and 12 inputs for all jets) and of the neural network output in the inclusive tagged (≥ 1 tag) sample is available at the links below.
MET+Jets channel    All Jets channel

We apply the NN to all events and reject a large fraction of QCD multijet events by selecting events with a high NN output score: NNQCD > 0.8 (Signal region) for the MET+Jets channel and NNQCD1 > 0.9 (Pre-signal region) for the all jets channel.

Even though the NNQCD1 > 0.9 requirement rejects a significant amount of QCD multijet background in the all jets channel, QCD multijet production still dominates the pre-signal region. We therefore use a second-stage NN (NNQCD2) to further reject background events in the pre-signal region. The validation plots for the input variables in the pre-signal region are linked below. The ttbar separation neural network (NNTop), described later, uses almost the same inputs in the pre-signal region.
2-tag pre-signal region (All Jets)    3-tag pre-signal region (All Jets)

We select the events which have this NN output (NNQCD2) greater than 0.7 (Signal region).

We estimate the number of b-tagged background events from a per-jet parameterization of the b-tagging probability measured in a background-dominated sample. In the MET+jets channel, we use the sample of events with exactly three jets, which has negligible contamination from ttbar and ttH. For the all jets channel, we use events with exactly four jets. We parameterize the per-jet b-tag rate as a function of three jet characteristics and extrapolate the b-tagging probability to events with higher jet multiplicity. We calculate the background b tags for the 2-tag and 3-tag samples separately, and apply a b-tagging correction factor to account for the fact that most heavy flavor jets are produced in pairs. Because the tag probability differs in higher jet-multiplicity events, owing to different sample composition and detector effects, we reweight the jet-multiplicity distribution using background-dominated samples (low NN score: NNQCD or NNQCD1 less than 0.4). Because the very low score NN output (NNQCD or NNQCD1 less than 0.05) is imperfectly modeled, we do not use those events in this modeling, but we assign the total rate differences as systematic uncertainties.
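As an illustration of how per-jet tag probabilities translate into 2-tag and 3-tag event weights, the sketch below computes the probability of each tag multiplicity for one event. It is a simplified example, not the analysis code: it treats jets as independent, so it omits the pair-production correction factor and the jet-multiplicity reweighting described above, and the per-jet probabilities passed in are hypothetical inputs that would come from the tag-rate parameterization.

```python
def tag_multiplicity_probs(per_jet_probs):
    """Return a list p where p[k] is the probability of exactly k tagged jets.

    Jets are treated as independent (Poisson-binomial distribution); this is
    a simplifying assumption of the sketch.
    """
    probs = [1.0]  # before looking at any jet: zero tags with probability 1
    for p in per_jet_probs:
        nxt = [0.0] * (len(probs) + 1)
        for k, q in enumerate(probs):
            nxt[k] += q * (1.0 - p)   # this jet is not tagged
            nxt[k + 1] += q * p       # this jet is tagged
        probs = nxt
    return probs

def expected_tag_weights(per_jet_probs):
    """Event weights for the 2-tag (exactly two) and 3-tag (three or more) categories."""
    probs = tag_multiplicity_probs(per_jet_probs)
    return {
        "2-tag": probs[2] if len(probs) > 2 else 0.0,
        "3-tag": sum(probs[3:]),
    }
```

Summing these weights over all events in the pretag sample would give a predicted yield in each tag category, before any correction factors are applied.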
With the background rate estimation procedure described above, we obtain the estimated numbers of background events. We also estimate the ttbar background, using a production cross section of 7.0 pb at Mtop = 172.5 GeV/c², and the ttH signal, using a cross section of 4.9 fb at MH = 120 GeV/c². The table below shows the expected backgrounds and signal.

Final discriminant:

We use NNQCD (MET+Jets) and NNQCD2 (all jets) to separate non-ttbar background events, as shown below. The upper plots are for the MET+Jets channel and the lower plots are for the all jets channel, with 2-tag on the left and 3-tag on the right.
MET+Jets 2tag NN_QCD MET+Jets 3tag NN_QCD
All Jets 2tag NN_QCD All Jets 3tag NN_QCD
However, we also need to discriminate the ttbar background, because its kinematics are similar to those of our signal. We train another neural network (NNTop) to separate signal from ttbar. For the MET+Jets training, we use events with NNQCD > 0.8. In the all jets channel, only the NNQCD1 > 0.9 cut is applied, without the NNQCD2 requirement. The input variables can be validated using the plots for these regions (Signal region for MET+Jets and Pre-signal region for the all jets channel). The neural network output (NNTop) is shown below.
MET+Jets 2tag NN_top MET+Jets 3tag NN_top
All Jets 2tag NN_top All Jets 3tag NN_top
The NNTop output is multiplied by the QCD neural network output (NNQCD or NNQCD2) to make the final discriminant histograms. These histograms are the direct input to the binned likelihood calculation used to extract the signal component. The results are shown below.
MET+Jets 2tag Final histogram MET+Jets 3tag Final histogram
All Jets 2tag Final histogram All Jets 3tag Final histogram
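The construction of the final discriminant, a product of two NN outputs filled into a histogram, can be sketched as follows. The binning (10 equal bins on [0, 1]) and the function names are assumptions made for illustration; the binning actually used in the analysis is not stated here.

```python
def final_discriminant(nn_qcd, nn_top):
    """Final discriminant: product of the QCD NN output and the ttbar NN output."""
    return nn_qcd * nn_top

def fill_histogram(values, n_bins=10, lo=0.0, hi=1.0):
    """Fill a fixed-width histogram; a value equal to `hi` goes into the last bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts

# Hypothetical (NNQCD, NNTop) pairs for three events.
events = [(0.9, 0.9), (0.85, 0.2), (1.0, 1.0)]
template = fill_histogram(final_discriminant(q, t) for q, t in events)
```

Because both outputs lie between 0 and 1, the product is also bounded by 0 and 1, and only events scoring high in both networks populate the highest bins.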
The plots below show flowcharts of each step of the analysis, from background modeling to building the final discriminant, in the MET+Jets (left) and all jets (right) channels. Note that the steps differ between the MET+Jets and all jets channels because of the different amounts of QCD multijet background: in the all jets channel, we use a two-stage neural network training to reject QCD multijet production. In the plots, colors indicate the region used in each step.
MET+Jets All Jets

Validation of modeling:

To verify the modeling of backgrounds and signal, we test our ability to predict the background in the various control regions and signal region.
In the MET+Jets channel, we use 0.05 < NNQCD ≤ 0.4 (background region) to model the jet-multiplicity distribution of the background by reweighting, using the inclusive tagged sample. Because we use NNQCD > 0.8 (Signal region) as the signal sample, the region between the two, 0.4 < NNQCD ≤ 0.8 (Control region), serves as our control region. The validation plots for each region, separated by tag category, are linked below. (We include 1-tag as another control region because it has a different background composition.)
1-tag background region (MET+Jets)    2-tag background region (MET+Jets)    3-tag background region (MET+Jets)
1-tag control region (MET+Jets)    2-tag control region (MET+Jets)    3-tag control region (MET+Jets)
1-tag signal region (MET+Jets)    2-tag signal region (MET+Jets)    3-tag signal region (MET+Jets)

In the All Jets channel, we use 0.05 < NNQCD1 ≤ 0.4 (background region) to model the jet-multiplicity distribution of the background by reweighting, using the inclusive tagged sample. We use NNQCD1 > 0.9 (Pre-Signal region) as the input to the second-stage neural network, so the region 0.4 < NNQCD1 ≤ 0.8 (Control1 region) serves as one of our control regions. Our final selection requires the second-stage neural network output NNQCD2 > 0.7 (Signal region). Events with NNQCD1 > 0.9 and NNQCD2 ≤ 0.7 (Control2 region) therefore provide another control region. The validation plots for each region of the all jets channel, separated by tag category, are linked below. (We include 1-tag as another control sample.)
1-tag background region (All Jets)    2-tag background region (All Jets)    3-tag background region (All Jets)
1-tag control1 region (All Jets)    2-tag control1 region (All Jets)    3-tag control1 region (All Jets)
1-tag control2 region (All Jets)    2-tag control2 region (All Jets)    3-tag control2 region (All Jets)
1-tag signal region (All Jets)    2-tag signal region (All Jets)    3-tag signal region (All Jets)
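The region definitions for both channels can be summarized as two small functions. This is a sketch with hypothetical function and label names; the thresholds are those quoted above. Events with NN output at or below 0.05 are labeled "excluded" because they are not used in the modeling, and the interval 0.8 < NNQCD1 ≤ 0.9 in the all jets channel is labeled "unassigned" because the text does not assign it to any region.

```python
def met_jets_region(nn_qcd):
    """Region of a MET+Jets event from its NNQCD output."""
    if nn_qcd <= 0.05:
        return "excluded"      # very low scores are not used in the modeling
    if nn_qcd <= 0.4:
        return "background"
    if nn_qcd <= 0.8:
        return "control"
    return "signal"

def all_jets_region(nn_qcd1, nn_qcd2=None):
    """Region of an all jets event from its NNQCD1 and (optionally) NNQCD2 outputs."""
    if nn_qcd1 <= 0.05:
        return "excluded"
    if nn_qcd1 <= 0.4:
        return "background"
    if nn_qcd1 <= 0.8:
        return "control1"
    if nn_qcd1 <= 0.9:
        return "unassigned"    # not assigned to any region in the text
    if nn_qcd2 is None:
        return "pre-signal"    # second-stage output not yet evaluated
    return "signal" if nn_qcd2 > 0.7 else "control2"
```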

All of the validation plots show good agreement between our modeling and the data, which allows us to use the final discriminant to extract the ttH signal component.

Systematic uncertainties:

We consider a variety of systematic effects that could change the rate as well as the shape of signals or backgrounds. The rate uncertainty reflects changes to the event yield due to systematic effects while the shape uncertainty reflects changes to the final discriminant template histograms.
Cross section: We use NLO cross sections to normalize the ttbar and ttH events. The theoretical uncertainty of each calculation, 10% in both cases, is assigned as a systematic uncertainty.
Trigger simulation: For the MC-generated ttbar and ttH events, we simulate our top multijet trigger using calorimeter trigger tower information. We check the simulation by comparing jet data with a jet MC sample. The difference between simulation and data, 7% in the rate of ttbar and ttH, is assigned as a systematic uncertainty.
Luminosity: The uncertainty of the luminosity measurement (6%) is assigned as a systematic uncertainty for ttbar and ttH.
B-tagging scale factor: A b-tag scale factor of 0.92±0.04, which accounts for the difference in b-tagging rate between data and MC, is applied to b-tagged jets to correct the b-tag rates of MC-generated events. We apply this scale factor to ttbar and ttH in each category and assign the propagated uncertainties as systematic uncertainties, corresponding to approximately 7% for 2-tag and 9% for 3-tag events in both the ttbar and ttH samples.
Jet energy scale (JES): We vary the JES of MC-generated events within its ±1σ uncertainty. The JES variations change not only the rate but also the shape of the final discriminant templates. In the MET+jets channel, the rate uncertainties are 2% (3%) for ttH and 11% (13%) for ttbar in the 2-tag (3-tag) category. In the all jets channel, the rate uncertainties are 5% (7%) for ttH and 20% (22%) for ttbar in 2-tag (3-tag) events.
Initial and final state radiation: We assign a 2% rate uncertainty from the uncertainty in initial and final state radiation.
Parton distribution functions: We consider the rate variation from different choices of parton distribution functions (PDFs). We assign a 2% rate uncertainty from the PDFs.
NLO ttbb cross section uncertainty: We increase the ttbb cross section to twice the leading order (LO) MC value, which changes both the rate and the shape of the ttbar background. We assign a 3-6% rate increase, depending on the category.
Background (non-ttbar) rate and shape uncertainty: We consider the uncertainty of the non-ttbar background estimation. The uncertainties of the background estimation, which arise from the rate mismatch in the NNQCD < 0.05 region, are 6% and 9% for the MET+jets and all jets channels, respectively. The scale uncertainty of the b-tagging categorization also gives a systematic uncertainty of about 5% for 2-tag and 10% for 3-tag events in both channels. We consider shape changes of the background template within the rate uncertainties.
The table below summarizes the rate systematic uncertainties. All uncertainties are relative to the rate of each process.


We compute the expected limit on the standard model Higgs boson cross section using the final discriminant histograms in the signal region of each category. We build a binned likelihood to extract the signal component, with Gaussian constraints on the background normalizations within their uncertainties. The systematic uncertainties on the normalizations are incorporated into the likelihood as nuisance parameters. We use the MCLIMIT package for the statistical treatment of the limit calculation. The plots below show the resulting 95% confidence level upper limits on the Higgs boson cross section in the MET+Jets channel and the all jets channel.
Final results Final results
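To make the form of such a likelihood concrete, the sketch below writes the negative log-likelihood for a binned Poisson model with a single Gaussian-constrained nuisance parameter on the background normalization. It is a minimal illustration only and does not reproduce the MCLIMIT interface: the function name, the single-nuisance simplification, and the parameterization of the normalization as 1 + bkg_unc × theta are assumptions.

```python
import math

def neg_log_likelihood(data, signal, background, mu, theta, bkg_unc):
    """-ln L for a binned Poisson likelihood with one Gaussian-constrained nuisance.

    data, signal, background: per-bin observed counts and expected templates
    mu:      signal strength relative to the standard model prediction
    theta:   background normalization shift in units of its uncertainty
    bkg_unc: relative background normalization uncertainty (e.g. 0.1 for 10%)

    The expected count in every bin must be positive.
    """
    scale = 1.0 + bkg_unc * theta
    nll = 0.5 * theta * theta  # unit Gaussian constraint on theta
    for n, s, b in zip(data, signal, background):
        expected = mu * s + scale * b
        # Poisson term: expected - n*ln(expected) + ln(n!)
        nll += expected - n * math.log(expected) + math.lgamma(n + 1.0)
    return nll
```

An upper limit would then be obtained by comparing this quantity across values of mu while minimizing or integrating over theta, which is the part of the procedure handled by the limit-setting package.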

The combined fit including both the MET+jets and all jets channels is shown in the plot and table below. All cross section limits are expressed as ratios to the standard model cross section. For a Higgs boson mass of 110 GeV/c², we set an observed (expected) 95% confidence level upper limit on the standard model Higgs boson cross section of 24.5 (17.8) times the standard model prediction.
Final results
Final results



Hyun Su Lee for the Authors

Last modified July 5, 2011