Search for the Standard Model Higgs boson in the MET+b-jets signature with relaxed kinematic cuts in 7.8 fb-1 of CDF data.

D. Bortoletto, Q. Liu, F. Margaroli, and K. Potamianos (Purdue University) 
O. Gonzalez (CIEMAT)
B. Kilminster (FNAL)
H. Wolfe (OSU)

Abstract

We search for the Higgs boson produced in association with a Z or W boson. We consider a scenario where Z->νν, W->lν or Z->ll and the lepton(s) escape(s) detection; the Higgs boson decays into a bb pair. This note describes the update, using 7.8 fb-1 of CDF data, of previous searches for the Higgs boson in this signature [Phys. Rev. Lett. 104, 141801 (2010)]. The analysis uses a neural network (NN) to remove the very large QCD multijet background. This NN has been significantly improved since the last iteration, and now rejects more background overall while keeping high signal acceptance. We check the quality of our background modeling by comparing data against the predicted backgrounds in many control regions, and find good agreement. An additional NN, with separate optimization for 2- and 3-jet events, is used to discriminate the Higgs signal from the remaining backgrounds. This new iteration includes significantly improved background modeling (W/Z+jets and multijet in particular) and a new neural-network-based parameterization of the trigger turn-on (which now includes an additional trigger, increasing the acceptance by about 5%). We observe no significant excess in the data, and set an expected (observed) upper limit on the cross section times branching ratio σVH x Br(H -> bb) of 3 +1.2 -0.8 (2.3) in units of the Standard Model prediction, assuming MH=115 GeV/c2. This analysis yields about 10% improvement in the expected limit throughout the low Higgs mass range (100-150 GeV/c2), not including the effect of the increase in integrated luminosity.



We analyse ZH and WH associated production using events where the Higgs boson decays to a bb pair and where the Z decays into two neutrinos, or the W decays leptonically but the electron or muon escapes detection. Thus, the final state we are interested in consists of two b-quark jets, no identified leptons and large missing transverse energy (MET).

Many Standard Model processes can produce this final state: single top, top pair, W/Z + jets, and diboson production. We use Monte Carlo generators (PYTHIA/ALPGEN/POWHEG) to model these backgrounds; the parton showering is done by PYTHIA. In addition, QCD multijet production can mimic this signature due to severely mis-measured jets which appear to have large MET. This is in contrast to the other background processes considered in this analysis, which produce real high MET, e.g. W/Z decays to neutrinos or muons which escape detection in the calorimeter.

Since the QCD heavy-flavour production cross-section is orders of magnitude higher than that of the signal, it constitutes the largest background in this search. We use a matrix technique which allows us to estimate the multijet background using data collected by CDF. Additionally, light-flavour jets can be falsely identified as b-jets (commonly referred to as "mis-tags"). We also use CDF data to estimate the probability for this to happen, and then apply that probability to the pre-tagged Monte Carlo events to estimate the mis-tag contribution.
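As an illustration of applying a data-derived mis-tag probability to pre-tagged Monte Carlo events, the sketch below weights an event by the probability that at least one of its jets is mis-tagged. The function name, the input format and the assumption that jets are mis-tagged independently are ours, not taken from the analysis; the per-jet probabilities are placeholders.

```python
def mistag_event_weight(per_jet_probabilities):
    """Probability that at least one jet in the event is mis-tagged.

    Assumption (not from the analysis): each jet is mis-tagged
    independently with the given per-jet probability.
    """
    prob_no_mistag = 1.0
    for p in per_jet_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("mis-tag probability must lie in [0, 1]")
        prob_no_mistag *= (1.0 - p)
    return 1.0 - prob_no_mistag


# Toy pre-tagged event with two light-flavour jets (placeholder probabilities).
weight = mistag_event_weight([0.01, 0.02])
```

Summing such weights over the pre-tagged sample would give a predicted mis-tag yield under these assumptions.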

To obtain a better estimate of the true MET of the event, we calculate the track missing transverse momentum, MPT, defined as the negative vector sum of the transverse momenta (PT) of the charged-particle tracks. For events with true MET, MPT is highly correlated with the calorimeter MET, while for QCD events with mismeasured jets it is either correlated or anti-correlated. Thus, MPT provides an additional handle to separate mismeasurements from events with real MET.
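The MPT definition can be sketched in a few lines. The track format, the function names and the toy numbers are illustrative assumptions; the analysis code itself is not described here.

```python
import math


def missing_pt(tracks):
    """Return (magnitude, phi) of the negative vector sum of track pT.

    `tracks` is a list of (pt, phi) pairs, pt in GeV/c and phi in radians
    (hypothetical input format).
    """
    px = -sum(pt * math.cos(phi) for pt, phi in tracks)
    py = -sum(pt * math.sin(phi) for pt, phi in tracks)
    return math.hypot(px, py), math.atan2(py, px)


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - dphi if dphi > math.pi else dphi


# Toy event: two tracks on one side, so MPT points to the opposite side.
mpt, mpt_phi = missing_pt([(30.0, 0.0), (20.0, 0.5)])

# A small angle between MPT and the calorimeter MET direction (here a
# placeholder value) indicates that the two are aligned.
aligned = delta_phi(mpt_phi, -2.9) < 0.5
```

A separation near 0 corresponds to the correlated case and a separation near pi to the anti-correlated case described above.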

We further increase the signal acceptance by accepting events which contain a third jet with ET>15 GeV. The third jet can come from hard radiation off the final-state quarks, or from hadronic tau decays in W -> τ ν.

Since the last iteration of this analysis, we have significantly increased our acceptance to the ZH/WH signal (by 30-40%) by relaxing the kinematic requirements on each event. We select events with MET larger than 35 GeV (was 50), a leading jet with ET larger than 25 GeV (was 35), and a second jet with ET larger than 20 GeV. Additionally, we require the two leading jets to be separated in η-φ by ΔR larger than 0.8 (was 1.0).
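The relaxed kinematic requirements can be written as a simple event filter. This is a minimal sketch: the thresholds are those quoted above, while the jet format and function names are hypothetical, and other requirements of the full selection (for example the lepton veto) are not modeled.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Separation in eta-phi space, with the phi difference wrapped to [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)


def passes_preselection(met, jets):
    """Apply the kinematic cuts quoted in the text.

    `met` is in GeV; `jets` is a list of (et, eta, phi) tuples sorted by
    decreasing ET (hypothetical input format).
    """
    if met <= 35.0 or len(jets) < 2:
        return False
    (et1, eta1, phi1), (et2, eta2, phi2) = jets[0], jets[1]
    if et1 <= 25.0 or et2 <= 20.0:
        return False
    return delta_r(eta1, phi1, eta2, phi2) > 0.8


# Toy event: passes all four requirements.
selected = passes_preselection(40.0, [(30.0, 0.0, 0.0), (22.0, 0.5, 2.0)])
```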

With no leptons identified in the final state, this channel is peculiar in that the backgrounds are many orders of magnitude larger than the signal even after requiring the MET plus b-jets final-state topology. It is thus necessary to develop an event selection which reduces the backgrounds to a more manageable size before building a discriminant to search for the Higgs signal. We have studied the kinematic properties of these events and implemented a multivariate technique (an artificial neural network, NN) with the goal of removing as much of the dominant QCD multijet background as possible while retaining the signal. The Signal Region is defined by placing a cut on this NN output. This approach performs better than traditional rectangular cuts, as it allows us to keep ~90% of the signal while rejecting ~70% of the backgrounds.

We then use a second machine learning technique to discriminate the signal from the surviving backgrounds, and finally scan its output distribution to set an upper limit on the associated production cross section times branching ratio in the missing transverse energy plus b-jets (MET+b-jets) final state. This measurement is part of the latest CDF combination.

We define several control regions to check our modeling of the data, check the performance of data-driven QCD multijet background modeling and validate the Monte Carlo-based background simulations.

We analyse 7.8 fb-1 of data collected with the CDF II detector at the Tevatron collider at Fermilab. Our analysis technique has been successfully applied to a top pair production cross-section measurement (using 5.7 fb-1) and an electroweak single top production cross-section measurement (using 2.1 fb-1) in this signature.

+Event Selection NN Output

Using events from our pre-selection (described above), we train an artificial neural network (NNQCD), a multilayer perceptron (MLP) fed with 14 kinematic variables, to separate the signal from the main background: QCD multijet production. We check our NNQCD with data in two control regions: one QCD-rich and one with mainly Electroweak/Top processes. We then split the pre-selection region into three: events with NN output > 0.45 form the signal region; events with NN output < 0.1 form a signal-like, QCD-rich control region, which serves to extract the QCD scale factor; and events with NN output between 0.1 and 0.45 form a region used to cross-check our scale factor.
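The three-way split of the pre-selection region can be expressed as a small helper. The thresholds are those given above; the function name, the region labels and the treatment of events exactly on a boundary (assigned to the cross-check region) are assumptions made for illustration.

```python
def nn_region(nn_output):
    """Assign an event to a region based on the event-selection NN output."""
    if nn_output > 0.45:
        return "signal"
    if nn_output < 0.1:
        return "qcd_control"   # used to extract the QCD scale factor
    return "cross_check"       # used to cross-check the scale factor


# Toy NN outputs, one per region.
regions = [nn_region(x) for x in (0.05, 0.30, 0.80)]
```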


We use two different algorithms to identify jets originating from b-quarks: "Secondary Vertex" and "Jet Probability" (SecVTX and JetProb). We define three exclusive tagging categories: one containing events with exactly one tight tag (SecVTX), one containing events with one tight-tagged and one "JetProb-tagged" jet (SecVTX + JetProb), and one containing events with two tight tags (SecVTX + SecVTX). The fourth plot shows the Double Tag region, i.e. the sum of the SecVTX + SecVTX and SecVTX + JetProb regions. The last plot in each row shows the three tagging categories together.
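The exclusive categorization can be sketched as follows. The jet format and function name are hypothetical, and two choices are our assumptions rather than statements from the analysis: events with two or more SecVTX tags take priority as SS, and a jet counts toward SJ only if it is JetProb-tagged but not SecVTX-tagged.

```python
def tag_category(jets):
    """Return "SS", "SJ", "1S", or None for an untagged event.

    `jets` is a list of dicts with boolean keys "secvtx" and "jetprob"
    (hypothetical input format).
    """
    n_secvtx = sum(1 for jet in jets if jet["secvtx"])
    n_jetprob_only = sum(1 for jet in jets if jet["jetprob"] and not jet["secvtx"])
    if n_secvtx >= 2:
        return "SS"   # SecVTX + SecVTX
    if n_secvtx == 1 and n_jetprob_only >= 1:
        return "SJ"   # SecVTX + JetProb
    if n_secvtx == 1:
        return "1S"   # exclusive single SecVTX
    return None


# Toy event: one tight tag plus one JetProb-only tag.
category = tag_category([
    {"secvtx": True, "jetprob": True},
    {"secvtx": False, "jetprob": True},
])
```

Checking the categories in this order keeps them mutually exclusive, so each event enters exactly one template.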

Pre-selection Region

Event Selection NN Output in the Pre-selection region.
Events with NN output > 0.45 form the signal region.
[Plots; panel label: Excl. SecVTX]

+Discriminant NN Input Variables

We then feed the following 6 kinematic variables, together with the NNQCD output, to a new MLP which aims to further separate the signal from the backgrounds. We train two different networks: one for 2-jet events and one for 3-jet events. The figures below show the distributions of those variables for events in the Signal Region, for all categories merged. The data are also shown, and are in good agreement with the predictions.

Single SecVTX (1S)

SecVTX + SecVTX (SS)

SecVTX + JetProb (SJ)

+Final NN Discriminant Output

The 6 variables shown above are fed to an MLP trained with events with at least one tight tag (SecVTX). We use a likelihood profile of this discriminant to set an upper limit on the ZH/WH associated production cross section times branching ratio. For this, we use three templates, one for each of our tagging categories.
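To illustrate how templates from several tagging categories enter a single likelihood, the sketch below evaluates a binned Poisson negative log-likelihood as a function of a signal-strength multiplier and scans it over a grid. It is a simplified stand-in: systematic uncertainties (nuisance parameters) are omitted, the function names are hypothetical, and all bin contents are toy numbers rather than analysis results.

```python
import math


def binned_nll(mu, channels):
    """Negative log-likelihood summed over categories and bins.

    `channels` is a list of (signal, background, data) template triples,
    each a list of bin contents; the expected yield in a bin is mu*s + b.
    """
    total = 0.0
    for signal, background, data in channels:
        for s, b, n in zip(signal, background, data):
            nu = mu * s + b
            total += nu - n * math.log(nu) + math.lgamma(n + 1.0)
    return total


def best_signal_strength(channels, mu_values):
    """Grid point with the smallest negative log-likelihood."""
    return min(mu_values, key=lambda mu: binned_nll(mu, channels))


# Three toy categories (e.g. 1S, SS, SJ), two bins each; the toy data are
# built to equal background plus one unit of signal in every bin.
toy_channels = [
    ([1.0, 2.0], [10.0, 8.0], [11, 10]),
    ([2.0, 1.0], [5.0, 4.0], [7, 5]),
    ([1.0, 3.0], [20.0, 6.0], [21, 9]),
]
mu_hat = best_signal_strength(toy_channels, [0.0, 0.5, 1.0, 1.5, 2.0])
```

Summing the per-category terms treats the three templates as independent counting experiments sharing one signal-strength parameter.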

Signal Region

Final NN Discriminant Output in the signal region, with the binning used for the likelihood profile.
[Plots; panel labels: Excl. SecVTX, SecVTX + SecVTX, SecVTX + JetProb]

+Events in Signal Region

Number of events in the signal region.

+Systematic Uncertainties

Systematic uncertainties are split into normalization uncertainties and shape uncertainties. A normalization uncertainty reflects changes to the event yield due to the systematic effect, while a shape uncertainty reflects changes to the template histograms. Either or both of these effects can be included, depending on the source of the systematic uncertainty.
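The distinction between the two kinds of uncertainty can be made concrete with template histograms. In this sketch a normalization shift rescales every bin by the same factor, while a shape shift interpolates bin by bin toward a shifted template and then restores the nominal yield. The linear interpolation, the function names and the toy templates are assumptions chosen for illustration, not the scheme used in the analysis.

```python
def apply_normalization(template, fractional_shift):
    """Scale all bins by (1 + fractional_shift); the shape is unchanged."""
    return [b * (1.0 + fractional_shift) for b in template]


def apply_shape(nominal, shifted, strength):
    """Move bin contents toward `shifted` and keep the nominal total yield.

    `strength` = 0 returns the nominal template; `strength` = 1 returns the
    shifted shape renormalized to the nominal yield (assumed scheme).
    """
    morphed = [max(n + strength * (s - n), 0.0) for n, s in zip(nominal, shifted)]
    total = sum(morphed)
    if total == 0.0:
        return morphed
    scale = sum(nominal) / total
    return [m * scale for m in morphed]


# Toy three-bin template and a shifted version with the same total yield.
nominal = [10.0, 20.0, 10.0]
shifted = [12.0, 20.0, 8.0]
rate_up = apply_normalization(nominal, 0.10)   # yield changes, shape fixed
shape_up = apply_shape(nominal, shifted, 1.0)  # shape changes, yield fixed
```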


We apply our analysis to 7.8 fb-1 of CDF Run II data and set an upper limit on the associated production cross-section times branching ratio. All sources of systematic uncertainty (normalization and shape) are included.
Expected and observed upper limits on the combined ZH and WH production cross-section times Br(H -> bb), divided by the SM prediction.

These results were blessed on July 8, 2011. Created by Karolos Potamianos on July 4, 2011. Last updated on July 15, 2011.