HOBIT

Higgs Optimized B-jet Identification Tagger

Description of HOBIT

The Higgs-Optimized b-Identification Tagger (HOBIT) is a multivariate b-jet tagger that, as the name implies, has been optimized to identify b-jets from the decay of Higgs bosons while rejecting jets from u-, d-, and s-quarks and gluons. It has a continuous output, so each analysis that uses HOBIT can choose the operating point that optimizes its Higgs sensitivity. In addition, the HOBIT output could potentially be used as an input to the final discriminant that separates signal from background. A public note describing the details of HOBIT development, calibration, and validation is available on the Higgs public results website or via the direct link at the top of the page, and will be submitted to NIM.

The HOBIT Tagger is an implementation of a multivariate tagger trained and tested using the TMVA package.  It uses variables from the SecVtx tagger [1], Roma tagger [2] and BNess tagger [3] as inputs, and outputs a value that is ideally between -1 and 1. An output of -1 means that the jet is light-jet-like, and an output value of 1 means that the jet is b-quark-like.

HOBIT Inputs

The inputs to HOBIT are a combination of general jet properties and inputs developed for the SecVtx, Roma, and BNess taggers. The training used b-quark jets from Higgs to bb samples with a Higgs mass of 120 GeV as signal, and light jets (udsg) from Alpgen W+jets samples as background. The light-jet background sample was reweighted to have the same Et spectrum as the b-jets from Higgs decays. The full list of inputs is:
  • Inputs from the BNess tagger
    • BNess Track 0 (most heavy-flavor-like track)
    • BNess Track 1 (second most)
    • BNess Track 2
    • BNess Track 3
    • BNess Track 4
    • BNess Track 5
    • BNess Track 6
    • BNess Track 7
    • BNess Track 8
    • BNess Track 9
    • Number of BNess-eligible tracks
  • Inputs from the Roma tagger
    • Status of the SecVtx loose tag
    • The invariant mass of the SecVtx loose fit vertex
    • Number of muons in the jet
    • Momentum perpendicular to jet axis of best muon candidate
    • Jet Et
    • 3D impact parameter significance of best Roma vertex
    • Invariant mass of Roma's heavy-flavor-like tracks
    • Number of Roma's heavy-flavor-like tracks
    • Fraction of total Roma track Pt carried by its heavy-flavor-like tracks
    • 3D impact parameter of best Roma vertex
    • pseudoCtau of best Roma vertex
    • Invariant mass of best Roma vertex
    • Total number of Roma tracks
    • Pt of all Roma tracks
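
The Et reweighting of the light-jet training sample mentioned above can be illustrated with a bin-by-bin ratio of normalized Et histograms. This is only a minimal sketch of one common way to do such a reweighting; the page does not say how the HOBIT reweighting was actually implemented. The function names, the binning, and the toy Et values are all hypothetical.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values into bins with ascending edges; out-of-range values are dropped."""
    counts = [0.0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1.0
    return counts


def et_weights(signal_et, background_et, edges):
    """Return one weight per background jet so that the weighted background
    Et shape matches the signal Et shape, bin by bin."""
    sig = histogram(signal_et, edges)
    bkg = histogram(background_et, edges)
    n_sig, n_bkg = sum(sig), sum(bkg)
    if n_sig == 0 or n_bkg == 0:
        raise ValueError("both samples need at least one jet inside the binning")
    # Ratio of normalized bin contents; bins with no background jets get ratio 0.
    ratio = [(s / n_sig) / (b / n_bkg) if b > 0 else 0.0 for s, b in zip(sig, bkg)]
    weights = []
    for et in background_et:
        if edges[0] <= et < edges[-1]:
            weights.append(ratio[bisect_right(edges, et) - 1])
        else:
            weights.append(0.0)  # jets outside the binning are not used
    return weights


# Toy Et values in GeV (illustrative only, not CDF data).
edges = [20.0, 40.0, 60.0, 100.0]
signal_et = [25.0, 45.0, 45.0, 70.0]
background_et = [25.0, 25.0, 45.0, 70.0]
weights = et_weights(signal_et, background_et, edges)
```

In this toy case the two background jets in the lowest bin each get weight 0.5 and the single jet in the middle bin gets weight 2, so the weighted background histogram has the same shape as the signal histogram.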

HOBIT Performance

The plots below show the performance of HOBIT in Monte Carlo (MC) simulation. Figure 1 shows the output distributions in the training and testing samples used to develop HOBIT. Figure 2 shows the light-jet rejection versus the b-jet acceptance in those same Monte Carlo samples. These efficiencies and misidentification rates have not had the MC corrections applied. The corrected b-jet efficiency and light-jet misidentification rates for several operating points are shown in Table 1.

Figure 1 & 2.

The left figure shows the HOBIT output distributions in Monte Carlo simulation for b-jets from Higgs decays in blue and from light-jets in red. The right figure shows the HOBIT light-jet rejection rate versus the b-jet efficiency in MC compared to several other CDF b-jet taggers.

HOBIT Operating Points

HOBIT Cut       b-jet eff   light-jet mistag
0.72            0.70        0.089
0.89            0.59        0.029
SecVtx Loose    0.47        0.029
0.94            0.54        0.014
SecVtx Tight    0.39        0.014
0.98            0.42        0.0089
Table 1. The b-jet efficiency and light-jet mistag rates for four HOBIT operating points. The SecVtx Loose and Tight operating points are listed for comparison. The table was made for b-jets from light Higgs decays and for light jets (udsg) from Z decays. The table shows the results after correcting the simulation to match the response in data.
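
As an illustration of how the operating points in Table 1 might be used, the sketch below stores the table values and compares HOBIT with SecVtx at equal mistag rate. It assumes that a jet counts as tagged when its HOBIT output exceeds the cut; the page does not state whether the comparison is strict. The dictionary and function names are hypothetical.

```python
# HOBIT cut -> (b-jet efficiency, light-jet mistag rate), copied from Table 1.
HOBIT_POINTS = {
    0.72: (0.70, 0.089),
    0.89: (0.59, 0.029),
    0.94: (0.54, 0.014),
    0.98: (0.42, 0.0089),
}

# SecVtx operating point -> (b-jet efficiency, light-jet mistag rate), from Table 1.
SECVTX_POINTS = {
    "Loose": (0.47, 0.029),
    "Tight": (0.39, 0.014),
}


def is_tagged(hobit_output, cut):
    """Assumption: a jet is tagged when its HOBIT output exceeds the cut."""
    return hobit_output > cut


def efficiency_gain(hobit_cut, secvtx_name):
    """Ratio of HOBIT to SecVtx b-jet efficiency at the same mistag rate."""
    hobit_eff, hobit_mistag = HOBIT_POINTS[hobit_cut]
    secvtx_eff, secvtx_mistag = SECVTX_POINTS[secvtx_name]
    if hobit_mistag != secvtx_mistag:
        raise ValueError("operating points do not share a mistag rate")
    return hobit_eff / secvtx_eff


gain_loose = efficiency_gain(0.89, "Loose")
gain_tight = efficiency_gain(0.94, "Tight")
```

With the Table 1 numbers, the 0.89 cut gives about 1.26 times the SecVtx Loose efficiency at the same mistag rate, and the 0.94 cut gives about 1.38 times the SecVtx Tight efficiency.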


Corrections to HOBIT response in simulation

The performance of HOBIT in the CDF Monte Carlo simulation is not identical to its performance in data. Two orthogonal methods were developed to measure the correction needed for the simulation to match the data response. The first is the electron-conversion jet method, documented in the public HOBIT note. The second is based on the tt-bar cross-section measurement described in Nazim Hussain's McGill University Master's thesis. The simultaneous measurement of the tt-bar cross section and b-tagging scale factor from that analysis was modified so that the tt-bar cross section is held constant and the data-MC chi-squared is minimized by varying the b-tag efficiency and light-jet mistag scale factors simultaneously. The two measurements were combined: their uncertainties are completely uncorrelated and their sensitivities are similar, so the combination is a significant improvement over either one alone. The combined b-jet tagging efficiency correction varies from 0.993 (loose) to 0.915 (tight) depending on the operating point, with an uncertainty of 0.035 at all operating points. The combined light-jet misidentification correction varies from 1.33 (loose) to 1.50 (tight), with an uncertainty that varies from 0.15 (loose) to 0.31 (tight).
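
For two uncorrelated measurements of the same quantity, a standard way to combine them is an inverse-variance weighted average. The sketch below shows that calculation and how a scale factor would multiply a simulated rate. It is an assumption that the HOBIT combination was done this way; the page does not give the procedure. The input values are hypothetical and are not the individual results of the two methods.

```python
from math import sqrt


def combine_uncorrelated(x1, sigma1, x2, sigma2):
    """Inverse-variance weighted average of two uncorrelated measurements."""
    w1 = 1.0 / sigma1 ** 2
    w2 = 1.0 / sigma2 ** 2
    mean = (w1 * x1 + w2 * x2) / (w1 + w2)
    sigma = sqrt(1.0 / (w1 + w2))
    return mean, sigma


def corrected_rate(mc_rate, scale_factor):
    """Apply a data/MC scale factor to an efficiency or mistag rate from simulation."""
    return mc_rate * scale_factor


# Hypothetical scale-factor measurements with equal uncertainties.
sf, sf_err = combine_uncorrelated(1.00, 0.05, 0.90, 0.05)
```

When the two inputs have equal uncertainties, the combined value is their simple average and the combined uncertainty is smaller by a factor of sqrt(2), which is why two measurements of similar sensitivity give a worthwhile gain.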

Below are validation plots of the electron conversion jet and tt-bar methods.
Figure 3 & 4.

Comparison of the missing Et distribution in data and MC for W+3/4/5 jet events with two loose HOBIT tags (left) and two tight HOBIT tags (right).



Figures 5 and 6 compare data and MC using electron jets from the electron-conversion jet method. The MC is divided into light jets and heavy-flavor (HF = b + charm) jets. The relative fractions of light and HF MC are obtained from a fit to the HOBIT output (left plot). HF is enhanced in the electron jets by requiring that the electron not come from a conversion and that a jet opposite the electron jet (not shown in the plots) have a SecVtx tag.
Figure 5 & 6.

The left figure shows the fit of light flavor and HF electron jets in MC to the electron jets in data. Due to the HF-enriching cuts (see above), approximately 75% of jets are HF. The right figure compares data vs. MC for the four most important inputs to HOBIT with the relative light and HF fractions in MC fixed. Reasonable agreement between data and MC is seen.
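
The idea of extracting the light and HF fractions from a fit to the HOBIT output can be illustrated with a one-parameter template fit. The sketch below uses an unweighted least-squares fit of two normalized templates to a normalized data histogram, which has a closed-form solution. This is a simplification chosen for brevity; the fit used for Figure 5 is described in the public note and may differ. The function names and toy histograms are hypothetical, and the toy data were constructed to have an HF fraction of 0.75.

```python
def normalize(hist):
    """Scale a histogram to unit area."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive content")
    return [h / total for h in hist]


def fit_hf_fraction(data, hf_template, light_template):
    """Least-squares estimate of f in: data = f * HF + (1 - f) * light,
    with all three histograms normalized to unit area."""
    d = normalize(data)
    h = normalize(hf_template)
    l = normalize(light_template)
    num = sum((di - li) * (hi - li) for di, hi, li in zip(d, h, l))
    den = sum((hi - li) ** 2 for hi, li in zip(h, l))
    if den == 0:
        raise ValueError("templates are identical; the fraction is undetermined")
    return num / den


# Toy histograms of a tagger output in four bins (illustrative only, not CDF data).
hf_template = [1, 2, 3, 10]
light_template = [10, 3, 2, 1]
data = [1300, 900, 1100, 3100]
hf_fraction = fit_hf_fraction(data, hf_template, light_template)
```

The estimate is not constrained to lie between 0 and 1, so a real fit would also need bin-by-bin statistical uncertainties and a bound on the fraction.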

Validation of HOBIT performance in WH analysis

The HOBIT tagger was incorporated into the CDF WH -> lnubb analysis, so several distributions from that analysis are shown here to represent the performance of the tagger in data. The details of the WH analysis can be found in the public note here. Four of the most important HOBIT inputs are shown along with the HOBIT output distribution. Good agreement is seen between data and MC.
Figure 7 & 8.

Above, the left figure shows the leading-jet transverse energy in WH events with a double tight HOBIT tag. The right figure shows the leading-jet eta distribution in WH events with a double tight HOBIT tag.
Figure 9. The HOBIT distribution for the highest Et jet in W+2/3/4/5 jet events, data vs. MC.


Figure 10 & 11.

The highest (left) and second highest (right) track BNess distribution for the highest Et jet in W+2/3/4/5 jet events, data vs. MC.

Figure 12 & 13.

The significance of the 3D secondary vertex displacement (left) and its pseudoCtau (right) for the highest Et jet in W+2/3/4/5 jet events, data vs. MC.

References

1. SecVtx Tagger - Phys. Rev. D 71, 052003 (2005). arXiv:hep-ex/0410041
2. RomaNN Tagger - Phys. Rev. D 85, 012002 (2012). arXiv:1108.2060
3. BNess Tagger - Nucl. Instrum. Methods A 663, 37 (2011). FERMILAB-TM-2515-E-PPD. arXiv:1108.4738
4. Public HOBIT Document here

Comments/Questions - Michael Kirby