Simultaneous Template-Based Top Quark Mass Measurement
in the Lepton+Jets and Dilepton Channels Using 3.2 fb-1 of CDF Data
(The dilepton channel uses a new variable mT2.)

Hyunsu Lee1, Jian Tang1, and Young-Kee Kim1,2
1 The University of Chicago, and 2 Fermilab

Combined Fit: Mtop = 171.7 +1.4-1.5 (stat.+JES) +/- 1.1 (syst.) GeV/c2 = 171.7 +1.8 -1.9 GeV/c2
= 171.7 +/- 1.0 (stat.) +/- 1.5 (syst.+JES) GeV/c2
Lepton+Jets Channel: Mtop = 172.2 +1.5-1.6 (stat.+JES) +/- 1.1 (syst.) GeV/c2 = 172.2 +/-1.9 GeV/c2
= 172.2 +/- 1.1 (stat.) +/- 1.5 (syst.+JES) GeV/c2
Dilepton Channel: Mtop = 169.3 +/- 2.7 (stat.) +/- 3.2 (syst.) GeV/c2 = 169.3 +/-4.2 GeV/c2
Dilepton using mT2 only: Mtop = 168.0 +4.8-4.0 (stat.) +/- 2.9 (syst.) GeV/c2 = 168.0 +5.6-5.0 GeV/c2
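
The total uncertainties quoted above are consistent with adding the listed components in quadrature. The short check below is only an illustration of that arithmetic; the helper name `total_uncertainty` is hypothetical, and the inputs are the numbers quoted above.

```python
import math

def total_uncertainty(*components):
    """Combine independent uncertainties (in GeV/c^2) in quadrature."""
    return math.sqrt(sum(c * c for c in components))

# Combined fit: (stat.+JES) and syst. components, upper and lower errors.
combined_up = total_uncertainty(1.4, 1.1)
combined_down = total_uncertainty(1.5, 1.1)
# Dilepton channel: stat. and syst. components.
dilepton = total_uncertainty(2.7, 3.2)

print(round(combined_up, 1), round(combined_down, 1), round(dilepton, 1))
```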

This analysis: Public Note
Analysis with 2.7 fb-1: Public Webpage, Public Note
Analysis with 1.9 fb-1: Public Webpage, Public Note



We present a measurement of the top quark mass performed simultaneously in the Lepton+Jets and Dilepton decay channels. We use a data sample with an integrated luminosity of 3.2 fb-1 collected by the CDF II detector. The data sample consists of 524 Lepton+Jets event candidates and 236 Dilepton event candidates.
In the Lepton+Jets channel a chi-squared function is minimized to obtain a reconstructed top mass mtreco for every event. The invariant mass mjj of the jets coming from the hadronically decaying W boson is used to reduce the dominant systematic effect, which arises from the jet energy scale (JES).
The Neutrino Weighting Algorithm (NWA) is applied to all Dilepton channel events: we integrate over the unknown quantities in the kinematically underconstrained system to obtain the reconstructed top mass mtNWA. We introduce the mT2 variable to replace HT, the scalar sum of the momenta of jets and leptons and the missing transverse energy, because mT2 gives better mass resolution.
Kernel density estimation (KDE) is used to produce probability density functions that are two-dimensional in the observables. The two-dimensional distributions (mtreco; mjj) and (mtNWA; mT2) from data are compared to Monte Carlo to measure the top quark mass and the jet energy scale. The jet energy calibration from the Lepton+Jets channel is naturally applied to the Dilepton channel. We measure Mtop=171.7 +1.4-1.5 (stat.+JES) +/- 1.1 (syst.) GeV/c2. We also perform separate fits in the Lepton+Jets channel, yielding Mtop=172.2 +1.5-1.6 (stat.+JES) +/- 1.1 (syst.) GeV/c2, and in the Dilepton channel, yielding Mtop=169.3 +/- 2.7 (stat.) +/- 3.2 (syst.) GeV/c2. Note that the Dilepton-only fit has no in-situ JES calibration. Because the mT2 observable is of great interest for mass measurements of SUSY particles, we add one more top mass measurement in the Dilepton channel that uses mT2 as the only observable. It yields Mtop=168.0 +4.8-4.0 (stat.) +/- 2.9 (syst.) GeV/c2. This is the first measurement using mT2 in real data.


mT2 was originally introduced to measure the mass of massive particles that decay into a final state including two invisible particles. With the LHC about to start data taking, mass measurements of new particles are of great interest. Although there is considerable interest in using mT2 to measure the masses of exotic particles, it has not previously been applied to real data. The ttbar system in the Dilepton channel has two invisible particles, and real data with a well-established background estimate are available. The top mass measurement in the Dilepton channel is therefore a natural first application of mT2. In addition, our usual mass measurement in the Dilepton channel used two observables, mtNWA and HT, but HT has poor mass resolution, which leaves room to include mT2 in our measurement. We performed pseudoexperiments to obtain the statistical uncertainties on the top quark mass in the Dilepton channel. The plots below show the statistical uncertainties when we use mtNWA (left), mT2 (middle), and HT (right).
[Figures: statistical uncertainty from pseudoexperiments using mtNWA (left), mT2 (middle), and HT (right)]

It is clear that mT2 has better resolution than HT for the top quark mass measurement. We also performed pseudoexperiments with two-dimensional input; the plots below show the statistical uncertainties (left) and the RMS of the mass measurement (right). We overlay two cases: mtNWA and mT2 (blue), and mtNWA and HT (red). We obtain a ~10% improvement by replacing HT with mT2.
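
For readers unfamiliar with the variable, mT2 is commonly defined as the minimum, over all ways of splitting the missing transverse momentum between the two invisible particles, of the larger of the two resulting transverse masses. The sketch below is a minimal numerical illustration of that definition, not the analysis code: the zooming grid search, the function names, the massless-invisible assumption and the toy kinematics are all assumptions made here for illustration.

```python
import numpy as np

def transverse_mass(m_vis, pt_vis, q, m_inv=0.0):
    """Transverse mass of a visible system (mass m_vis, transverse momentum
    pt_vis) paired with an invisible particle of transverse momentum q."""
    et_vis = np.sqrt(m_vis**2 + pt_vis[0]**2 + pt_vis[1]**2)
    et_inv = np.sqrt(m_inv**2 + q[..., 0]**2 + q[..., 1]**2)
    dot = pt_vis[0] * q[..., 0] + pt_vis[1] * q[..., 1]
    mt_sq = m_vis**2 + m_inv**2 + 2.0 * (et_vis * et_inv - dot)
    return np.sqrt(np.maximum(mt_sq, 0.0))

def mt2(m1, pt1, m2, pt2, pt_miss, half_width=500.0, n=41, n_zoom=8):
    """Minimize max(mT(1), mT(2)) over splittings q1 + q2 = pt_miss using a
    grid scan that repeatedly zooms in around the best grid point."""
    pt_miss = np.asarray(pt_miss, dtype=float)
    center = 0.5 * pt_miss
    best = np.inf
    for _ in range(n_zoom):
        gx = np.linspace(center[0] - half_width, center[0] + half_width, n)
        gy = np.linspace(center[1] - half_width, center[1] + half_width, n)
        q1 = np.stack(np.meshgrid(gx, gy, indexing="ij"), axis=-1)
        q2 = pt_miss - q1
        larger = np.maximum(transverse_mass(m1, pt1, q1),
                            transverse_mass(m2, pt2, q2))
        i, j = np.unravel_index(np.argmin(larger), larger.shape)
        best = min(best, float(larger[i, j]))
        center = q1[i, j].copy()
        half_width *= 0.2  # shrink the scan window around the current minimum
    return best

# Toy event: two parents of mass 80.4 GeV, each at rest in the transverse
# plane and decaying to one massless visible and one massless invisible particle.
toy_value = mt2(0.0, (40.2, 0.0), 0.0, (0.0, 40.2), pt_miss=(-40.2, -40.2))
print(toy_value)
```

By construction mT2 cannot exceed the parent mass when the correct invisible mass is assumed, which is why its distribution carries mass information. In the Dilepton channel each visible system would be a lepton paired with a b jet and the parent is the top quark; how the pairing is chosen is not specified here.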

Event selection in the Lepton+Jets channel:

To select events in the Lepton+Jets channel, where one W from the top decays goes to a pair of hadronic jets and the other W decays to a charged lepton (electron or muon) plus a neutrino, we require a well-identified electron or muon, large missing transverse energy and 4 jets, at least one of which is identified as arising from a b quark. We take advantage of different signal-to-background (S:B) ratios and event shapes by splitting our sample into two non-overlapping subsamples, based on the number of jets with a b-tag (using CDF's secondary vertex tagger, SECVTX). Events with exactly one tag are required to have exactly 4 jets. In events with two or more tags, which have a higher S:B and more statistical power, we loosen the cut on the 4th jet and allow more than 4 jets. The event selection is summarized in the following table:

Number of b-tags
>= 2
Jets 1-3 Et threshold (GeV)
4th jet Et threshold (GeV)
Extra jets (GeV)
< 20
> 20
> 20
In addition we require that the chi-squared returned by the kinematic fit is smaller than 9 for both Lepton+Jets subsamples to further reduce the background fraction and to ensure that only well reconstructed ttbar events enter the analysis. Furthermore to avoid possible bias in the probability density functions we apply a boundary cut requiring that all events have 110 < mtreco < 350 GeV/c2 and 50 < mjj < 115 GeV/c2 for the single-tag subsample and 50 < mjj < 125 GeV/c2 for the double-tag subsample.
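
The chi-squared and boundary cuts described above can be written as a small predicate. This is a sketch only: the function name and argument layout are hypothetical, and the jet ET thresholds from the table are not included.

```python
def passes_ljets_cuts(n_btags, chi2, mtreco, mjj):
    """Apply the chi2 and boundary cuts quoted above (masses in GeV/c^2)."""
    if n_btags < 1:
        return False            # at least one b-tagged jet is required
    if chi2 >= 9.0:
        return False            # kinematic-fit quality cut
    if not 110.0 < mtreco < 350.0:
        return False            # boundary cut on the reconstructed top mass
    # The mjj window is wider for the double-tag subsample.
    mjj_max = 115.0 if n_btags == 1 else 125.0
    return 50.0 < mjj < mjj_max

print(passes_ljets_cuts(1, 4.0, 172.0, 120.0))  # fails the single-tag mjj window
print(passes_ljets_cuts(2, 4.0, 172.0, 120.0))  # passes the double-tag mjj window
```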

Event selection in Dilepton channel:

We design the selection to accept ttbar events where both W bosons decay into an electron or muon and a neutrino. We use the W+jets dataset, which is triggered on a central electron or central muon. The selection criteria are summarized as follows:
  • Two leptons (e or mu) with pT > 20 GeV. One lepton has to be isolated
  • Two jets with transverse energy > 15 GeV; jets are corrected for differences in response between calorimeter regions and for calorimeter nonlinearities
  • Missing transverse energy > 25 GeV
  • Z-veto incorporating a missing ET significance cut
  • Missing ET > 50 GeV if a lepton is closer than 20 degrees in azimuth to the missing ET vector
  • HT > 200 GeV

We divide the Dilepton sample into two subsamples based on the presence of a b-tag to enhance the statistical power of the method. As in the Lepton+Jets channel, we apply a boundary cut requiring that 100 < mtNWA < 350 GeV/c2 and 20 < mT2 < 300 GeV/c2.
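
The kinematic part of the Dilepton selection can be sketched as follows. The event layout and function names are hypothetical; the isolation requirement and the Z-veto are omitted; "two jets" is read as at least two; and the 50 GeV requirement near a lepton is interpreted as a cut on the missing ET. All of these are assumptions.

```python
import math

def delta_phi(phi_a, phi_b):
    """Absolute azimuthal separation in radians, folded into [0, pi]."""
    d = abs(phi_a - phi_b) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d

def passes_dilepton_cuts(leptons, jet_ets, met, met_phi, ht, mt_nwa, mt2):
    """leptons: list of (pT, phi); energies and masses in GeV, angles in radians."""
    good = [(pt, phi) for pt, phi in leptons if pt > 20.0]
    if len(good) != 2:
        return False
    if sum(et > 15.0 for et in jet_ets) < 2:
        return False
    if met <= 25.0:
        return False
    # Tighter missing-ET cut when a lepton is within 20 degrees of the MET vector.
    near = any(delta_phi(phi, met_phi) < math.radians(20.0) for _, phi in good)
    if near and met <= 50.0:
        return False
    if ht <= 200.0:
        return False
    # Boundary cuts on the two observables.
    return 100.0 < mt_nwa < 350.0 and 20.0 < mt2 < 300.0

event = dict(leptons=[(45.0, 0.1), (30.0, 2.5)], jet_ets=[60.0, 35.0],
             met=40.0, met_phi=0.2, ht=260.0, mt_nwa=170.0, mt2=150.0)
print(passes_dilepton_cuts(**event))  # a lepton is close to the MET and MET < 50
```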

Top mass reconstruction and dijet mass reconstruction in the Lepton+Jets channel:

A chi2 minimization is performed to reconstruct a top quark mass for each event. The fitter is based on the hypothesis that the event is ttbar: it contains W mass constraints on the hadronic and leptonic sides and requires the two top masses in the event to be equal. Only the leading 4 jets are assigned to the four quark daughters from the top quark decays. The jet-parton assignment that yields the lowest chi2 after minimization is kept for further analysis, and the corresponding top mass (mtreco) is used in our templates. The distributions of mtreco for the two Lepton+Jets subsamples are shown below.
[Figures: mtreco templates for the 1-tag and 2-tag Lepton+Jets subsamples]
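
The choice of jet-parton assignment can be illustrated by enumerating the candidates explicitly. The sketch below assumes that b-tagged jets are only ever assigned to b quarks and that swapping the two W daughter jets does not produce a new assignment; the chi2 function is left as a caller-supplied placeholder because its form is not given here.

```python
from itertools import permutations

def jet_parton_assignments(b_tagged):
    """Enumerate (b_had, b_lep, q1, q2) index tuples for the 4 leading jets.

    b_tagged is a sequence of 4 booleans. Tagged jets are never used as W
    daughters, and q1 < q2 removes the double counting of the two W jets."""
    result = []
    for b_had, b_lep, q1, q2 in permutations(range(4)):
        if q1 > q2:
            continue
        if b_tagged[q1] or b_tagged[q2]:
            continue
        result.append((b_had, b_lep, q1, q2))
    return result

def best_assignment(b_tagged, chi2_of):
    """Return the assignment with the lowest chi2; chi2_of is hypothetical."""
    return min(jet_parton_assignments(b_tagged), key=chi2_of)

print(len(jet_parton_assignments([False, False, False, False])))
print(len(jet_parton_assignments([True, False, False, False])))
print(len(jet_parton_assignments([True, True, False, False])))
```

Under these assumptions there are 12 candidate assignments with no tag constraint, 6 for single-tag events and 2 for double-tag events.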

To measure the JES, templates of the dijet mass mjj of the hadronically decaying W boson are constructed in addition to the top mass templates. The chi2 fitter is not used to obtain mjj, although events failing the chi2 cut are not used to measure the JES either. In 2-tag events, there is only one dijet mass among the leading 4 jets that is consistent with the b-tagging (i.e. built from jets not tagged as b). In 1-tag events, there are 3 such dijet masses. We take the dijet mass closest to the well-known W mass as the single value of mjj per event. The mjj distributions for the 2 subsamples, for 3 different values of the JES in the detector, are shown below.
[Figures: mjj templates for the 1-tag and 2-tag Lepton+Jets subsamples]
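
The mjj choice described above amounts to forming all pairs of untagged jets among the leading four and keeping the pair mass closest to the W mass. A minimal sketch, assuming jets are given as (E, px, py, pz) tuples in GeV and taking 80.4 GeV/c2 as an approximate W mass; the function names are hypothetical.

```python
import math
from itertools import combinations

W_MASS = 80.4  # approximate W boson mass in GeV/c^2 (assumed value)

def invariant_mass(jet_a, jet_b):
    """Invariant mass of two jets given as (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(jet_a, jet_b))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def select_mjj(jets, b_tagged):
    """Dijet mass closest to W_MASS among untagged pairs of the 4 leading jets."""
    untagged = [jet for jet, tag in zip(jets[:4], b_tagged[:4]) if not tag]
    masses = [invariant_mass(a, b) for a, b in combinations(untagged, 2)]
    return min(masses, key=lambda m: abs(m - W_MASS))

jets = [(50.0, 50.0, 0.0, 0.0), (50.0, -50.0, 0.0, 0.0),
        (40.0, 0.0, 40.0, 0.0), (40.0, 0.0, -40.0, 0.0)]
print(select_mjj(jets, [True, True, False, False]))   # one candidate pair
print(select_mjj(jets, [True, False, False, False]))  # three candidate pairs
```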

Top mass reconstruction in the Dilepton channel:

We use the Neutrino Weighting Algorithm (NWA) to reconstruct events in the Dilepton channel. Because of the two neutrinos in the final state, there are not enough measured quantities to fully constrain the event. We integrate over the neutrino pseudorapidities, taking their distribution from the Monte Carlo simulation. The algorithm proceeds as follows:
  • Assume a value of the top mass.
  • Choose a particular jet to b-quark assignment (there are two possibilities).
  • Assume the neutrino pseudorapidities eta1 and eta2.
  • Using the world average masses of the W boson, b quark and leptons, we can now solve for the Px and Py of each of the neutrinos. Solutions might not exist for the assumed values of the top quark mass and eta. When a solution exists, there are two solutions for each neutrino.
  • We form four weights by comparing each combination of solutions to the measured missing transverse energy with a Gaussian weight. Since the correct combination is not known, we sum the four weights.
  • We integrate over eta1 and eta2, obtaining the weight for the assumed top mass. The integration distribution for the neutrino pseudorapidities is taken from the ttbar Monte Carlo and is a Gaussian with width approximately 1. The integration is performed by summing over a grid of values with 0.2 spacing.
  • We obtain the weight corresponding to the other jet to b-quark assignment.
  • We sum the two weights. This sum is a measure of the probability that the true top mass equals the assumed top mass.
  • We scan the top mass in steps of 3 GeV.
  • The maximum weight is found, as well as the maximum weights of the two jet to b-quark assignments separately.
  • The scan is repeated successively around the maxima until a step size of 0.03 GeV is reached.
  • The assumed top mass which yields the highest weight is taken as the reconstructed top mass mtNWA.
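
Two pieces of the procedure above lend themselves to a short sketch: the Gaussian missing-ET weight summed over the neutrino solution pairs, and the coarse-to-fine mass scan. The code below is illustrative only. The neutrino solutions are taken as inputs rather than solved for, a single missing-ET resolution `sigma` is assumed for both components, the scan range of 100 to 350 GeV and the factor of 10 between successive step sizes are assumptions, and only the global maximum is followed rather than the maxima of the two assignments separately.

```python
import math

def met_weight(nu1_solutions, nu2_solutions, met, sigma):
    """Sum of Gaussian weights over all (nu1, nu2) solution pairs.

    Each solution is a (px, py) tuple and met is the measured (mex, mey)."""
    total = 0.0
    for n1 in nu1_solutions:
        for n2 in nu2_solutions:
            dx = met[0] - n1[0] - n2[0]
            dy = met[1] - n1[1] - n2[1]
            total += math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    return total

def scan_mass(weight_of_mass, lo=100.0, hi=350.0, step=3.0,
              final_step=0.03, shrink=10.0):
    """Scan the assumed top mass, refining around the maximum weight."""
    while True:
        n_steps = int(round((hi - lo) / step))
        masses = [lo + i * step for i in range(n_steps + 1)]
        best = max(masses, key=weight_of_mass)
        if step <= final_step * (1.0 + 1e-9):
            return best
        lo, hi = best - step, best + step   # zoom in around the current maximum
        step /= shrink

# Toy weight curve peaked at 172.4 GeV, standing in for the summed NWA weight.
toy_weight = lambda m: math.exp(-0.5 * ((m - 172.4) / 12.0) ** 2)
mt_nwa = scan_mass(toy_weight)
print(mt_nwa)
```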

Below are shown distributions of mtNWA for three true top masses for the two Dilepton subsamples.

    [Figures: mtNWA templates for the 0-tag and tagged Dilepton subsamples]

    The distributions of mT2 are shown below.
    [Figures: mT2 templates for the 0-tag and tagged Dilepton subsamples]

    Backgrounds for the Lepton+Jets channel

    The background sources and their expected fractions of the total background are given in the table below, together with the expected signal and the observed data. The backgrounds are dominated by real W boson production in association with high-pT jets. The absolute normalization of W+jets is determined from the data, but the relative normalization between the different flavor samples is taken from MC. The expected numbers of events for the single-top and diboson backgrounds are taken from theoretical cross sections and MC predictions. We assume a ttbar cross section of 6.7 pb.

    Backgrounds for the Dilepton channel

    The major backgrounds for the Dilepton channel are the Drell-Yan process, diboson production, and fakes, where a jet mimics a lepton.
    The Drell-Yan background is notoriously hard to model given the fact that the signal selection uses a Z-veto. We use more than 50 'matched' Alpgen+Pythia samples which cover on-peak and off-peak regions as well as associated light flavour and heavy flavour jet production. We remove events with heavy flavour jets generated by Pythia showering from light flavor samples and some heavy flavour samples.
    We model the fakes background using data. We select events from the W+jets dataset requiring one isolated lepton. We apply a dilepton veto to eliminate ttbar events. We require that a lepton object likely to be a fake be present. All other selection criteria are applied. Remaining events are reconstructed with NWA and form the fake background shape.
    We estimate the expected number of ttbar signal events assuming a cross section of 6.7 pb. Expected backgrounds and signal are shown in the table below.

    Kernel Density Estimation:

    We use a non-parametric Kernel Density Estimate-based approach to forming probability density functions from fully simulated Pythia MC. The probability for an event with an observable x is given by the linear sum of contributions from all entries in the MC:
    f(x) = \frac{1}{n h} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right)

    Here, f(x) is the probability to observe x given some MC sample with known mass and JES (or the background), n is the number of entries in that MC sample, and x_i is the value of the observable for entry i. The kernel function K is a normalized function that adds varying probability to a measurement at x depending on its distance from x_i. The smoothing parameter h is a number that determines the width of the kernel. Larger values of h smooth out the density estimate, and smaller values of h keep most of the probability weight near x_i. We use an adaptive method in which the smoothing depends on the entry, h_i = h(f(x_i)). At the peak of the distribution we use less smoothing. In the tails of the distribution, where statistics are poor and we are sensitive to statistical fluctuations, we use a larger amount of smoothing. KDE can be extended to two dimensions by multiplying together two kernels:
    f(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_{x,i} h_{y,i}} K\left( \frac{x - x_i}{h_{x,i}} \right) K\left( \frac{y - y_i}{h_{y,i}} \right)
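
A two-dimensional adaptive KDE with a product kernel can be sketched in a few lines. This is not the analysis implementation: the Gaussian kernel, the rule-of-thumb pilot bandwidth, and the square-root scaling of the per-entry bandwidth with the pilot density are all assumptions chosen to show the idea of smoothing less at the peak and more in the tails. MC event weights are ignored.

```python
import numpy as np

def _gauss(t):
    return np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def _kde(sample, query, h):
    # sample: (n, 2) MC entries; query: (m, 2) points; h: (n, 2) per-entry widths.
    t = (query[:, None, :] - sample[None, :, :]) / h[None, :, :]
    kernels = _gauss(t).prod(axis=2) / h.prod(axis=1)[None, :]
    return kernels.mean(axis=1)

def adaptive_kde_2d(sample, query):
    """Product-kernel KDE whose bandwidth shrinks where the pilot density is high."""
    sample = np.asarray(sample, dtype=float)
    query = np.asarray(query, dtype=float)
    n = len(sample)
    h0 = sample.std(axis=0) * n ** (-1.0 / 6.0)        # fixed pilot bandwidth
    pilot = _kde(sample, sample, np.tile(h0, (n, 1)))  # pilot density at each entry
    geometric_mean = np.exp(np.mean(np.log(pilot)))
    scale = np.sqrt(geometric_mean / pilot)            # < 1 at the peak, > 1 in tails
    return _kde(sample, query, h0 * scale[:, None])

# Toy "MC" in two observables, e.g. a reconstructed top mass and a dijet mass.
rng = np.random.default_rng(1)
mc = np.column_stack([rng.normal(172.0, 15.0, 100), rng.normal(80.0, 8.0, 100)])
center, spread = mc.mean(axis=0), mc.std(axis=0)
axes = [np.linspace(c - 9 * s, c + 9 * s, 121) for c, s in zip(center, spread)]
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 2)
density = adaptive_kde_2d(mc, grid)
cell = (axes[0][1] - axes[0][0]) * (axes[1][1] - axes[1][0])
integral = density.sum() * cell
print(integral)  # should be close to 1 for a normalized density
```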

    The two-dimensional density estimates for an input signal mass of 172 GeV/c2 and JES=0.0 for the Lepton+Jets subsamples are shown here:
    [Figures: two-dimensional signal density estimates for the 1-tag and 2-tag Lepton+Jets subsamples]

    In the Dilepton channel the mtNWA and mT2 variables are correlated. This is captured by the KDE technique as shown below:
    [Figures: two-dimensional signal density estimates for the 0-tag and tagged Dilepton subsamples]

    We apply this technique to obtain a pdf for each (Mtop, JES) Monte Carlo sample that was generated, as well as for the background samples generated at a range of JES values. Note that the KDE technique is applied separately to the background subsamples, taking into account MC event weights; the results are then added using the relative subsample normalizations to form the background pdfs. The Lepton+Jets background pdf for JES=0.0 is shown below:
    [Figures: Lepton+Jets background pdfs for the 1-tag and 2-tag subsamples]

    The plots below show the Dilepton subsample backgrounds:
    [Figures: Dilepton background pdfs for the two subsamples]

    Likelihood and Local Polynomial Smoothing

    We maximize the extended likelihood (equivalently, minimize its negative logarithm) with respect to the top mass, the JES, and the signal and background expectations to obtain the measurement as well as its statistical uncertainty. The form of the likelihood for subsample k is shown below.
    Here ns and nb are the signal and background expectations, N is the number of events in the subsample, Psig is the signal probability density function and Pbg is the background probability density function. mi and yi denote mtreco and mjj, or mtNWA and mT2, depending on the sample. nb0 is the a-priori background estimate and sigma_nb0 is the uncertainty on that estimate.
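
One extended likelihood consistent with the symbols defined above is a Poisson term for N given ns + nb, a per-event mixture of Psig and Pbg, and a Gaussian constraint on nb around nb0. The sketch below evaluates its negative logarithm for one subsample. It is an assumed form for illustration; the expression used in the analysis may contain additional terms, and the function name and argument layout are hypothetical.

```python
import math

def neg_log_likelihood(ns, nb, psig, pbg, nb0, sigma_nb0):
    """-log L for one subsample, dropping the constant log(N!).

    psig[i] and pbg[i] are the signal and background pdf values for event i,
    already evaluated at the (Mtop, JES) point being tested."""
    nll = ns + nb                                   # Poisson term for N events
    for ps, pb in zip(psig, pbg):
        nll -= math.log(ns * ps + nb * pb)          # per-event mixture
    nll += 0.5 * ((nb - nb0) / sigma_nb0) ** 2      # Gaussian background constraint
    return nll

print(neg_log_likelihood(3.0, 2.0, [0.5], [0.1], nb0=2.0, sigma_nb0=1.0))
```

The total quantity minimized in a simultaneous fit would be the sum of such terms over all subsamples, with Mtop and JES shared between them.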
    Kernel density estimation only allows the probability density function to be calculated at the values of the top mass and jet energy scale where Monte Carlo samples are available. To evaluate the pdf at an arbitrary (Mtop, JES) for each event we use local polynomial smoothing. A quadratic polynomial is fitted to the pdf values calculated with the KDE method, with points near the required value weighted more heavily than points far from it. The weighting uses a 'tricubic' function with a width of 10 GeV for the Lepton+Jets samples and 15 GeV for the Dilepton samples in the Mtop direction, and 0.8 sigma_c in the JES direction. The value of the quadratic fit at the required (Mtop, JES) point is used as the value of the pdf.
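
The smoothing step can be sketched as a weighted least-squares fit of a two-dimensional quadratic, centered on the requested point so that the fitted constant term is the smoothed pdf value. The tricube weight (1 - |u|^3)^3 and the product of the two one-dimensional weights are assumptions about what 'tricubic' means here; the function names are hypothetical.

```python
import numpy as np

def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside |u| < 1, zero outside."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def smooth_pdf(mass_pts, jes_pts, pdf_vals, mass, jes,
               mass_width=10.0, jes_width=0.8):
    """Local quadratic estimate of the pdf at (mass, jes) from grid values."""
    dm = np.asarray(mass_pts, dtype=float) - mass
    dj = np.asarray(jes_pts, dtype=float) - jes
    w = tricube(dm / mass_width) * tricube(dj / jes_width)
    keep = w > 0.0
    dm, dj, w = dm[keep], dj[keep], w[keep]
    y = np.asarray(pdf_vals, dtype=float)[keep]
    # Quadratic in (dm, dj); the constant term is the value at the query point.
    design = np.column_stack([np.ones_like(dm), dm, dj, dm * dm, dm * dj, dj * dj])
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return float(coef[0])

# Toy grid of (Mtop, JES) points with an exactly quadratic "pdf value".
masses, jess = np.meshgrid(np.arange(160.0, 186.0, 2.0),
                           np.arange(-1.5, 2.0, 0.5), indexing="ij")
truth = lambda m, j: (0.01 + 1e-3 * (m - 172) - 2e-5 * (m - 172) ** 2
                      + 2e-3 * j - 1e-3 * j * j + 1e-4 * (m - 172) * j)
estimate = smooth_pdf(masses.ravel(), jess.ravel(), truth(masses, jess).ravel(),
                      mass=171.3, jes=0.1)
print(estimate, truth(171.3, 0.1))
```

Because the toy input is exactly quadratic, the local fit reproduces it; for real KDE values the widths control how much neighboring (Mtop, JES) samples are averaged.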

    Method validation

    To ensure that the method is unbiased and that the estimate of the statistical uncertainty is valid, we perform ensemble tests. We repeatedly draw events from the signal and background models, mimicking the possible variations of signal and background numbers that may occur in data. A mass measurement is performed on each of these pseudo-datasets. Knowing the Mtop and JES of the dataset from which the signal events were drawn, we can form residuals (M_top_fitted-M_top_MC) and pulls ((M_top_fitted-M_top_MC)/returned uncertainty), as well as similar quantities for the JES calibration. Ideal performance would yield residuals of 0 and pull distributions centered at 0 with width 1. The results of the ensemble tests are shown below:
    [Figures: residuals and pull widths from the ensemble tests]
    The bias check was performed for 14 mass points at JES=0.0 sigma_c and for three mass points at non-zero JES. The color code for the JES values is provided in the legend below:
    The pull width for the Combined and Lepton+Jets fits departs from 1, thus we scale up the uncertainty returned from the fit by 2%. We only use the bias check performed at JES=0.0 to determine bias and pull width scaling.
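
The logic of an ensemble test can be shown with a toy in which the "fit" is simply the sample mean of a Gaussian-distributed observable. Everything in this sketch is a stand-in: the estimator, the resolution, the Poisson fluctuation of the event count and the function name are assumptions, and no background is included.

```python
import numpy as np

def run_pseudoexperiments(true_mass, resolution, mean_events, n_pe, seed=0):
    """Return residuals and pulls from n_pe toy pseudoexperiments."""
    rng = np.random.default_rng(seed)
    residuals = np.empty(n_pe)
    pulls = np.empty(n_pe)
    for k in range(n_pe):
        n = max(int(rng.poisson(mean_events)), 2)    # fluctuate the event count
        events = rng.normal(true_mass, resolution, size=n)
        fitted = events.mean()                        # toy estimator of the mass
        uncertainty = events.std(ddof=1) / np.sqrt(n) # toy returned uncertainty
        residuals[k] = fitted - true_mass
        pulls[k] = residuals[k] / uncertainty
    return residuals, pulls

residuals, pulls = run_pseudoexperiments(172.0, 30.0, 200, 2000)
pull_width = pulls.std()
print(residuals.mean(), pulls.mean(), pull_width)
```

A pull width above 1 indicates that the returned uncertainty is too small, and multiplying the uncertainty by the pull width restores the coverage; the 2% scaling quoted above is the same kind of correction.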

    Systematic uncertainties

    The contributions to the systematic uncertainty for the Combined, Lepton+Jets and Dilepton fits are shown below. The dominant effects on the Combined and Lepton+Jets fits are the residual jet energy scale and generator systematics. We model the jet energy scale as a single parameter, which is an over-simplification and results in the Residual JES uncertainty. The generator systematic comes from comparing pseudoexperiments generated with Herwig and Pythia. Since the Dilepton-only measurement has no in-situ calibration, the JES uncertainty becomes dominant for this channel.

    Fit and results

    We perform a fit to the data using both the Lepton+Jets and Dilepton channels and measure Mtop = 171.7 +1.4-1.5 (stat.+JES) +/- 1.1 (syst.) GeV/c2 = 171.7 +1.8-1.9 GeV/c2. The likelihood contours for the combined fit are shown below:
    In the Lepton+Jets-only fit we measure Mtop = 172.2 +1.5-1.6 (stat.+JES) +/- 1.1 (syst.) GeV/c2 = 172.2 +/- 1.9 GeV/c2. The likelihood contour plot for the Lepton+Jets-only fit is shown below:
    In the Dilepton-only fit we obtain Mtop = 169.3 +/- 2.7 (stat.) +/- 3.2 (syst.) GeV/c2 = 169.3 +/- 4.2 GeV/c2. The likelihood profile is shown below:
    In the Dilepton-only fit using mT2 alone we obtain Mtop = 168.0 +4.8-4.0 (stat.) +/- 2.9 (syst.) GeV/c2 = 168.0 +5.6-5.0 GeV/c2. The likelihood profile is shown below:
    We perform pseudoexperiments using observed number of events to evaluate the probability of obtaining the uncertainty found in data. Results are shown below.
    [Figures: uncertainty distributions from pseudoexperiments]

    The reconstructed top mass distributions from the data, overlaid with the fitted signal and background templates, are depicted below.
    [Figures: data distributions with fitted signal and background templates]

    Comparisons between data and expectation in the Lepton+Jets channel for several kinematic variables are shown here.
    [Figures: data and expectation for kinematic variables, Lepton+Jets channel]

    Comparisons between data and expectation in the Dilepton channel for several kinematic variables are shown here.
    [Figures: data and expectation for kinematic variables, Dilepton channel]

    Hyunsu Lee for the TMT group

    Last modified Feb 13, 2009