Winter 2004 Questions from status report

Questions from status report, 02/19/2004


  • Q from Andy Hocker: You have checked the data-MC agreement for the input variables. Can you check the correlations too?

    A: We have compared the distribution of the correlation coefficients in data with that of a mix of 10% ttbar MC events and 90% W+3p MC events, looking in the exclusive Nj=3 mode. The agreement between data and MC is good (CDF Note 6897).


  • Q from Igor Volobouev: The systematics are added in quadrature. Are any of them correlated?

    A: We don't think this is any different from the single-variable fit. When an effect contributes to both the acceptance and the shape systematic, the two contributions are 100% correlated and we add them linearly. Systematics originating from different sources, on the other hand, are treated as uncorrelated. Note that we calculated the systematic errors piece by piece, including all the components of the jet correction systematic. With two exceptions, we find a general downward trend in going from the single-variable Ht fit to the multiple-input NN fit. The two exceptions are the "out of cone" and "splash out" components of the jet energy corrections. Each of them contributes ~2%, a small fraction of the overall systematic (~21%).


  • Q from Tony Vaiciulis: Could you do a multi-dimensional analysis instead?

    A: A multivariate likelihood fit is a straightforward approach, but it needs more statistics to perform well. We also tried a linearized version of the NN by using a single hidden node. Average fractional error: 16.8% (7 hidden nodes) versus 18.7% (1 hidden node). Systematic error: 19.1% (7 hidden nodes) versus 28.7% (1 hidden node). We interpret this as an indication that the NN approach is more powerful than a linear discriminant analysis.


  • Q from Jaco Konigsberg: Since you train only on the W3p background sample, the NN shape will be different for the other backgrounds. Why don't you assign a systematic for this effect?

    A: We have already assigned a systematic for the QCD-fakes background. For Wtau3p, dibosons, Z's and single top, we used the theoretical cross sections to add their contributions to the overall W-like shape; we will add a systematic for doing this. The expected contribution to the systematic fraction is 2%. It was calculated by fitting pseudo-experiments with and without the smaller backgrounds and taking half of the average difference. For a plot comparing the W3p NN-output shape with the NN shape of all the other ewk backgrounds, mixed in the appropriate proportions, look here.
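The data-MC correlation check described in the answer to Andy Hocker's question compares correlation coefficients in data with those of a 10% ttbar / 90% W+3p MC mix. Below is a minimal sketch of how the correlation coefficient for one pair of input variables could be computed in such a mix, assuming each MC sample is reweighted so that its total weight equals its mixture fraction regardless of its raw size. The function names and toy inputs are hypothetical and are not the analysis code; repeating the calculation over all variable pairs gives the set of coefficients to compare with data.

```python
import math

def weighted_pearson(xs, ys, ws):
    """Pearson correlation coefficient of (x, y) pairs with per-event weights."""
    wsum = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / wsum
    my = sum(w * y for w, y in zip(ws, ys)) / wsum
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / wsum
    vx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)) / wsum
    vy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys)) / wsum
    return cov / math.sqrt(vx * vy)

def mixture_correlation(ttbar, wjets, f_ttbar=0.10):
    """Correlation between two input variables in a ttbar/W+jets MC mix.

    ttbar, wjets: lists of (x, y) pairs, one pair per MC event.
    Assumption: the ttbar sample carries a total weight f_ttbar and the
    W+jets sample a total weight 1 - f_ttbar, independent of sample sizes.
    """
    pairs = ttbar + wjets
    ws = ([f_ttbar / len(ttbar)] * len(ttbar)
          + [(1.0 - f_ttbar) / len(wjets)] * len(wjets))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return weighted_pearson(xs, ys, ws)

# Toy example: ttbar events are positively correlated, W+jets negatively.
toy_ttbar = [(0.0, 0.0), (1.0, 1.0)]
toy_wjets = [(0.0, 1.0), (1.0, 0.0)]
print(mixture_correlation(toy_ttbar, toy_wjets))  # dominated by the 90% W+jets part
```

Weighting by fraction rather than concatenating raw events keeps the result independent of how many MC events happen to be available in each sample.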

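The combination rule from the answer to Igor Volobouev's question can be written down compactly: the acceptance and shape parts of one effect are 100% correlated and add linearly, while different sources are uncorrelated and add in quadrature. A minimal sketch, assuming each component is given as a same-sign magnitude in percent; the source names and numbers are placeholders, not values from the note.

```python
import math

def total_systematic(sources):
    """Combine systematic uncertainties (in %) following the rule above.

    sources: dict mapping a source name to a dict of its components,
    e.g. {"acceptance": a, "shape": s}. Components of one source are
    assumed 100% correlated (same-sign magnitudes) and added linearly;
    different sources are assumed uncorrelated and added in quadrature.
    Returns (total, per-source totals).
    """
    per_source = {name: sum(parts.values()) for name, parts in sources.items()}
    total = math.sqrt(sum(v ** 2 for v in per_source.values()))
    return total, per_source

# Hypothetical inputs, for illustration only.
example = {
    "effect_A": {"acceptance": 3.0, "shape": 1.0},  # linear: 4.0
    "effect_B": {"shape": 3.0},                     # shape only
}
print(total_systematic(example))  # total = sqrt(4^2 + 3^2)
```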
Questions received after preblessing, 03/04/2004



  • Q from Ken Bloom: Table 4 in the note has the rate of correct classification using one-variable nets. Presumably you made some cut on the NN output to determine the classification for each event. Was that choice of cut optimized?

    A: The cut was placed at 0.5, since the numbers in this table were calculated on a balanced set of events (equal numbers of signal and background events).


  • Q from Ken Bloom: Looking at Table 5 in the note, which shows the correlations between the different variables, your choices do not look all that uncorrelated. So how did you choose the variables you actually use? Could you have gotten a better net with different variables?

    A: We did initially try to construct the NN by looking at the correlations between variables. After an iterative study of a large number of input-variable combinations, as detailed in the note, we found many different combinations giving comparably good performance. We therefore abandoned this approach and looked instead at the expected systematic error for each net. This procedure is time consuming, so we studied 42 different NNs, with the number of input variables ranging from 1 to 20. The choice of each NN was based partly on our previous studies and partly on guesswork. The NN fit improves in both statistical and systematic error as more information is added. Certain combinations of variables perform better, especially with respect to the systematics. We do not see large variations in the fit fractional error for a given number of input variables. Based on Figure 5 in the note alone, we chose a 7-input NN as reasonably close to the best overall performance we were able to obtain while still not being very "complicated". There is certainly room for optimization here, but probably not much, considering the performance numbers we found for the larger NNs.


  • Q from Ken Bloom: The net was trained on a "balanced set of events" (page 10 in the note). The sample in which you are looking for ttbar is definitely not balanced. Is this an issue?

    A: We use only the shapes of the NN output distributions to fit the data. An unbalanced training set, for example one with more ttbar events, would push the ttbar NN output much closer to its target value of 1, but would also flatten out the W+jets shape. We did not see any real gain in fit sensitivity from using an unbalanced training set.


  • Q from Ken Bloom: In Table 10 in the note, what do "5.3 - lept ID" and "0.78 - Acc" mean?

    A: These are the acceptance components of the respective systematics: 0.78% for ISR and 5.3% for PDF. We will add them linearly to the corresponding shape contributions.


  • Q from Ken Bloom: On the systematics, can you break out which ones affect only the normalization and which also affect the shape? Is this an issue?

    A: Besides the two cases just mentioned, the Acceptance, Luminosity and LeptonID scale factor terms contribute exclusively to the normalization. The Q^2, ewk and ttbar-Generator terms affect only the shape. The QCD term is the sum of the QCD-model and QCD-fraction systematics. The jet correction systematic has contributions from both the acceptance and the shape; see pages 37-38 of the preblessing talk. In general we handle each term in the same way as for the Ht-fit systematics, but use the NN shape instead.


  • Q from Ken Bloom: You train on 4k events. Is that enough?

    A: Our 7-input/7-hidden/1-output NN has 71 free parameters to be trained. One rule of thumb states that properly training a NN requires at least 10 times as many training events as adjustable parameters, i.e. about 710 events here, so we believe we are safe from this point of view. Looking at the NN output in a sample independent of the training sample, we see no evidence of over-training. In the end we use all the available statistics to obtain the MC shapes: 33000 ttbar events and 8300 W+3p events.


  • Q from Jason Nielsen/Ken Bloom: Can you show us the NN output in W+1 and W+2 jets, and demonstrate that you are modeling the non-ttbar regions well?

    A: Two of our NN input variables, minDijetDeltaR and minDijetMass, are dijet quantities, so we can look only in the two-jet bin. Since there is no third jet, we set EtJ_{345} to 20.0 GeV. Using the NN trained in the 3-or-more-jets mode, we compare electrons, CMUP muons and CMX muons with the appropriate W+2p MC samples and see very good agreement.


  • Q from Ken Bloom: Can we have a printout of the NN function?

    A: The NN function can be found here; the definitions of the variables and instructions for using it in a ROOT macro are here.
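Regarding the answer on Table 4: with equal numbers of signal and background events, a cut at 0.5 sits midway between the two NN targets. A minimal sketch of how a correct-classification rate could be computed from NN outputs, assuming targets of 1 for ttbar and 0 for background (the background target is an assumption; the text only states the ttbar target of 1). The output values are made up.

```python
def correct_classification_rate(signal_out, background_out, cut=0.5):
    """Fraction of events classified correctly by a cut on the NN output.

    Assumes NN targets of 1 for signal (ttbar) and 0 for background, so an
    event is called signal when its output is above `cut`.
    """
    correct = sum(1 for o in signal_out if o > cut)
    correct += sum(1 for o in background_out if o <= cut)
    return correct / (len(signal_out) + len(background_out))

# Hypothetical balanced sample: four signal and four background outputs.
sig = [0.9, 0.7, 0.4, 0.6]
bkg = [0.1, 0.3, 0.6, 0.2]
print(correct_classification_rate(sig, bkg))
```

On an unbalanced sample the same rate would be dominated by the larger class, which is why the balanced set makes 0.5 a natural, unoptimized choice.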

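The answer on balanced training notes that only the shapes of the NN output distributions enter the fit. A minimal sketch of such a shape-only template fit of the ttbar fraction, assuming a binned Poisson likelihood with the total normalization fixed to the observed event count and a simple grid scan for the maximum. This is an illustration of the technique, not the fitter used in the analysis; binning, templates and data below are hypothetical.

```python
import math

def normalize(hist):
    """Scale a histogram to unit area so that only its shape remains."""
    total = sum(hist)
    return [h / total for h in hist]

def fit_ttbar_fraction(data, ttbar_template, wjets_template, steps=1000):
    """Binned maximum-likelihood fit of the ttbar fraction f using shapes only.

    Expected count in bin i: N * (f * s_i + (1 - f) * b_i), with N the
    observed total. With N fixed, maximizing the Poisson likelihood reduces
    to maximizing sum_i n_i * log(f * s_i + (1 - f) * b_i).
    Returns the best grid point in [0, 1], or None if no f describes the data.
    """
    s = normalize(ttbar_template)
    b = normalize(wjets_template)
    best_f, best_ll = None, -math.inf
    for k in range(steps + 1):
        f = k / steps
        ll = 0.0
        for n, si, bi in zip(data, s, b):
            p = f * si + (1.0 - f) * bi
            if p > 0.0:
                ll += n * math.log(p)
            elif n > 0:
                ll = -math.inf  # events in a bin the model predicts empty
                break
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f

# Toy templates in three NN-output bins (low, middle, high).
ttbar_shape = [0.1, 0.2, 0.7]
wjets_shape = [0.6, 0.3, 0.1]
toy_data = [500, 280, 220]  # built as 1000 * (0.2 * ttbar + 0.8 * wjets)
print(fit_ttbar_fraction(toy_data, ttbar_shape, wjets_shape))
```

Because both templates are normalized, the overall size of the training or MC samples drops out; only how the ttbar and W+jets shapes differ bin by bin drives the fitted fraction.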

Questions received at the blessing, 03/18/2004



  • Q from Avi Yagil: Why do the systematics go down with the number of variables?

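    A: The NN shape separates ttbar from W+jets better than other simple variables do. We believe that as the separation improves, the systematic decreases. In the limit of complete separation, the shape systematic should be zero, since the fitted fraction then reduces to counting events in non-overlapping regions and no longer depends on the details of the shapes.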


  • Q from Avi Yagil: What about FSR and ISR systematics?

    A: The systematic from ISR was included in our result. The FSR systematic is only partially covered by the Q^2 systematic. We will work on this.


  • Q from Avi Yagil: How do you take into account the correlations among all the variables, i.e. make sure that they are right in the MC?

    A: We have compared data to MC with respect to the variable correlations. Please see the answer given to Andy Hocker above.


  • Q from Tony Liss: Can you show us the shape plots using the fractions output by the fit?

    A: Plots comparing data with a mix of various MCs, using the ttbar fraction from the NN fit (17.6% for the Nj>=3 sample): Aplanarity, MaxJetEta, Ht, Etj345, MinDijetDeltaR, MinDijetMass and SumJetPz/SumJetEt. The data and MC histograms are normalized to equal area.


  • Q from Jeremy Lys: Cross sections are given for .ge.3 jets and for .ge.4 jets. Do those agree? I can't tell from the numbers given because of correlations.

    A: We generated pseudoexperiments to see how the fits in the Nj>=3 and Nj>=4 modes are correlated. The results can be found here, with the red point indicating the actual fit result. The X-axis is the fitted ttbar fraction in the Nj>=3 sample and the Y-axis is the fitted ttbar fraction in the Nj>=4 sample. In each channel we use the appropriate set of NN shapes for the fits. The number of events in each pseudoexperiment fluctuates. The central ttbar fractions in both samples correspond to a ttbar cross section of 7 pb. The average total numbers of events are taken from our data sample: 519 for Nj>=3 and 188 for Nj>=4.


  • Q from Jeremy Lys: Is the NN fit (as in Fig. 16 of CDF Note 6897) satisfactory? It looks reasonable, but some quantitative measure would be good.

    A: A comparison of the relative error returned by the fit with the distribution of the fit fractional error from pseudoexperiments is shown here.
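The answer to Jeremy Lys's last question presents the comparison as a plot. A minimal numerical summary of the same comparison is sketched below, assuming the fractional errors from the pseudoexperiments are available as a list. It reports the median pseudoexperiment error and the fraction of pseudoexperiments with an error at least as large as the one from the data fit. The function name and the numbers are hypothetical.

```python
import statistics

def compare_to_pseudoexperiments(observed, pseudo_errors):
    """Locate an observed fit error within a pseudoexperiment distribution.

    Returns (median pseudoexperiment error, fraction of pseudoexperiments
    whose error is at least as large as the observed one).
    """
    if not pseudo_errors:
        raise ValueError("need at least one pseudoexperiment")
    median = statistics.median(pseudo_errors)
    tail = sum(1 for e in pseudo_errors if e >= observed) / len(pseudo_errors)
    return median, tail

# Hypothetical fractional errors from five pseudoexperiments.
print(compare_to_pseudoexperiments(0.17, [0.15, 0.16, 0.17, 0.18, 0.19]))
```

A tail fraction far from both 0 and 1 would indicate that the error returned by the data fit is typical of what the pseudoexperiments predict.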

