Questions since the preblessing:

Question: Can you show the contamination of the ttbar and W+jets in region ABCD explicitly? (Takasumi Maruyama, Un-ki Yang)

Answer: Yes.
>= 3 Jet, Before Correction
         A       B        C       D
Ele      309     1282     165     575
Muo      58      968      173     361
Total    367     2250     338     936

>= 3 Jet, After Correction
         A       B        C       D
Ele      292.6   1280.1   138.5   575.0
Muo      47.1    967.1    144.5   361.0
Total    339.7   2247.3   283.0   936.0

>= 4 Jet, Before Correction
         A       B        C       D
Ele      40      135      42      130
Muo      7       129      44      80
Total    47      264      86      210

>= 4 Jet, After Correction
         A       B        C       D
Ele      36.6    134.4    33.2    130.0
Muo      4.8     128.5    34.3    80.0
Total    41.3    262.9    67.5    210.0

Question: For systematics, how do you operate the stat uncertainties of shift? e.g. JES 8.3+-XX%?  (Takasumi Maruyama)

Answer:  Since we have such high statistics for the MC samples, the uncertainties on the acceptance systematics are negligible.  If we assume that the uncertainty on the shape systematic comes from the spread in the PE results, then for the >= 3 jet JES we have:

 
(Note: These values use jetCorr04a)
                                    Shift     RMS
+1 sigma                            7.72%     1.63%
-1 sigma                           -3.02%     1.32%
total ((+1 sigma - -1 sigma)/2)     5.37%     1.05%
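
The "total" row can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming that the +1 sigma and -1 sigma RMS values are uncorrelated, so that the uncertainty on the half-difference is their quadrature sum divided by two. The function name is illustrative only.

```python
import math

def symmetrized_shift(up, down, rms_up, rms_down):
    """Return (half-difference, uncertainty on it), all in percent.

    Assumes the two RMS values are uncorrelated (an assumption of this sketch).
    """
    shift = (up - down) / 2.0
    rms = math.sqrt(rms_up**2 + rms_down**2) / 2.0
    return shift, rms

# Values from the >= 3 jet JES table (jetCorr04a)
shift, rms = symmetrized_shift(7.72, -3.02, 1.63, 1.32)
print(f"total shift = {shift:.2f}% +- {rms:.2f}%")  # total shift = 5.37% +- 1.05%
```

Under this assumption the numbers in the table are recovered to the quoted precision.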


Question: Some of the systematics on slide 28 that should be the same as in Gen4 have changed dramatically. For example, other EWK changed from 2.0% to 0.3% (of course, it does not matter for the final answer). Are these all due to the MC statistics?  Also, why did the ttbar generator uncertainty increase by a factor of two?  (Takasumi Maruyama)

Answer:  In general, several of the systematics were estimated with Monte Carlo samples with small statistics.  In addition, changes in event reconstruction and selection, jet energy scales, etc. can lead to some changes in systematics.  Furthermore, some systematics, like the ISR/FSR systematics, are now being estimated with samples that differ both in sample size and in the parameters used to generate them.  All of these effects can contribute to changes in the systematics between Gen4 and Gen5, but as you observed, none of these changes has an impact on the overall systematic.  In the specific case of the electroweak backgrounds, it was discovered that there was a rounding error in the way the electroweak samples were being combined to form the templates.  Correcting this error changes the electroweak shape systematic from 0.3% to 1.0% in the >= 3-jet sample.  For the ttbar generator uncertainty, I had quoted the shape systematic as equal to the full shift from the PEs, while the Gen4 note used (shift/2).  I've changed my number to use the Gen4 definition.

Question:  What cross section do you get if you restrict yourself to the Gen4 data sample?  (Un-ki Yang)

Answer:  To study this, we divided our data sample into two parts:  one that used the same good run list that was used in Gen4 (version 4), and one that used the runs from the Gen5 good run list (version 7) that weren't on the Gen4 list.  The results of this study are shown below:

>= 3 jets
Sample                     #Obs   signal fraction   cross section (pb)   stat err (pb)
Gen4                       519    0.176             6.7                  1.1
Gen5 (Full Sample)         936    0.158             6.0                  0.8
Gen5 (Gen4 Runs Only)      501    0.143             5.2                  1.1
Gen5 (Runs not in Gen4)    435    0.175             7.0                  1.3


>= 4 jets
Sample                     #Obs   signal fraction   cross section (pb)   stat err (pb)
Gen4                       118    0.473             7.5                  1.6
Gen5 (Full Sample)         210    0.385             6.1                  1.1
Gen5 (Gen4 Runs Only)      115    0.355             5.6                  1.5
Gen5 (Runs not in Gen4)    95     0.418             6.8                  1.7

In interpreting the results in the tables above, it is helpful to remember a couple of things:  The Gen4 results use a top mass of 175 GeV while all the Gen5 results use 178 GeV.  Also, major changes in the jet energy corrections and track reconstruction mean that events may not be reconstructed the same way.  In fact, the overlap in events selected in Gen4 and Gen5 from the same data sample is not 100%.  For the Gen4 run range, there are 519 events in the Gen4 >=3-jet sample and 501 events in the Gen5 sample.  Of these, 408 are common to both.  There are 111 events that appear in the Gen4 sample that aren't in the Gen5 sample.  There are 93 events that appear in the Gen5 sample that don't appear in the Gen4 sample.  However, we have checked that the NN outputs for the events common to Gen4 and Gen5 are highly correlated (see here).
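
As a rough cross-check of the numbers quoted above, the two Gen5 sub-samples can be combined with an inverse-variance weighted average and compared with the full-sample result. This is only a sketch: it assumes the two sub-samples are statistically independent with Gaussian statistical errors, and it is not the likelihood fit used in the analysis. The helper name `combine` is hypothetical.

```python
import math

def combine(measurements):
    """Inverse-variance weighted mean of (value, error) pairs."""
    weights = [1.0 / err**2 for _, err in measurements]
    mean = sum(w * val for w, (val, _) in zip(weights, measurements)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))

# (cross section, stat err) in pb for "Gen4 Runs Only" and "Runs not in Gen4"
subsamples = {
    ">= 3 jets": [(5.2, 1.1), (7.0, 1.3)],
    ">= 4 jets": [(5.6, 1.5), (6.8, 1.7)],
}
for label, pair in subsamples.items():
    mean, err = combine(pair)
    (a, ea), (b, eb) = pair
    pull = (b - a) / math.hypot(ea, eb)  # difference in units of its stat error
    print(f"{label}: combined {mean:.1f} +- {err:.1f} pb, difference {pull:.1f} sigma")

# Event-overlap bookkeeping for the Gen4 run range (>= 3 jets)
common, only_gen4, only_gen5 = 408, 111, 93
assert common + only_gen4 == 519  # Gen4 selection
assert common + only_gen5 == 501  # Gen5 selection, Gen4 runs only
```

Under these assumptions the naive combinations come out at about 6.0 +- 0.8 pb and 6.1 +- 1.1 pb, in line with the full-sample rows, and the two sub-samples differ by roughly one standard deviation or less.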

Question:  What is the dependence of this result on the top mass?  (Many)

Answer:  See the table below:
Mass [GeV]   Sigma [pb]
160          7.6
170          6.6
178          6.0
180          6.0
190          5.3
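
To read off the approximate slope of this dependence, one can fit a straight line through the five points. This is a minimal illustration only: it is an unweighted least-squares fit that ignores the uncertainties on each point, and the linear form is an assumption of the sketch rather than something established in the analysis.

```python
masses = [160.0, 170.0, 178.0, 180.0, 190.0]   # GeV
sigmas = [7.6, 6.6, 6.0, 6.0, 5.3]             # pb

n = len(masses)
mean_m = sum(masses) / n
mean_s = sum(sigmas) / n

# Unweighted least-squares straight line: sigma = intercept + slope * mass
slope = (sum((m - mean_m) * (s - mean_s) for m, s in zip(masses, sigmas))
         / sum((m - mean_m) ** 2 for m in masses))
intercept = mean_s - slope * mean_m

def sigma_at(mass):
    """Interpolated cross section (pb) at the given top mass (GeV)."""
    return intercept + slope * mass

print(f"slope = {slope:.3f} pb/GeV")
print(f"sigma(175 GeV) ~ {sigma_at(175.0):.1f} pb")
```

The fitted slope is a little under -0.08 pb per GeV over this mass range.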


Question: Do you make sure that you shuffle the top and W+jet MC events as you train the network?  If you first feed all the top events and then you feed all the W+jet events, the NN could just memorize (select the right answer/output without really checking the kinematics) (John Strologas)

Answer:  Yes, the Jetnet code we use to train our neural network presents the events to the network in a random order during training.
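
The idea of presenting events in random order can be sketched as follows. This is not the Jetnet code (which is a Fortran package); it is a hypothetical illustration in which a new permutation of the event indices is drawn at the start of each epoch, so that signal and background events are interleaved even if they are stored in separate blocks.

```python
import random

def epoch_order(n_events, rng):
    """Return a fresh random permutation of event indices for one epoch."""
    order = list(range(n_events))
    rng.shuffle(order)
    return order

# Hypothetical toy sample: label 1 = ttbar, label 0 = W+jets, stored in blocks.
labels = [1] * 5 + [0] * 5
rng = random.Random(12345)  # fixed seed only to make the sketch reproducible
for epoch in range(2):
    order = epoch_order(len(labels), rng)
    print(epoch, [labels[i] for i in order])
```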

Question: The 4-jet distributions looks better when you train for 1000 epochs... What if you stop at the lowest error within the 1000 epochs (and not exactly at 1000)?  What if you stop at the lowest error in 2000 epochs? (John Strologas)

Answer:  We do not understand what you mean when you say the shape looks "better."  For the most part, the specific shape of the neural network distribution for one training sample or another is arbitrary.  The relevant quantity is the ability of the neural network to separate signal from background.  To address the issue of whether any of these neural networks has a discernible difference in its ability to distinguish signal from background, we conducted pseudo-experiments and examined the distribution of fit fractional errors obtained with several different neural nets.

The table below compares the fit fractional errors from the pseudo-experiments performed with each of these networks.


Fit Fractional Error from Pseudo Experiments
Neural Network        Median   68% Interval
Gen4                  0.259    (0.214, 0.323)
Default Gen5 (147)    0.254    (0.212, 0.314)
Gen5 (1000)           0.251    (0.210, 0.310)
Gen5 (2804)           0.252    (0.209, 0.312)

We do not see any advantage to using a neural network that has been trained for a larger number of epochs, so we will stay with the network obtained with our default procedure.
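
For reference, a summary of this kind (median and 68% interval) can be computed from a list of pseudo-experiment results as sketched below. The sketch assumes the 68% interval is the central one, running from the 16th to the 84th percentile; the exact definition used for the table is not stated here, and the toy input values are hypothetical.

```python
import math

def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def summarize(fractional_errors):
    """Return (median, (16th percentile, 84th percentile))."""
    values = sorted(fractional_errors)
    return quantile(values, 0.5), (quantile(values, 0.16), quantile(values, 0.84))

# Hypothetical fit fractional errors from a handful of pseudo-experiments
toy = [0.31, 0.22, 0.27, 0.25, 0.24, 0.29, 0.21, 0.26, 0.23, 0.28]
median, (low, high) = summarize(toy)
print(f"median = {median:.3f}, 68% interval = ({low:.3f}, {high:.3f})")
```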

Question: Even better, you should not limit yourself to a specific maximum number of epochs.  You could just keep training, until the error starts increasing (overtraining), and this is how you decide when to stop.  (John Strologas)

Answer:  There are a number of valid strategies for determining the stopping point for neural network training.  The approach we take is to perform an initial training of the neural network for 400 epochs.  We use this training to determine the epoch with the lowest testing error, and then retrain the network, stopping at that epoch.  This procedure was established during the previous round of analysis as a way to avoid overtraining.  For consistency, and because we see no advantage to continuing the training further, we will use the same procedure that we used previously.

Question: Do you use separate MC samples for training and for deciding when to stop training? (John Strologas)

Answer:  Yes, in addition to the training sample, we have a testing sample.  We stop the training at the epoch for which we have the lowest error in the testing sample.
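
The stopping rule can be sketched in a few lines. This is an illustration, not the actual training code: it assumes the testing-sample error has been recorded once per epoch during the initial training, and the toy error curve is hypothetical.

```python
def best_stopping_epoch(test_errors, max_epochs=400):
    """Return the 1-based epoch with the lowest testing error within max_epochs."""
    window = test_errors[:max_epochs]
    return min(range(len(window)), key=window.__getitem__) + 1

# Hypothetical testing-error curve: falls, then rises as overtraining sets in.
toy_test_errors = [0.30, 0.25, 0.22, 0.21, 0.215, 0.23, 0.26]
print(best_stopping_epoch(toy_test_errors))  # 4
```

The network would then be retrained from scratch and stopped at the returned epoch.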

Question: You wrote that you used the same number of signal/background events. Did you try to use the expected "realistic" mixture of signal and background events? If you use the correct a-priori probability for each event, then the output of the neural network could be interpreted as the Bayesian a-posteriori probability that a pattern is signal or background if the neural network uses a quadratic loss function for the training... (Svenja Richter, Jason Nielsen)

Answer:  We did not try using a more realistic mixture of signal and background events.  Because we are simply using the neural network to do a shape fit for the signal and background in the sample, the exact interpretation of the neural network output values as a probability is not so important.  It can be noted that our NN output values can be interpreted as the signal fraction in a sample with equal amounts of signal and background.  In other words, in a sample with equal amounts of signal and background, an event with a neural net output of 0.7 has a 70% chance of being a signal event.  It would be interesting to try the NN training with a more realistic sample mixture, but at this time we are limited by the available statistics for the backgrounds.
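
For illustration, an output obtained with equal-prior training can be rescaled to a different signal fraction using Bayes' theorem. The sketch below assumes an idealized network whose output equals the signal probability in a 50/50 mixture; the function name is hypothetical, and the value 0.158 is simply the fitted >= 3 jet signal fraction quoted earlier, used here as an example prior.

```python
def posterior_signal_probability(nn_output, signal_fraction):
    """Rescale an equal-prior NN output to a sample with the given signal fraction.

    Assumes nn_output = P(signal | event) for a 50/50 signal/background mixture.
    """
    sig = signal_fraction * nn_output
    bkg = (1.0 - signal_fraction) * (1.0 - nn_output)
    return sig / (sig + bkg)

# With equal priors the output is unchanged (up to floating-point rounding).
print(posterior_signal_probability(0.7, 0.5))
# With a signal fraction of 0.158 the same event is much less likely to be signal.
print(round(posterior_signal_probability(0.7, 0.158), 3))
```

Because the rescaling is monotonic in the NN output, it changes the interpretation of the output values but not the ordering of events.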

Question: In the both CDF notes (7562,6897) there is a lot of discussion about the selection of the variables. Naively, when one uses a NN method, all the measured variables in the ttbar l+jets events (~18) could be selected without to care about the correlations. When the NN system is trained it should select the best, in the linear approximation, use of these variables. For example I am not sure that HT variable is better than the use of all its components separately, the Et of the jets and Pt/Et of the leptons. Did you try this type of the NN separation?  (George Velev)

Answer: In determining which variables to use in the neural net, a large number of combinations were tried.  The complete list of variables explored is given in CDF note 6897, but as an example, you can see that the individual jet energies and the missing Et were considered separately, in addition to considering combination variables like Ht.  From Figure 5 of CDF Note 6897, you can see that there is little statistical advantage to using more than 7 variables and in some cases, including more variables results in a worse systematic error.  Some variables are more sensitive to certain large systematic errors (like the jet energy scale or Q2) and including these variables can sometimes lead to larger systematic errors.

Question: The aplanarity seems to have not very good separation power. For example Etj2+Etj3 seems to have better signal to bckg separation (comparing the picts. from 7562 page 7 top/right with CDF6897 page 9 bottom left).  (George Velev)

Answer:  Looking at Figure 6 from CDF Note 6802, you can see a detailed study of the separating power of individual variables.  Without a doubt, there are many variables (including Etj1 + Etj3) that are more sensitive individually than aplanarity.  However, there are two good reasons for including a variable like aplanarity in a neural network:  (1) Since it is a shape variable, it is sensitive to different systematic effects than the strictly energy based variables, and (2) although it may not have the best separation when considered alone, neural networks take into account correlations among the different variables; when combined with other variables, it may provide better separation.  It is difficult to see these things when looking at variables individually.  This is why we trained a large number of neural networks using a variety of variable combinations before selecting the final set of variables.

Question: My understanding is that you select the variables and NN parameters minimizing against the stat. error. What about optimization about the systematic errors? Or the total error? (George Velev)

Answer:  Both statistical and systematic errors were considered in selecting the input variables for the neural network.

Question: It is visible, specially in W+>=4 jets that, page 10 fig 3 left, CDF 7562, that the training sample perform better than testing sample at large epochs. This could be an indication for overtraining or/and for a small training sample. Which is the case? If you increase or decrease the training sample what changes on the picture? (George Velev)

Answer:  Our training sample should be adequate given the number of weights in the neural network.  Therefore, we believe this effect results from overtraining.  It is for this reason that we stop the training at the epoch giving the smallest testing error before epoch 400.  For the >= 3 jet case, the stopping epoch is 68.  For the >= 4 jet case, the stopping epoch is 147.

Question: On page 3 of CDF7562 is written: “To alleviate the problems with double counting between the ME and the parton shower (what do you mean: fragmentation or hadronization, or both?) only the W+N parton matrix element is used to model W+N jet bin.” I do not understand how this helps. I think it is better to mix W+1,2,3,4,5 jets and after this to select the bin. Do you have an idea of what kind of systematics this shortcut creates? (George Velev)

Answer:  The issue is that when one combines a matrix element Monte Carlo with a parton showering algorithm, one has to be careful not to overcount in the regions of phase space that overlap between the two calculations.  Consider a W+3 jet event:  It is possible to generate such an event from a W+3p matrix element Monte Carlo.  However, it is also possible to generate such an event using a W+2p matrix element with an additional jet arising from the parton shower.  There is an overlap in phase space between these two cases, and including both results in double counting.  The best way to handle this would be to veto some events in the overlap region to prevent double counting (for example, to prevent the parton shower from populating a region accessible to the matrix element calculation).  Although two such approaches have been developed (MLM for ALPGEN+HERWIG and CKKW for MADGRAPH+PYTHIA), we lack Monte Carlo samples with sufficient statistics from either approach to use in this analysis.  As a compromise, we avoid double counting using the prescription that we will use only W+N parton Monte Carlo to model W + >= N jet events.  The validity of this approach is demonstrated in CDF Notes 6802 and 6897 using W+1, 2, and 3 jet (exclusive) events.

Question:  In the final result you quote the fit from W+>=3 jets because of the better stat. error. I could imagine that if you divide the sample into W+3 jets and W+>=4 jets, these two sub-samples have a very different signal/bkg ratio, and you may gain additional statistical power. In addition I expect W+>=4 to have a better systematic error than W+>=3 and maybe a better overall error (see the question below about systematics) (George Velev)

Answer:  It is true that the ratio of signal to background in the >= 4-jet sample is better than in the >= 3-jet sample.  However, the loss of statistics, coupled with the fact that the kinematic separation between signal and background in the 4-jet sample is worse than in the 3-jet sample makes the >=3 jet sample the best for doing this measurement.  In addition, it turns out that the systematics in the >=4-jet sample are indeed worse than in the >=3-jet case (more on this in response to the next question).  We are considering ways to fit simultaneously the ==3-jet and >= 4-jet samples for future iterations of this analysis.

Question: Naively I could expect that your systematic is smaller in W+>=4 jet mode than in W+>=3 mode. This is because first sample has better Sig/bg ratio and the shape effects, for example, should be smaller. It is opposite. Why? (George Velev)

Answer:  The systematics are dominated by the Q2 and the jet energy scale (JES).  The Q2 affects mainly the W+jets background and the JES affects both.  However, this is not a counting experiment.  It is a shape experiment, so effects that alter the shape of either the signal or the background can have an effect on the fit results.  The 4 or more jet requirement means that there are more total objects that are determined by the JES and the Q2, and hence more opportunity for correction.  With the HT analysis this was even more dramatic.  For example, when you have three jets to add a shift to you get less of a shift in HT than you do when you have 4 jets that are shifted.  One could extrapolate this to different kinematic variables and realize that the number of objects in the event matters.
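
The HT argument can be made concrete with a toy calculation. This is a simplified sketch with hypothetical jet energies and a hypothetical 5% fractional shift applied to every jet; the lepton pT and missing Et are held fixed, although in practice the missing Et would also change when the jets are shifted.

```python
def ht(jet_ets, lepton_pt, missing_et):
    """Scalar sum of jet Et, lepton pT and missing Et (one common definition of HT)."""
    return sum(jet_ets) + lepton_pt + missing_et

def ht_shift(jet_ets, lepton_pt, missing_et, jes_fraction):
    """Change in HT when every jet Et is scaled by (1 + jes_fraction)."""
    shifted = [et * (1.0 + jes_fraction) for et in jet_ets]
    return ht(shifted, lepton_pt, missing_et) - ht(jet_ets, lepton_pt, missing_et)

# Hypothetical events (GeV); the only difference is the extra fourth jet.
three_jets = [60.0, 40.0, 25.0]
four_jets = three_jets + [20.0]
for jets in (three_jets, four_jets):
    print(len(jets), "jets:", round(ht_shift(jets, 30.0, 35.0, 0.05), 2), "GeV")
```

In this toy the four-jet event picks up a larger absolute HT shift simply because one more object is affected by the jet energy scale.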

Question: When you calculate the JES effect on the acceptance did you propagate back the shifted jet to level 4 to see if it passes the 15 GeV cut? (George Velev)

Answer: At all times in this analysis (making cuts, calculating kinematic quantities, etc.), jet energies are corrected using level 4 corrections.  For the JES systematics, the shift in the jet energy is calculated at level 4 using the procedure recommended by the jet energy corrections group (with jet energy corrections jetCorr04b).  Our jet energy systematic has two components: an acceptance part that is determined by whether the shifted jet passes the 15 GeV cut, and a shape part that depends on how the kinematic distributions change with the change in the jet energy scale.
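
The acceptance part of the JES systematic can be illustrated with a toy event count. This is a sketch under simplifying assumptions, not the procedure of the jet energy corrections group: a single hypothetical fractional shift is applied to all jets (the real uncertainty depends on the jet), the toy events are invented, and the +/- variations are symmetrized as half their difference.

```python
def passes_selection(jet_ets, jes_fraction=0.0, et_cut=15.0, min_jets=3):
    """True if at least min_jets jets exceed et_cut after scaling by the JES shift."""
    n_jets = sum(1 for et in jet_ets if et * (1.0 + jes_fraction) > et_cut)
    return n_jets >= min_jets

def acceptance_shift(events, jes_fraction):
    """Half the difference of the +/- shifted yields, relative to the nominal yield."""
    nominal = sum(passes_selection(ev) for ev in events)
    up = sum(passes_selection(ev, +jes_fraction) for ev in events)
    down = sum(passes_selection(ev, -jes_fraction) for ev in events)
    return (up - down) / (2.0 * nominal)

# Hypothetical corrected jet Et values (GeV) for four toy events
toy_events = [
    [50.0, 30.0, 16.0],
    [40.0, 20.0, 14.5],   # third jet passes only after an upward shift
    [60.0, 25.0, 15.5],   # third jet fails after a downward shift
    [45.0, 30.0, 20.0],
]
print(f"relative acceptance shift = {acceptance_shift(toy_events, 0.05):.3f}")
```

Only events with a jet close to the 15 GeV threshold move in or out of the sample, which is why this part of the systematic is separate from the shape part.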

Question: On the issue of the choice of 7 variables that go into the NN: In 6897 there is a detailed discussion on the choice of variables that go into the NN (including answers to questions on that subject). The conclusion is that adding real information helps, up to a point and 7 variables seem to be a reasonable choice. It also talks about how there is no optimal choice for which variables to use, different set giving similar performance. I think this is all fine, but I still want information as to why choosing those 7. Specially that in the PRD and 6802 there is a plot of the expected statistical sensitivity for each variable and the 7 chosen are not the 7 best, far from it. As you say, in the end it wont make much of a difference, but it would help to give an example of why some variables were used (for example aplanarity) based on correlations with other variables, etc. Or alternatively, giving an example of 2 sets of variables tried and showing how the performance is very similar, I don't think I saw this in 6897 or may be I missed it. (Veronique Boisvert)

Answer:  The final combination chosen was the combination of seven variables that gave the lowest expected systematic error.

Question: On the Maximum likelihood fit: I couldn't find the size of the MC samples for all of the backgrounds (although I'm sure it's there somewhere). Do you have enough statistics of your MC samples in each of your fitting bins? If you don't have 10x the amount of your data (347pb-1) for each of your MC/data samples representing each of your background (including other EWK background...) then you need to include the effect of finite MC samples in your maximum likelihood fit as described in: Barlow, Beeston, Computer Physics Communications 77 (1993) 219-228. (Veronique Boisvert)

Answer: Of the three templates used in the fit (signal, EWK backgrounds, and QCD), the only one that suffers from poor statistics (compared to the data) is the QCD template. However, in evaluating the systematics for the QCD template, we use two rather extreme variations. First, we compare our default template (non-isolated leptons from the data) with an alternate template (conversion electrons from the data), which also has limited statistics. The effect of statistical fluctuations between these two templates is included in this systematic. Second, we vary the normalization of the QCD component by factors of 0.5 and 2. Both of these extreme variations produce relatively small systematic effects, and we believe that they are sufficient to represent the uncertainty.

Question: On the issue of the other EWK background: It is not included in the training of your NN because of statistics issue. You do get a NN output shape and assign a systematic based on including this in the fit or not. By the way, in the text you say it's 2% but in the table it's 0.3% (the Gen4 number was 2%), this is just a typo I assume, but it is really the case that the increased statistics between Gen4 and Gen5 is the reason for the significant reduction in this systematic? You also say that all the other EWK backg are normalized to the W background according to the theory cross sections of each of those, is it clear that the 0.3% covers the systematic uncertainty related to varying each of those cross sections and modifying the relative contribution of some of these other EWK processes? I'm also not completely sure that putting this contribution or not in the fit really represents the effect of not including these backgrounds in the training of the NN, but I would need to think about this some more. (Veronique Boisvert)

Answer:  The 2% number in the text was a typo that has been corrected.  In addition, the 0.3% systematic has been updated to 1.0%.  It was underestimated due to a rounding error in the normalization of the EWK backgrounds when they were combined together.  If you compare the shape of the NN distribution for the W+3p MC to the shape for all the other EWK backgrounds combined (Figure 6 in CDF Note 7563), you can see that the distributions have a very similar shape, so a 1% systematic seems reasonable.  Finally, it should be noted that the EWK shape systematic (evaluated by comparing the full set of EWK backgrounds to just the W+3p background) is not really meant to evaluate a systematic for omitting the other backgrounds from the NN training.  Because we are fitting to NN output templates, there is nothing wrong with including additional classes of data in the fit that weren't used for the training.  The 1% EWK shape systematic is included to account for the fact that we don't know the actual relative normalization between the W+3p background and all the other EWK backgrounds.  To evaluate the systematic coming from this uncertainty in background normalizations, we take the extreme case of setting all the normalizations except the W+3p to zero.

Question: Issue of NN distribution looking funny for >=4 jets:  It does seem counterintuitive that "such" a difference in shape between Gen4 and Gen5 would have so little effect, as was shown in the preblessing talk. I would not expect a big difference, otherwise NN would be too dependent on the epoch at which the training stopped, but I would not have been surprised by a few % difference instead of a few tenth of a % difference. May be I want more information on how the PE were generated to come up with the plots on page 22 of the preblessing, since it's not just the shape of the templates used in the fit that need to be modified, the data decision will change if those shapes look different. (Veronique Boisvert)

Answer:  To make pseudo-experiments to evaluate the sensitivity of the various NN shapes, we randomly drew events from the signal and background distributions using the expected top cross section of 6.1 pb for a top mass of 178 GeV and 347 pb-1 of data.  We then fit the distribution for these events using the templates.  The goal of these pseudo-experiments was to see if the difference in shapes made any difference in the ability of the fit to determine the signal fractions.  The result of the pseudo-experiments is that these shape differences don't matter for discriminating signal from background.  This is not that surprising when you consider that, despite the shape changes, the ratios of signal and background above and below NN = 0.5 do not change much.
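
The drawing step of such a pseudo-experiment can be sketched as follows. This is a toy illustration only: the five-bin templates and the event counts are hypothetical, the counts are held fixed rather than fluctuated, and the subsequent template fit is not shown.

```python
import random

def draw_pseudo_experiment(sig_template, bkg_template, n_sig, n_bkg, rng):
    """Draw NN-output bins for n_sig signal and n_bkg background events.

    Returns the binned pseudo-data as a list of counts per bin.
    """
    bins = range(len(sig_template))
    events = rng.choices(bins, weights=sig_template, k=n_sig)
    events += rng.choices(bins, weights=bkg_template, k=n_bkg)
    counts = [0] * len(sig_template)
    for b in events:
        counts[b] += 1
    return counts

# Hypothetical 5-bin NN-output templates (relative weights, low to high output)
signal_shape = [1, 2, 3, 6, 8]
background_shape = [8, 6, 3, 2, 1]
rng = random.Random(2005)  # fixed seed only to make the sketch reproducible
pseudo_data = draw_pseudo_experiment(signal_shape, background_shape, 148, 788, rng)
print(pseudo_data, "total =", sum(pseudo_data))
```

Each pseudo-data histogram would then be fit with the signal and background templates, and the spread of the fitted signal fractions over many such draws gives the expected sensitivity.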

Question: Issue of the 4.4% for the generator systematic in >=4 jets:  I was out of town during the preblessing talk, looking at slide 30, do you mean that in Gen4 the effect of the difference between Pythia and Herwig was not taken into account and that is why 4.4% is greater than 1.4%? (Veronique Boisvert)

Answer:  The Gen4 table from which I took the ttbar generator systematic number did not include the effect of Herwig having >= 4 jets more often than Pythia (the Gen4 number in these tables was the same between >= 3 jets and >= 4 jets).  This explains the difference between the Gen4 and Gen5 numbers I showed.

Question: For both >=3 jets and >=4 jets using JetCorr04b, the acceptance uncertainty went down, but the shape uncertainty went up so that the overall change is very small. The prediction was that using JetCorr04b should give better error (full status talk), is this understood? (Veronique Boisvert)

Answer:  The expectation was that jetCorr04a was overestimating the uncertainty by a couple hundred MeV per jet.  Given this, we did not expect to see a large change by switching to jetCorr04b.  As expected, jetCorr04b gives a small improvement in the acceptance systematic.  At the same time, there is a small increase in the shape systematic.  Although this isn't what we naively would have expected, the effect of jet energy scale shifts, particularly on the background shape, is difficult to predict.  Because the change in shape systematic was relatively small between jetCorr04a and jetCorr04b, we did not make any further investigations.