Report of Level 2 Trigger Review (7 December 2001)

Dear Colleagues,

We thank the L2 trigger group for all their hard work and progress to date, and the review committee, especially the chair, Mike Lindgren, for their prompt and succinct report. We have summarized what we understand from the report, separated into three timescales. This is followed by the report of the committee.

By January

1. The highest priority for CDF is to establish the "top" L2 crate as an operational system for physics running in January. This system will in general not be available for trigger development. Any special L2 running will be requested at the weekly Ops/TDSWG meeting. In order for the system to be operational:
   a. The L2 group must ensure adequate spares: two per board type at B0, one in the "bottom" crate and one in a cabinet. We would like the trigger SPLs to make this an agenda item at the regular trigger meetings.
   b. The group must provide more depth in the pager/expert coverage at B0 for both hardware and software.
2. The existing baseline, as described at the review by Stephen Miller, will be used for initial physics running. No further work on backup solutions needs to be pursued at this time.
3. A high priority for completing this initial system is finishing the development of the alpha software.

Spring 2002

4. Bus arbitration issues (with the software delay workarounds) may limit trigger rates by around summer.
   a. Therefore, there should be an aggressive schedule for testing the new Magic Bus backplane.
   b. We will schedule a review of the backplane performance in early February by the usual engineering subgroup. Bob DeMaat should arrange this review.
   c. Answers are needed by March so that course corrections can be made during a summer shutdown.
   d. In the mid-term, there should be an effort to establish the "bottom" L2 crate as a viable stand-alone system for further development effort (for example, the new Magic Bus).
      This will likely include the development of splitters for the L2 inputs.

By Summer 2002

5. In the longer term, the test-stand system being developed by Ted Liu and Bill Ashmanskas should be aggressively pursued. This will allow completion of the development effort and longer-term maintenance of the full system. As recommended by the committee, we will schedule a workshop for the L2 group to discuss the specifications for this system (at the January 17th trigger meeting?).

Jeff and Nigel

----------------------------------------------------------------------------

Level 2 trigger review report 14-dec-2001

An all-day review of the CDF Level 2 trigger system was held on Friday, December 7. The Committee consisted of Aesook Byon-Wagner, Bob DeMaat, Bill Foster, Michael Lindgren (chair), Luciano Ristori (ex-officio), Harold Sanders, Michael Schmidt, Mel Shochet, Rick Van Berg, and Peter Wilson (ex-officio). The committee was formed and the charge given by the CDF operations department heads, Jeff Spalding and Nigel Lockyer.

Because of the desire to "freeze" the system status on January 4, 2002, the committee report is required as soon as possible, so we have written the report in plain text, not formalized the language, and collected our comments in two categories. The first is a compilation of the reviewers' general observations about the Level 2 trigger system, or observations not directly addressed in the charge (such as the proposal to split signals to the top and bottom crates). The second addresses each point in the charge received by the committee.

The Committee would like to thank all the speakers, who clearly worked hard to prepare excellent presentations and were quite patient in answering our numerous questions.
Overall it is clear that a number of people, those at the review and some who were not, have done a tremendous amount of work on the system this year and have overcome a list of problems almost too daunting to contemplate. Nothing that follows in this report should take away from that.

General Observations
--------------------

The committee really appreciates the progress that has been made over the summer and in the last three months. The general consensus is that the present system, with ongoing work completed, is good enough to take us to next summer and probably beyond. We therefore recommend that the existing baseline, as described by Stephen Miller, be used for January-March running.

An aggressive schedule for testing the new Magic Bus backplane is needed so that by March we know whether a major modification to our Level 2 plans has to be carried out in the summer, since time would be needed to plan and build any new hardware.

Some members felt that an alternative implementation of the arbitration circuit for the bus interface is necessary. Should the conversion of the arbitration signals from PECL to TTL levels be deemed necessary, FNAL can arrange to have its best R&D techs available to modify the boards. They would have to leave the traces, pads and vias in the best shape possible, because another round of modifications is a distinct possibility. If it proves necessary to go to TTL arbitration, the board modifications could be made during the summer shutdown, and testing could be carried out using cosmic rays.

The material presented suggests that a small-scale review of the Magic Bus (following up the previous Magic Bus review with engineers) should be held before the beginning of March, even if the testing is only half done. It would be good to set a reasonable deadline to examine test results, as well as to go over the rest of the engineering modifications to be made for the remaining boards.
If there is no sign of improvement, the long-term plan (>5E31 or the year 2003) may have to be reconsidered by early summer.

Although the focus of attention is the noise on the bus arbitration signals, this J3 backplane will be the first we use with power supplied directly via its own cable. This should not present a problem, but a test should be conducted to ensure that our implementation does not inadvertently create a noise problem.

The bit problems on the cluster crate backplane are a concern. Effort is needed to identify the source of the problem and determine whether a new backplane is needed.

The operations department should also draw up a written priority list that details how the scheduled down time each week will be allocated to each detector group, especially to the Level 2 trigger. In addition, it was stated that approximately one eight-hour beam shift will be allocated bi-weekly for detector studies. Special run requests should be made at the Thursday trigger and detector operations meeting.

The committee also felt that the long-term solution for testing should be the system being developed by Ted Liu and Bill Ashmanskas.

Given the present status of the project, the committee sees no immediate need to invest manpower in implementing a backup solution such as the one presented by Bill. Members felt it looks feasible but would require extensive software and integration work. Some members also felt the backup solution could have higher ultimate performance, but that the baseline L2 will meet CDF requirements in the long term, because the fundamental rate limits come from other elements in the system.

A proposal for passive optical splitters and an LVDS splitter on the inputs, so that the top and bottom trigger crates receive the same signals, was made at the review. Several committee members considered this a very interesting suggestion.
This should be evaluated to determine whether the production trigger system could still function reliably while such splitters are installed and routing a copy of the signals to the lower crate. If this works, it would greatly increase the range of tests that could be conducted in the lower crate, separate from the main system, and would:
- make the bottom crate a true "hot spare", with switchover without recabling
- allow parasitic use of "CDF as Pulser" for algorithm testing, etc.
- allow triggering on events where the top and bottom crates disagree, etc.
- make it reasonable to head in this direction for testing after "Jan 4th"
- ensure the spares are really working.

The committee was instructed that the trigger table and databases are not part of this review. However, it might be beneficial to hold a short "readiness review" of these areas at the beginning of January.

Below are more detailed point-by-point responses to the charge.

L2 Hardware
-------------

1) Most of the milestones presented at the Sep 2001 collaboration meeting have been met, but a couple of key items still have to be addressed. The two highest priorities are to:
   - complete the test of the new Magic Bus backplane (and any modifications needed for the boards)
   - have more than one fully tested and working spare board of each type at B0

2) Recommend the scheme that allows stable running at luminosities up to 5E31 by early January: The existing hardware with the old backplane will likely be too marginal to allow stable running at 5E31, although it looks like it will be OK up to 1 or 2E31. Even if the goal of 5E31 is not met by January, one should press hard to achieve it by summer. The committee strongly encourages prioritizing the tasks along the lines described by Stephen Miller at the end of the meeting.
3) Proposed paths for commissioning the rest of L2: Having a fully loaded bottom crate will relieve some of the scheduling burden of the remaining commissioning and software test activities, as well as strengthening the spares situation. Dedicated test stations other than the CDF detector itself would be a tremendous advantage and should be pursued aggressively.

** Committee "should recommend" list **

1) As stated above in the general observations, the best hardware configuration for operation consistent with the prescribed trigger table, beginning early January, is what was presented in Stephen Miller's talk as the "current L2 crate configuration": the upper crate with the old Magic Bus backplane, delayed readout, more than one alpha processor, SVTList, Clist, L1, IsoList, Tracklist, and the four Reces boards. While some committee members felt a brief effort should be devoted to trying to make the arbitration work, most felt that running after Jan 4th should start with the firmware arbitration, and that changes to the arbitration circuitry should be deferred until solutions can be fully tested in the lower crate.

2) The readiness of "backup solutions", and 3) whether the "backup solutions" should be part of the baseline trigger for early January: In the early January time frame, the backup solutions will not provide enough benefit to be preferred over the present approach. It was generally felt that, given the relatively good state of the L2 hardware, the SVT-style backup is probably not immediately essential and should not be a major focus of activity in the near future. The work done to date, however, has advanced the state of the overall detector in a very significant way, and the value of that effort should not be underestimated.

4) Whether adequate spares exist for all L2 subsystems: The short answer is no. The status of existing (fully tested and functional) spares is marginal.
Having more than one fully tested and working spare board of each type at B0 (one plugged into the lower crate, one stored in a cabinet) should be THE HIGHEST priority hardware activity before the "Freeze date" of Jan 4. The committee's understanding of the spare board situation is:
   - L1 Interface: one spare, and one in the "stuffing stage" with no way to estimate the schedule.
   - Clist: no good spares (one with "flaky VME", which is hardly comforting), plus one in test.
   - Tracklist: three boards needed for the system; one spare plus two with known problems.
   - IsoList: one spare, with no indication of others being built or debugged.
   - Reces: four needed in the system; two spares.
In summary, the only board with two good spares is the one not presently running, which is not very comforting. The total number of available spares is extremely thin for such a central system.

At the risk of repeating itself, the committee recommends building at least two fully working spares per board type (more if more than one board of that type is used in the system) as a very high priority. It was also noted that if boards are needed for the lower crate and only two spares of a type exist, only one board of that type should be used there. If boards are modified for use in the second crate, they should be counted separately from the spares. For any board for which only one spare exists, the equipment and expertise needed to repair any failure should be resident in B0.

The committee was pleased to hear that the D0 "L2 Beta" circuit board upgrade (anticipated by Myron's design) is proceeding. CDF should make sure that the D0 boards retain the hooks needed to make them useful to CDF as well, since they ease spares concerns and provide an upgrade path for the future.

5) Plan for L2 hardware pager coverage: For each interface subsystem, the plan is reasonably defined, including specific names. For the alpha boards, the cluster finder, and overall system operation, only one name is enlisted as the on-site expert.
This is far short of adequate, especially given that from January through summer the L2 system will have to remain operational in the upper crate, with system testing occurring simultaneously in the lower crate. The hardware pager list for the alpha boards, the cluster finder, and overall system operation should be firmed up before the end of this year, with a minimum of two, and preferably three, experts (or experts-in-training), in order to support 24/7 running as well as stay on the remaining test schedule with the bottom crate.

From an operations point of view, the committee considers it unacceptable to list a single student (Heather Ray) as the primary pager contact, with all other experts off site and on call through her. These experts should reside in the Fermilab area and should be provided either by Michigan or by adding a new institution with co-equal responsibility for L2 software and hardware maintenance.

CDF needs to find a way to add to the efforts of the two key individuals: Monica Tecchio, who is the only person capable of debugging some of the hardware boards, and Stephen Miller on the alpha boards. They have clearly done great things, but they also have unique expertise, which should be passed on to others to assist in the final commissioning and long-term operation of the L2 trigger system.

The committee would like to add some strong positive words about Trigmon, which appears to be in good shape and is a very useful tool. Here again, the knowledge needs to be distributed among more than one or two individuals. The committee acknowledges the fine work of Nate Goldschmidt and Matt Worcester.

Software
----------

a) & b) The software has come a long way, and there is now a small group of people with at least some familiarity with the code.
Some members felt that the L2 alpha software, including physics algorithm software, system support software, and testing procedures, are probably adequate for the redefined scope of stable running in January. However, it is difficult to judge whether the software is in reasonable shape: although the basic algorithms have been written, they have not yet been put together (we do not yet have an estimate of the Level 2 execution time), nor is it clear that all of the infrastructure needed for creating, storing, and cataloging new trigger tables is in place. As the complete system is commissioned, more effort will clearly be needed to complete the remaining algorithms and to test the code already written.

Given the list of remaining tasks and the enlisted L2 software writers, the manpower for L2 software should probably be increased by about 2 FTE. This will be needed to adequately support the evolving and expanding L2 software during the commissioning and system test period, until steady-state data taking at reasonable luminosity (2E32?) has been reached. The committee therefore recommends that one or more software coordinators be officially appointed, who can dedicate a large fraction of their effort to L2 software activities and who do not already carry a large fraction of hardware responsibilities. They should be charged with organizing and leading the software and software documentation efforts.

It was also noted that the monitoring software seems adequate, and may even be fairly nice for stable running. The web documentation is also greatly improved, and Heather Ray should be praised for setting up the framework, but there is still a lot of work to be done to finish the documents that go into that framework.

c) A list of names of those who will carry L2 software pagers was not provided, and in the committee's estimation, from what was said, coverage does not appear to be sufficient.
There are more names of resident experts, but it is not really apparent that there are volunteer pager carriers. We need a plan and some commitment from people already working on the software.

Test stations
---------------

Work on the diagnostic hardware proposed by Ted Liu and Bill Ashmanskas should continue at full speed. Until this new hardware is available, we will be limited to testing in the upper and lower crates, with the detector as the only source of "test" data.

1) & 2) For now, there is no alternative to testing hardware modifications or software revisions with cosmic rays when the CDF detector is available (i.e., no beam), even with a fully functioning second test crate. This will slow progress substantially and could jeopardize the schedule for a fully working L2 system (5E31) by the later part of 2002. The second test crate can be used to test the new version of the Magic Bus backplane, and to test Magic Bus and VMEbus transactions. During infrequent special periods, cables can be moved from the top crate to the bottom crate to use the detector as a signal source for that crate.

As mentioned at the beginning of the report, if there is an easy way of splitting the inputs to two decision crates, it should be implemented as quickly as possible. Significant progress can doubtless be made using the present methods, especially in validating the performance of the new MB backplane, checking out at least a fraction of the spare input boards, and certainly in verifying code changes for the Alpha. However, not everything can be checked (at least without copied data), so some time will need to be set aside for "CDF as a pulser" tests once every few weeks. At Myron's request, Nigel and Jeff should draft a formal plan for making and evaluating requests for CDF pulser time, subject to other subsystem requirements.
3) For a long-term solution, the committee thinks that the test station system proposed by Ted, combined with the second test crate, sounds like the perfect way to provide various types of simulated event data (different luminosities, trigger types, suspected failure modes, etc.). Provided that this effort does not impact any activities needed to make the baseline system work, it should be strongly supported, and prototyping and testing work should go ahead at full speed. To ensure efficient design and progress, the committee recommends:
   - Hold a design workshop with the rest of the Level 2 groups to ensure that what is built is safe to use and is capable of exercising all the important parts of the system in a realistic fashion. The PULSAR should be ready for a conceptual review in a month or less, and the MMB could have an engineering/conceptual review at any time.
   - Hold a PCB review.
   - Set clear milestones so that the effort does not lose its momentum.

Charge to Committee 11/21/01
---------------------------------------------------------------------------

The goal for CDF is to begin Run-II physics data taking in early January, 2002. The running configuration of the CDF detector will be optimized during December 2001 and "frozen" for data taking by January 11, 2002. We emphasize, for this review, that this includes the physics content of the trigger table and the L2 trigger hardware.

The installation, development, and commissioning effort on the Level 2 trigger has been intense since Run-II officially began in March 2001. Great progress has been made, and the yeoman efforts of a few dedicated individuals are to be lauded. However, we still have not commissioned and run the complete L2 system. In order to achieve a stable operational condition, commissioning of the trigger system needed for the physics program in 2002 must be completed by early January.
By this time, the CDF DAQ system will be dedicated to data-taking and will no longer be available for further commissioning. Alternative plans must be developed to allow continued development of the full L2 functionality. The near-term goal for CDF is to achieve a "to tape" efficiency of 80%, striving for 90% in the long term. Emphasis on operating and maintaining the existing trigger system will therefore take priority. The Director has informed us that these "CDF performance numbers" will be reported to OMB.

-------------------------------------------------------------------
L2 Hardware
-----------------------------------------------------------------

In order for us to achieve these goals, the review committee should:
1) Review the status of L2 commissioning and determine whether we have achieved the milestones presented at the Sept. 2001 collaboration meeting.
2) Recommend the schemes for providing, at a minimum, jet, electron and SVT triggers at L2 in a way that allows stable running at luminosities up to 5E31 by early January.
3) Review the proposed paths for commissioning the rest of L2, any improvements required in the system to implement the full CDF4718 trigger table, and in particular plans that allow testing outside of the CDF DAQ system.

The review committee should recommend:
1) the best hardware configuration for operation appropriate for the prescribed trigger table beginning early January.
2) the readiness of "backup solutions" for the SVT and L1 interface boards, including the existence of the necessary hardware and software.
3) whether the "backup solutions" should be part of the baseline trigger for early January.
4) whether adequate spares exist for all L2 subsystems.
5) a well-defined plan for L2 hardware pager coverage for each subsystem, including names.
-----------------------------------------------------------------------
Software
----------------------------------------------------------------------

The review committee should:
1) Review the status, goals, and resources allocated to Level 2 software. (The trigger table and databases are not part of this review, as they were covered in a previous review.)

The review committee should determine whether:
a) L2 alpha software, including physics algorithm software, system support software, and testing procedures, are adequate for stable running and quick recovery from failures.
b) monitoring and debugging software is sufficient.
c) there is sufficient L2 software pager coverage, including names.

---------------------------------------------------------------
Test stations
--------------------------------------------------------------

Subsequent commissioning should be done in parallel. This requires a parallel L2 decision crate with an alpha and interface boards. Inclusion of trigger hardware after January 1, 2002 can be allowed only after adequate testing and rate tests have been performed and the necessary software is complete and tested.

The review committee should:
1) recommend procedures for testing, commissioning and including the remaining trigger systems.
2) review plans for the "2nd test crate" in the system.
3) review plans for the test system proposed by Ted Liu.

Michael Lindgren
mlindgre@cdfsga.fnal.gov