From ashmansk@hep.uchicago.edu Wed Oct 3 16:33:59 2001
Date: Thu, 30 Aug 2001 17:45:06 -0500 (CDT)
From: Bill Ashmanskas
To: cdf-l2@fnal.gov
Subject: straw man ideas for L2 diagnostic hardware

I apologize that this message is neither as prompt nor as detailed as I originally intended. I am about to vanish for a week, and wanted to give people food for thought, which we can discuss at the workshop, or people can discuss by e-mail or in person without me before the workshop.

One concern in debugging a long pipeline, especially when different stages have different designers, is isolating the cause of a given failure. Within SVT, an elaborate debugging scheme has been implemented, in which every board keeps track of the last 128K words at both its input and its output. This allows unwanted output of SVT to be traced back to its source, and as a side effect, it allows the various boards to serve as diagnostic data sources and data sinks for each other, making it possible to perform a realistic test of each board in a test stand or in situ without Run_Control. Even at this early stage, I think many person-years have been saved by this aggressive diagnostic strategy.

(One flaw in the SVT scheme is that the output of each pipeline stage is not easily captured into an L2 buffer for diagnostic readout. We now have a fix for this--a recently-built diagnostic board that can act as a straight-through cable whose side effect is to record the cable traffic event-by-event into an L2 buffer for readout. This could be used e.g. to check each SVT pipeline stage bit-for-bit against the simulation, offline.)

I have been asked to think about diagnostic hardware that might provide similar functionality for the L2 decision crate. Since time is short, we would want to keep things as simple as possible, hopefully reusing existing diagnostic tools or at least reusing pieces of schematics from designs that have been shown to work. Probably many tools already exist that I just don't know about.
Please feel free to clue me in.

For testing interface boards, I think one wants a data source and data sink that could be either run standalone or driven by L1A. Matt, Ted, and I have had some success using SVT as a data source for the L2/track boards and using the Alpha itself as a data sink. We can currently send any pattern of data we wish into either the SVTlist board or the XTRPlist board, and we can insert gaps between words wherever we wish.

Do we want to use Alphas in general for interface board data sinks? If there are plenty of spare Alphas, or will be eventually, then this seems like the easiest solution, though it may be an expensive part to dedicate to a test stand. If not, we may want to build (or revive, if it already exists) a board that would, as needed, drive the MBus control lines, receive MBus data into a FIFO, and make the data from the FIFO available via VME buffer, logic analyzer, or perhaps some "common diagnostic cable."

How many of the other boards have data sources that can be easily configured to send test patterns? Can the real system do this in situ? Can spare modules do this in a test stand?

The two pieces of SVT that have "special" input formats are the L1 bits from FRED into XTFA and the fiber optic G-link signals into the Hit Finders. For the former, Simone Donati built a simple test board that translates SVT cable words into L1 cable words; he then uses any other SVT board to drive test patterns through the test board to the L1 cable. For testing Hit Finders, SVT used a versatile but mechanically delicate board from Fermilab, called the General System Test Module. A GSTM consists of several FIFOs and various daughter cards for G-link, TAXI, etc., both senders and receivers.

Simone's board could easily be used to exercise the L1/L2 interface card, if this seems easier than programming FRED to send test patterns (e.g. bunch counter) to L2.
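
As a rough software illustration of the source/sink idea discussed above (not a description of any existing board, firmware, or test-stand code), the following Python sketch models a test-pattern source that can insert gaps between words, a bounded sink buffer, and a word-by-word comparison. Every name here (`GAP`, `pattern_source`, `FifoSink`, `first_mismatch`) is hypothetical, and the sink's choice to keep only the most recent words is an assumption borrowed from the SVT boards' record of their last 128K words.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Sequence

# Hypothetical marker for an idle cycle (no valid word on the cable).
GAP = None


def pattern_source(words: Sequence[int], gaps: Dict[int, int]) -> Iterator[Optional[int]]:
    """Yield each word in order; after word i, yield gaps.get(i, 0) idle cycles."""
    for i, word in enumerate(words):
        yield word
        for _ in range(gaps.get(i, 0)):
            yield GAP


class FifoSink:
    """Bounded sink: stores valid words, ignores idle cycles.

    Assumption: when full, the oldest word is dropped, so the buffer
    always holds the most recent `depth` words.
    """

    def __init__(self, depth: int) -> None:
        self.buffer: deque = deque(maxlen=depth)

    def clock(self, word: Optional[int]) -> None:
        """Accept one cycle of cable traffic."""
        if word is not GAP:
            self.buffer.append(word)

    def readout(self) -> List[int]:
        """Return the stored words, oldest first."""
        return list(self.buffer)


def first_mismatch(expected: Sequence[int], received: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing word, or None if identical."""
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i
    if len(expected) != len(received):
        return min(len(expected), len(received))
    return None


if __name__ == "__main__":
    # A bunch-counter-like pattern with a 3-cycle gap after the third word.
    pattern = list(range(8))
    sink = FifoSink(depth=16)
    for cycle in pattern_source(pattern, gaps={2: 3}):
        sink.clock(cycle)
    print("first mismatch:", first_mismatch(pattern, sink.readout()))
```

In this toy model the gaps change only the timing, not the data, so a sink that is working correctly should read back exactly the words that were sent; any mismatch index points at the first word to examine.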

An optimization or adaptation of the GSTM for SVT/L2 use would perhaps be smart enough to (re)transmit on L1A, read data into VME buffers, and service G-link, TAXI, and HotLink modules. Or one could reduce the FIFO/VME problem to a previously solved problem, by building a board that converts "common diagnostic cable" format to and from G-link, TAXI, and HotLink. One could then use existing SVT boards as data sources, data sinks, and VME buffers. The advantage of reusing SVT boards as sources/sinks is that a lot of test harness infrastructure already exists, so one would not need to spend time redeveloping it.

The goal is to be able to validate (a) the input to every interface board, (b) the throughput of every interface board, (c) the MBus data flow, both in a test stand and in situ (perhaps by loading special software or plugging in a few diagnostic cables) when a problem is seen whose fix is not trivial. It is important to be able to record enough information in situ that one can reproduce the problem in a test stand. Making a problem happen repeatably in a test stand makes the iteration time much, much shorter when trying out new bug fixes.

Feedback is very welcome, though I may not have time to read it until the morning of the workshop.

-Bill