Hit Buffer FNAL tests logbook
20-24 July 2000:
run MHB_TEST_RANDOM on HB #9 and #8 in cooled crate.
crate is upper right, HB#9 (the ~bad one) in slot 8, hb#8 slot 21.
Merger0= #16/slot 7 Merger1 = #9/slot 16.
Ran Fri/Sat at 20 MHz, then changed quartz.
Both HB's at 25 MHz.
Also HB#15 is now in bottom/left crate slot 11 at 25 MHz running simple data.
Got 3 errors in HB#9 (FSM stuck in bad state), ==> put it at 23 MHz.
Keep running (also move to final destination, slot 12).
25 July 2000:
Got one single-bit error in HB#9 after ~400K iter. Bad bit 20 in output hit
(0 instead of 1). This is GDATA20 line out from MLDATA corresponding
to MAPADD16 line in input to MLDATA from HRVME.
Bit is the same in output data and output spy.
Rerunning the same data is OK.
Lower quartz to 22 MHz and restart. HB #8 in slot 21 still no problem
at 25 MHz.
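A minimal sketch of how a single-bit error like the one above can be located by XOR-ing the expected and observed hit words. The function name and the example word values are hypothetical; only the position of the bad bit (20, read as 0 instead of 1) comes from the entry.

```python
def differing_bits(expected: int, observed: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Hypothetical example words: bit 20 is 1 in the expected word, 0 in the readback.
expected_word = 0x1ABCDE
observed_word = expected_word & ~(1 << 20)

print(differing_bits(expected_word, observed_word))
```

The same check applied to the output spy word would show whether output data and spy disagree or carry the same bad bit.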
26 July 2000:
Still running..
27 July 2000:
PC is rebooted at 16:55 after 1.3M iter OK. #9@22MHz, #8@25MHz
Restart at 19:00 same clocks.
28 July 2000:
Still running.. At 11:35 400K additional iter.
NOTE: 100K iter = 200 sec real time ==> 3400 sec so far.
30 July 2000:
Still running.. At 19:00 1.79M additional iter since restarted on July 27.
Total with 22 MHz clock: 3.1M iter = 6200 sec so far.
Stop to debug new code.
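The running-time figures in the 28 and 30 July entries follow from the conversion 100K iter = 200 sec. A small sketch of that arithmetic; the function and variable names are made up for illustration, the numbers are the ones logged above.

```python
SECONDS_PER_100K_ITER = 200  # from the 28 July note


def iterations_to_seconds(iterations: int) -> float:
    """Convert a random-test iteration count to real time in seconds."""
    return iterations * SECONDS_PER_100K_ITER / 100_000


ITER_BEFORE_REBOOT = 1_300_000    # 27 July, before the PC reboot
ITER_BY_28_JULY = 400_000         # additional iterations at 11:35 on 28 July
ITER_BY_30_JULY = 1_790_000       # additional iterations at 19:00 on 30 July

print(iterations_to_seconds(ITER_BEFORE_REBOOT + ITER_BY_28_JULY))
print(iterations_to_seconds(ITER_BEFORE_REBOOT + ITER_BY_30_JULY))
```

The second figure comes out slightly below the logged 6200 sec because the entry rounds 3.09M iterations up to 3.1M.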
31 July 2000:
restart hb #9@22MHz+#8@25MHz on b0svt01.
stop with 112K iter OK at 16:00 to test new code.
23 August 2000:
As of today managed to run the hb_quick_test program successfully on all HB's
installed in B0 crates, except the 2 in b0svt03, which has no CPU.
Still to do: hlm_test and random test.
14 October 2002:
Found that HB#14 in test stand has 20MHz clock.
Put in 25MHz. Passes HLM test OK, but random test fails after about 10K
events with bad bit<8> in one output hit word. Output spy is OK. Output
bit<8> is 1 instead of zero. This was seen already at HB production time,
when trying to find the HB maximum frequency. It is a timing problem on output
chips OUT0-4 (bit 8 has more combinatorial logic than the others because
on end event it carries the parity).
But all other HBs run at 25 MHz, including on test stand.
When the error happens it is almost always
reproducible (i.e. retrying the same data a
few times very seldom makes it pass).
Maybe it is temperature in test stand
vs. trigger crates. Test stand room is very cold though.
Also notice that Franco soldered a test point on DS_ output
line on this HitBuffer!
Keep looking.
So far random test runs failed after:
1: 2000 iterations. Fails 5 retries, then change seed
2: 22100 iterations. Change seed immediately
3: 9100 iterations. OK after 3 retries
4: 6900 iterations. Fails 20 retries, then change seed
5: 800 iterations. Fails 20 retries, then change seed
6: 1100 iterations. OK as soon as restarted
7: 7100 iterations. Fails 10 retries, then change seed
8: 36200 iterations. Fails 20 retries, then change seed
9: 7300 iterations. Fails 10 retries, then change seed
10: 5200 iterations. Fails 10 retries, then change seed
Give up here on October 15. Average fail rate is 1/9700 iterations.
One iteration is on average 2~400 words, so about 1/10^6 events.
Guess it will work OK in cooled crate.
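The average quoted above can be cross-checked from the ten failed runs. A short sketch; the variable names are illustrative, the iteration counts are copied from the list.

```python
# Iterations completed before each of the 10 random-test failures (14-15 October).
iterations_to_failure = [2000, 22100, 9100, 6900, 800, 1100, 7100, 36200, 7300, 5200]

total_iterations = sum(iterations_to_failure)
mean_iterations_per_failure = total_iterations / len(iterations_to_failure)

print(total_iterations, mean_iterations_per_failure)
```

This gives 97800 iterations over 10 failures, i.e. one failure per 9780 iterations, consistent with the rounded 1/9700 in the entry.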
26 October 2002:
Test HB#14 in cooled b0l2de01 crate using two mergers. Modified hb_random_test to work without SpyControl because SC was hanging the crate (do not know if bad SC from spare cabinet or backplane issues). Now HB #14 works.
12:00 (about) start random test
15:30 140K iterations with no problem
27 October 2002:
Keep testing HB#14 in cooled b0l2de01 crate.
It turns out SC was without clock. Put it in.
07:45 Stop random test after 700K iter OK to use SC (#14 in slot 14!)
15:50 280K iterations of random test OK.
28 October 2002:
09:50 End random test after 900K iter (1.6M total) so that Franco can replace wires that connect DS and EE to FP test points with resistors (to be used for SVT timing measurement)
10:40 Verify random test again after Franco put resistors: 22K iter
OK.
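A rough sketch of how the 1.6M clean iterations in the cooled crate compare with the test-stand failure rate (10 failures in 97800 iterations, from the 14 October list). It assumes failures are independent and occur at a constant rate per iteration (Poisson statistics); that assumption is not made in the logbook, and the names are illustrative.

```python
import math

# Test-stand rate: 10 failures in 97800 iterations (14-15 October runs).
MEAN_ITER_PER_FAILURE = 97800 / 10
# Cooled-crate total with no failure, as logged on 28 October.
CLEAN_ITERATIONS = 1_600_000

# Failures expected in the cooled crate if the test-stand rate still applied.
expected_failures = CLEAN_ITERATIONS / MEAN_ITER_PER_FAILURE

# Poisson probability of seeing zero failures at that rate.
prob_zero_failures = math.exp(-expected_failures)

# 95% CL upper limit on the failure rate per iteration, given zero failures.
rate_upper_limit_95 = -math.log(0.05) / CLEAN_ITERATIONS

print(expected_failures, prob_zero_failures, rate_upper_limit_95)
```

Under that assumption the test-stand rate would have produced well over a hundred failures in the cooled crate, so the clean run is not compatible with the old rate.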
Stefano Belforte