Level 3 Online Status Reports
Issue 6
Monday, May 3, 1999
Event Flow
Reports
- Ilya found several problems while running small event builder systems:
  - With b0eb18 and b0eb19 as SCPU's, the converter node consistently
    saw TIMEOUT's (i.e., packets which didn't arrive in the allotted
    time).
  - b0l3c03 also got TIMEOUT's with any SCPU.
  - Old processor nodes would occasionally die.
  - Converter nodes still hang sometimes. In one representative hang,
    the converter was stuck in the OUTPUT state (waiting for an event
    to be sent to the processor nodes).
  - There is a general dribble of TIMEOUT's (generally < 0.1%).
  - The distribution of tardy fragments among SCPU's is close to
    uniform.
- In all these cases the ATM driver doesn't report any errors or
dropped packets on reception.
- It was found that the ATM fiber pairs of b0eb18 and b0eb19 were
swapped. Once they were swapped back, both nodes worked fine.
- Steve confirmed that all the PC's had identical setups for their
ATM interfaces, and PCI bus scans gave identical hardware setups.
- Steve found (the tests are not described here; this bit of news
comes from Tuesday) that the b0l3c03 ATM interface is broken: it fails
loopback tests as well as tests in other PC's. A working ATM
interface installed in b0l3c03 works properly, so the fault lies in
the interface rather than in the PC.
- The L3->CS interface was implemented as a separate l3_csl_socket
library; l3_cs_socket was changed back to its original form (no
network byte ordering on message headers). It has been used for
two l3_converter's to talk with one another. Tony is synchronizing
the consumer server code with it; its receiver uses one of the
lower-level routines directly, and this routine has different
parameters in the two versions.
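
The header byte-ordering point above can be illustrated with a minimal sketch. The header layout used here (two unsigned 32-bit fields, a message type and a payload length) is hypothetical; the actual l3_cs_socket header format is not described in this report. The sketch only shows why both ends of a connection must agree on the convention.

```python
import struct

# Hypothetical header: message type and payload length, both unsigned 32-bit.
# This is an illustration only, not the real l3_cs_socket layout.
NETWORK_FMT = "!II"  # network (big-endian) byte order
LITTLE_FMT = "<II"   # little-endian byte order, as on x86 PC's


def pack_header(msg_type, length, fmt):
    """Pack the two header fields using the given byte-order format."""
    return struct.pack(fmt, msg_type, length)


def unpack_header(data, fmt):
    """Unpack the two header fields using the given byte-order format."""
    return struct.unpack(fmt, data)


wire = pack_header(1, 256, NETWORK_FMT)

# Sender and receiver use the same convention: the fields round-trip.
same = unpack_header(wire, NETWORK_FMT)      # (1, 256)

# Receiver assumes the other convention: the fields come out byte-swapped.
mismatched = unpack_header(wire, LITTLE_FMT)  # (16777216, 65536)
```

If two programs are built against library versions that differ in this respect, a mismatch of this kind would be one symptom to look for.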
Actions
- Jeff and Tony will work on the L3->CS interface.
- Suggested tests on event builder (Ilya):
  - To make sure the packets are actually being received:
    - compare number of ATM packets received with the number of events
      received by the converter node.
  - To see if the packets are simply being delayed:
    - decrease packet size.
    - increase the timeout window.
    - loosen the rate limitation.
- Ilya, Steve, and Jeff will continue to shake out the bugs.
- Large system test.
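
The bookkeeping behind the suggested tests might be sketched as follows. This is a rough sketch only: the function names are made up for illustration, and it assumes exactly one ATM packet per SCPU per event, which this report does not state. The chi-square statistic is one simple way to quantify how close the distribution of tardy fragments among SCPU's is to uniform.

```python
def timeout_fraction(n_timeouts, n_events):
    """Fraction of events that had a TIMEOUT."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return n_timeouts / n_events


def missing_packets(atm_packets_received, events_received, n_scpus):
    """Expected minus observed packet count.

    Assumes one packet per SCPU per event (an assumption, not taken
    from the report). A positive result suggests lost packets; zero
    together with TIMEOUT's suggests the packets are only delayed.
    """
    return events_received * n_scpus - atm_packets_received


def chi2_uniform(tardy_counts):
    """Pearson chi-square statistic of per-SCPU tardy counts vs. uniform."""
    total = sum(tardy_counts)
    if not tardy_counts or total == 0:
        raise ValueError("need at least one tardy fragment")
    expected = total / len(tardy_counts)
    return sum((c - expected) ** 2 / expected for c in tardy_counts)
```

For example, timeout_fraction(5, 10000) gives 0.0005, which is below the 0.1% level quoted in the reports.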
Executable Interface
Reports
- Christoph has implemented SVX VRB reformatting, but there is still
some question about the format. He has received a sample "real"
event from Steve Nahn.
Actions
- Continue work on SVX VRB reformatting.
- Implement data generation on SCPU's through a VRB object, with
random variations in the minibank sizes. This can then be used
in the large system test.
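
The idea of random variations in the minibank sizes might look like the sketch below. It does not model the VRB object itself. The nominal size, the spread, the number of minibanks, and the function names are all hypothetical choices made for illustration.

```python
import random


def generate_minibank_sizes(n_banks, nominal_words, spread, rng):
    """Return n_banks sizes (in 32-bit words).

    Each size is drawn uniformly from within +/- spread (a fraction)
    of the nominal size, and is never smaller than one word.
    """
    low = max(1, int(nominal_words * (1 - spread)))
    high = max(low, int(nominal_words * (1 + spread)))
    return [rng.randint(low, high) for _ in range(n_banks)]


def generate_fragment(sizes, rng):
    """Build dummy minibanks, each a list of random 32-bit words."""
    return [[rng.getrandbits(32) for _ in range(size)] for size in sizes]


# A fixed seed keeps test runs reproducible.
rng = random.Random(12345)
sizes = generate_minibank_sizes(8, 100, 0.25, rng)
fragment = generate_fragment(sizes, rng)
```

A seeded generator is used so that a problem seen in a large system test can be reproduced with the same sequence of sizes.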
Test Control
Reports
- The run control group is planning a quickie ROBIN implementation of
the Run 2 state model which can be used for tests. An IDL file will
be provided. This will require Level 3 and the Event Builder to
be checked and adjusted for compliance with the Run 2 state model.
- In lieu of an actual communications model, the run control group is
working with an abstracted model ("universal state machine receiver"),
inside of which CORBA, SmartSockets, etc. can operate.
- Steve has prepared his modifications to ILU to make it work under
VxWorks, but it hasn't been tested yet.
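
A rough sketch of the "universal state machine receiver" idea: the state logic lives in one class, and whatever communication layer is chosen only delivers commands to it. The states and commands below are placeholders and are NOT the actual Run 2 state model, which this report does not spell out. The ListTransport class is a hypothetical stand-in for CORBA, SmartSockets, etc.

```python
class StateMachineReceiver:
    """Transport-independent receiver of run control commands."""

    # Placeholder states and transitions, for illustration only.
    TRANSITIONS = {
        ("IDLE", "configure"): "CONFIGURED",
        ("CONFIGURED", "start"): "RUNNING",
        ("RUNNING", "stop"): "CONFIGURED",
        ("CONFIGURED", "reset"): "IDLE",
    }

    def __init__(self):
        self.state = "IDLE"

    def handle(self, command):
        """Apply a command; return False if it is illegal in this state."""
        key = (self.state, command)
        if key not in self.TRANSITIONS:
            return False
        self.state = self.TRANSITIONS[key]
        return True


class ListTransport:
    """Hypothetical stand-in for a real communication layer."""

    def __init__(self, receiver):
        self.receiver = receiver

    def deliver(self, commands):
        """Pass each command to the receiver and collect the results."""
        return [self.receiver.handle(c) for c in commands]
```

Because the receiver knows nothing about the transport, the same state logic could be exercised in tests before the communications model is settled.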
Actions
- Continue with ORBacus/IDL work.
Monitoring
Reports
- l3_converter status has been implemented. The program l3_dump_cstate
prints out the state array. This modification has also necessitated
a script, l3_kill_converter, which cleans up l3_converter processes
as well as the shared memory segment which is created to hold the
state array.
- Jeff and Mike discussed more ROOT monitoring issues.
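
The pattern of a state array in shared memory, a dump program, and a cleanup script might be sketched as follows. This uses Python's multiprocessing.shared_memory module as a stand-in. It is not the actual l3_converter, l3_dump_cstate, or l3_kill_converter code, and the segment name, array size, and one-byte-per-slot layout are hypothetical.

```python
import os
from multiprocessing import shared_memory

N_SLOTS = 8  # hypothetical number of state slots, one byte each


def create_state_array(name, n_slots):
    """Create a named shared memory segment and zero the state slots."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=n_slots)
    shm.buf[:n_slots] = bytes(n_slots)
    return shm


def dump_state(name, n_slots):
    """Attach to the segment and return the state slots as a list."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(shm.buf[:n_slots])
    finally:
        shm.close()


def cleanup(name):
    """Remove the segment if it exists; return True if it was removed."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False
    shm.close()
    shm.unlink()
    return True
```

The explicit cleanup step matters because a named segment outlives the process that created it, which is presumably why a separate kill script became necessary.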
Actions
- Ilya and Andreas will continue DIM investigations.
- Mike will continue on ROOT monitoring.
Physical
Reports
- "Evaluation shelf" didn't show up, but Jeff talked with Orlando Colon
in the CD, who is the resident expert on network cabling. There was
extensive discussion of the various solutions the CD has tried in its
several generations of PC farms, as well as of the different types
of cables, fibers, and patch panels. The short summary is that there
are two kinds of layouts:
  - Central network switch rack, connected to computers via trunk cables.
  - Subfarm intranets, with each subfarm connected to a local switch,
    and each switch uplinked to the others.
The claim is that the subfarm intranet solution is cheaper and less
labor-intensive, but the CD went with the central networking solution,
presumably because it is more flexible. The evaluation was done with
a 1000-node farm in mind.
Actions
- Jeff will compare the two layouts as they pertain to a
Level 3-sized system.
Jeff Tseng / MIT / jtseng@fnal.gov