Level 3 Online Status Reports
Issue 12
Monday, July 12, 1999
Event Flow
Reports
- Steve reports that b0l3pcom1 actually has a VxWorks development license,
so we can cross-compile on it for the VME CPUs.
- Steve has implemented the "periodic push" ATM driver with a 5ms push
interval. The driver empties the received packet queue as part of a
"bottom half" to the (1ms) clock timer. This seems to work.
- However, it was noticed that under some conditions (opening the ATM
connection), there is a long delay of more than 80ms. Additional traces show that
the timer interrupt itself is not being called, so something in the
system is seriously misbehaving. The initial suspect is the ATM driver
code itself, but this needs to be confirmed, or else it may come to
bite us again later. Another side effect is that with enough of these
kinds of conditions, the system time loses ticks. Moreover, no scheduling
occurs during these times.
- The test wedge should be operational again by Tuesday. Over the weekend
a lot of software was broken: the consumer code could not be compiled
for a while, and b0dau30 (the CVS repository) was having trouble as
well. Supposedly all of these software problems have been resolved,
and the software is now awaiting the test wedge.
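As a rough illustration of the "periodic push" scheme and the lost-tick symptom reported above, the following Python sketch models a 1ms clock tick whose bottom half drains the received packet queue every 5ms. It is only a toy model, not the VxWorks driver: the class, the `run` helper, and the `blocked` window are hypothetical names, and the model assumes that every tick falling inside a blocked window is simply dropped.

```python
from collections import deque

TICK_MS = 1           # clock timer period from the report
PUSH_INTERVAL_MS = 5  # push interval from the report


class PeriodicPushModel:
    """Toy model of a driver whose receive queue is drained from the clock tick."""

    def __init__(self):
        self.rx_queue = deque()  # packets received but not yet pushed
        self.delivered = []      # packets handed to the upper layer
        self.ticks = 0           # ticks the system has actually counted

    def receive(self, packet):
        # Interrupt-time work: only enqueue the packet.
        self.rx_queue.append(packet)

    def clock_tick(self):
        # "Bottom half" of the clock timer: on every fifth tick, empty the queue.
        self.ticks += 1
        if self.ticks % (PUSH_INTERVAL_MS // TICK_MS) == 0:
            while self.rx_queue:
                self.delivered.append(self.rx_queue.popleft())


def run(model, wall_ms, blocked=range(0)):
    """Advance wall_ms milliseconds; ticks inside `blocked` never reach the model."""
    for ms in range(wall_ms):
        model.receive(f"pkt{ms}")
        if ms not in blocked:
            model.clock_tick()
    return model
```

Running the model for 100ms with an 80ms blocked window (`run(PeriodicPushModel(), 100, blocked=range(10, 90))`) leaves it with only 20 counted ticks, which is the kind of drift in system time described above; the queued packets are still delivered once ticks resume.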
Actions
- Ilya will move the SCRAMNet module from the b0eb21 crate to a 1st
floor crate with another bypass switch.
- Steve will investigate the timer interrupt mask with Ron and Don.
Executable Interface
Reports
- Christoph has implemented version 1.2 of the reformatter, which is now
the default on b0l3pcom1.
- In answer to Kevin's queries about messages sent to the filter
executable, it is apparent that L3_RUN_STOP is used to change from
the active_run state to post_run. Once in the post_run state,
L3_RUN_END is used to end the run.
- As for event messages, there are two places where a msg_type-like
field resides: in the actual message header, which also points to the
global buffer containing the data; and at the beginning of the global
buffer (the "WIN" header). The actual message header contains the
proper L3_RUN_EVENT message type, but apparently the executable is
receiving a WIN header with the L3_CS_RUN_EVENT message type. We should
probably be using L3_CS-type messages only for communications between
the output node and the consumer server, so we need to change this.
- Boris has implemented sample database routines. The accessors are divided
amongst classes which are specific to each of the components, i.e., there
are separate classes for SCPU information, scanner manager information,
etc.
- Documentation for Boris's packages can be found
here.
- Currently the database code only reads and writes strings. JDBC uses
different methods to access different data types, so strings are the
only simple option for generic fields. However, in the final
implementation, the fields will be accessed with individual methods
of the component objects, and these methods will know the actual type,
so the different types can be used.
- JDBC has been installed on b0l3pcom1. The Oracle client was already there.
The database code has been ported from b0dau30 (where it was originally
developed) to b0l3pcom1 and has been run there. Focus now shifts to
using it in the context of the event builder monitoring/control system.
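The run-state transitions and the header mismatch described in the reports above can be sketched as a small table-driven state machine. This is a minimal Python sketch under stated assumptions, not the filter executable's code: the message and state names come from the report, but "idle" is a placeholder for whatever state follows the end of a run, and the sketch assumes that any message/state pair not listed is rejected.

```python
# Transitions stated in the report. "idle" is a placeholder name for the
# state after a run has ended; the report does not name it.
TRANSITIONS = {
    ("active_run", "L3_RUN_STOP"): "post_run",
    ("post_run", "L3_RUN_END"): "idle",
}


def next_state(state, msg_type):
    """Return the new state, or raise if the message is not valid in this state."""
    try:
        return TRANSITIONS[(state, msg_type)]
    except KeyError:
        raise ValueError(f"{msg_type} not accepted in state {state}") from None


def headers_agree(header_type, win_type):
    """True if the message header and the WIN header carry the same type."""
    return header_type == win_type
```

With this check, an event whose message header says L3_RUN_EVENT while its WIN header says L3_CS_RUN_EVENT would be flagged as inconsistent, which is the situation currently seen by the executable.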
Actions
- Jeff will change the messages so that L3 messages are used within
Level 3, and L3_CS messages are used only for the consumer server.
- Boris will provide a flat-file implementation of the database access interface.
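To make the idea of a flat-file, strings-only access layer with typed component accessors concrete, here is a minimal Python sketch. It is not Boris's interface (which is written against JDBC); the class names, the `key=value` file layout, the key names, and the `port` field are all hypothetical and chosen only for illustration. The only field taken from the report is the scanner manager IOR.

```python
import os


class FlatFileStore:
    """String-only key/value store backed by a text file of `key=value` lines."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        entries = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    line = line.rstrip("\n")
                    if "=" in line:
                        key, value = line.split("=", 1)
                        entries[key] = value
        return entries

    def get_string(self, key):
        return self._load()[key]

    def put_string(self, key, value):
        # Rewrite the whole file; fine for a small configuration file.
        entries = self._load()
        entries[key] = value
        with open(self.path, "w") as f:
            for k, v in entries.items():
                f.write(f"{k}={v}\n")


class ScannerManagerInfo:
    """Component-specific accessor that knows the real type of each field."""

    def __init__(self, store):
        self.store = store

    def get_ior(self):
        return self.store.get_string("scanner_manager.ior")

    def get_port(self):
        # Hypothetical integer field: the accessor, not the store, converts it.
        return int(self.store.get_string("scanner_manager.port"))
```

The design point is that the store stays generic (strings only), while type knowledge lives in the per-component class, so the backing store could later be swapped for the Oracle database without changing callers.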
Event Builder Monitoring/Control
Reports
- Ilya and Sasha have monitored the scanner manager using their new
ILU-based "scanner control." (...applause...)
- On its more complicated invocations, the new scanner control crashes
after about 6000 calls (simpler ones work indefinitely). Since the
crash complains about memory problems, a memory leak is suspected.
- The first step at integrating the database routines with the scanner
control will be simply to get the IOR of the scanner manager from the
database (according to Ilya, this is all scanner control needs right
now; sm_config is still read from a file).
Actions
- Ilya and Sasha will pursue the scanner control crashes.
- Boris, Ilya, and Sasha will use the database access interface to start
up scanner control.
- Steve will look into temporary ROBIN-based run control proxies.
Level 3 Monitoring/Control
Reports
- Ivan has implemented a generic routing scheme whereby a node knows where
to send data given a particular destination (the old scheme only knew how
to send data up or downstream).
- Mike and Andreas have been making low-level monitoring primitives using
Mike's scheme of sending monitoring information as a string of keys and
values. The load average and memory information have been implemented
using the /proc "filesystem." Networking information a la netstat and
ifconfig would also be desired.
- If the /proc file merely dumps information from a system call, and the
system call is accessible to general users, it would be preferable to
use the system call.
- It was decided that a timestamp should accompany a package of monitoring
information, instead of individually timestamping each piece of
information as it is collected. Where other timestamps may be desired
(such as the last modification time of a log file), these would be added
by the individual monitoring primitive.
- xntp (for synchronized clocks) was made to work with b0l3pcom1 listening
to the Fermi server and b0l3010 and b0l3011 listening to b0l3pcom1.
This needs to be implemented throughout the farm.
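The destination-based routing scheme mentioned in the first report above can be pictured as a per-node table mapping a destination to the next hop, in contrast to a fixed "upstream or downstream" choice. The Python sketch below is only an illustration of that idea, not Ivan's relay code; the `Node` class and its methods are hypothetical, and there is no protection against routing loops.

```python
class Node:
    """A relay node holding a destination -> next-hop table."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # destination name -> neighbouring Node
        self.inbox = []   # data delivered to this node

    def add_route(self, destination, next_hop):
        self.routes[destination] = next_hop

    def send(self, destination, data, path=None):
        """Deliver data to `destination`; return the list of node names traversed."""
        path = (path or []) + [self.name]
        if destination == self.name:
            self.inbox.append(data)
            return path
        if destination not in self.routes:
            raise KeyError(f"{self.name} has no route to {destination}")
        # Hand the data to the next hop, which repeats the lookup.
        return self.routes[destination].send(destination, data, path)
```

For example, with three nodes where `a` routes to `c` via `b`, calling `a.send("c", data)` passes through `b` and ends in `c`'s inbox without `a` needing to know anything beyond its own next hop.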
Actions
- Ivan will start up programs, like the event service, from within
the singleton request framework.
- Ivan will document relay code.
- Mike and Andreas will work on more monitoring primitives, while
also checking for system call equivalents.
- Andreas will install xntp throughout the Level 3 farm.
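The monitoring decisions above (a string of keys and values, one timestamp per package, and a preference for system calls over /proc) can be sketched as follows. This is a hedged Python illustration rather than Mike and Andreas's code: the space-separated `key=value` layout, the key names, and the function names are assumptions. `os.getloadavg()` is used as the library-call equivalent of reading /proc/loadavg.

```python
import os
import time


def load_average():
    """Load-average primitive using getloadavg() instead of parsing /proc/loadavg."""
    one, five, fifteen = os.getloadavg()
    return {"load1": f"{one:.2f}", "load5": f"{five:.2f}", "load15": f"{fifteen:.2f}"}


def build_package(primitives, now=time.time):
    """Run each primitive and emit one key/value string with a single timestamp."""
    fields = {"timestamp": f"{now():.0f}"}
    for primitive in primitives:
        # A primitive may add its own extra timestamps as ordinary fields.
        fields.update(primitive())
    return " ".join(f"{key}={value}" for key, value in fields.items())


def parse_package(text):
    """Inverse of build_package: split the string back into a dict."""
    return dict(item.split("=", 1) for item in text.split())
```

A package would then be built with `build_package([load_average])`. Note that this simple layout breaks if a value contains a space, so a real format would need quoting or a different separator.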
Physical
Reports
- No progress on moving SCPUs to 1st floor. Still waiting for
infrastructure down there.
- A diagram for the floor cabling of Level 3 has been drawn up, but still
needs some details. It will be sent to John Elias for suggestions on
part numbers and other implementation details.
Actions
- Send floor cabling details to John.
- Talk with Rob about shelf cabling.
Jeff Tseng / MIT /
jtseng@fnal.gov