Level 3 Online Status Reports
Issue 8
Tuesday, May 25, 1999
ATM Stuff
Reports
- Tests which illustrate the behavior of concern:
(In all the following tests, the wait interval in l3recv is
4/3 ms, with 2000 tries.)
- 10 SCPU -> 1 CNV -> trash never has any late fragments.
- 10 SCPU -> 1 CNV -> 4 P (all off one CNV port) -> trash:
~1 bad, ~100000 good
- 10 SCPU -> 1 CNV -> 2 (2 P -> 1 OUT): 1 bad, 32379 good
(i.e., each of two CNV ports is connected to 2 P's, and each pair of
P's forwards its data to its own OUT node, for two OUT nodes in all)
- 10 SCPU -> 1 CNV -> 2 (2 P) -> 1 OUT: 68 bad, 20365 good
(2 P's hanging off each CNV port, but all 4 P's write to one
OUT node)
- 10 SCPU -> 1 CNV -> 4 P -> 1 OUT: 57 bad, 25300 good
- 10 SCPU -> 1 CNV -> 4 P -> 1 OUT (same as the previous configuration):
Ilya varied the rejection rate on the processor nodes.
- with no rejection (pass all events on to output node): ~0.23% late.
- ~0.2% late until rejecting about 20% of the events.
- the late fraction drops to almost nothing above 33% rejection.
- Some scaling tests:
- N SCPU -> 1 CNV -> 4 P (all hanging off one CNV port), each
SCPU sending at 1MB/s (12KB fragment size), stops scaling linearly
("breaks") for N > 4, and this breakpoint coincides with late packets
showing up. Ilya has calculated that the resulting event rate
is consistent with the observed fraction of late fragments
(since they cause l3recv to wait a long time for the next event).
- N SCPU -> 1 CNV -> 4 P (one CNV port), at 1MB/s and 24KB fragments,
"breaks" for N > 8.
- N SCPU -> 1 CNV -> 4 P (one CNV port), each sending at 0.3MB/s
(12KB fragment size), shows linear behavior up to the 10 SCPU's
tested.
- Observations on these EVB/L3 tests:
- In general, the more activity on the converter node, the greater
the chance of late fragments. Thus throttling the transmission rate
further, or sending larger fragments, reduces the late-fragment
rate.
- One possible hypothesis for why having an output node increases the
number of late fragments: more kernel activity on the converter node,
which must re-send packets in response to collisions. This may
also explain why changing the rejection factor on the
processor nodes affects the late fraction on the converter node.
(How can we measure the frequency of collisions?)
- Driver investigations:
- tests using an i4515_tx_perf() loop on 10 VME CPU's sending to
10 simultaneous br3 loops on b0l3c04.
- With no other activity on b0l3c04, there are no packet drops: all
1,000,000 packets are counted by the driver (as shown by "cat /proc/atm/devices")
as well as by br3 (each of the 10 receiving processes receives 100,000 packets).
- Running ~djholm/benchmarks/tiny.static on b0l3c04 causes each
receiving process to drop packets (evenly across all processes).
However, the counts from /proc/atm/devices and from the loop still agree.
- Running "date" and logging in (ssh and/or rsh) also cause
similar drops.
- Running tiny, nice'd +10, induces no packet drops.
- Don and Ron have looked at the above problem using the kernel trace
facilities. They have been able to reproduce the results as well,
causing packet drops with "dd > /dev/null". The traces show the
periodic beats of the br3 processes being scheduled. At a certain
point, dd takes over entirely for 50ms: there are not even any
scheduler beats (which are supposed to occur every 1ms). After this,
the br3 processes beat very quickly, apparently trying to read the
backed-up packets. We don't know what dd is doing during those 50ms,
but it is probably in a system function whose priority is evidently
high enough to override even the system scheduler.
- Driver history from Don and Ron, with investigations by Steve:
- The previous packet-drop tests, in January 1998, were made
with the Fore 200 LE ATM-PCI adapter and its own driver.
The current system uses a different card with a different driver (IDT).
- The Fore driver was built in a fairly traditional way:
the card deposits data in local buffers as it is received; once
the entire AAL5 packet has been received, it is copied into a
socket buffer (SKB, the Linux sk_buff). If no more SKB's can be
allocated, the packet is "dropped," and the dropped counter is
incremented.
- The IDT driver instead assembles the packet in an SKB that has been
"wrapped" around local buffers. This eliminates the memory copy
of the traditional driver, but in this driver the code that increments
the "dropped" counter is not called.
- Steve points out that event builder flow control should prevent all
the driver buffers from ever being in use at once.
- The IDT driver also uses several interrupts:
- queue of buffers to be transmitted is 7/8 full
- small-buffer free-queue is empty
- large-buffer free-queue is empty
- queue of buffers that have received data is 7/8 full
- periodic monitoring (TMO) interrupt, which prints the dropped-cell
count.
- A real-time l3_converter is now running. The main problem in
getting it to run was making sure that every polling loop has an
interval longer than a certain minimum, below which there is no
actual delay (so the polling loop becomes a tight loop running in
real-time mode).
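The l3recv parameters quoted at the start of the tests (a 4/3 ms wait interval, 2000 tries) set how long one missing fragment can stall the receiver. l3recv's code is not shown in this report, so the sketch below assumes it simply polls and sleeps one interval between tries; `wait_for_fragment` and `worst_case_wait` are hypothetical names.

```python
import time
from typing import Callable, Optional

# Values quoted in the report; the loop structure itself is an assumption.
WAIT_INTERVAL_S = (4 / 3) / 1000   # 4/3 ms between tries
MAX_TRIES = 2000


def worst_case_wait(interval_s: float = WAIT_INTERVAL_S, tries: int = MAX_TRIES) -> float:
    """Longest a fragment can be waited for before giving up (sleep time only)."""
    return interval_s * tries


def wait_for_fragment(ready: Callable[[], bool],
                      interval_s: float = WAIT_INTERVAL_S,
                      tries: int = MAX_TRIES) -> Optional[int]:
    """Poll ready() up to `tries` times, sleeping interval_s after each miss.

    Returns the number of polls needed, or None if the fragment never showed up.
    """
    for attempt in range(1, tries + 1):
        if ready():
            return attempt
        time.sleep(interval_s)
    return None


print(f"worst-case wait: {worst_case_wait():.3f} s")  # worst-case wait: 2.667 s
```

If a "late" fragment is one that l3recv gave up on, each one costs the full wait of about 2.7 s, which is how a late fraction of a few tenths of a percent can still pull the event rate down noticeably.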
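The bad/good counts from the first group of tests convert directly into late fractions, which makes them easier to compare with the percentages from Ilya's rejection-rate scan. A small helper, assuming "bad" counts late fragments and bad + good is the total (the report does not define the counts further); `late_fraction` is a hypothetical name:

```python
# Late-fragment counts (bad, good) quoted in the report.
RESULTS = {
    "10 SCPU -> 1 CNV -> 4 P (one CNV port) -> trash": (1, 100_000),  # both approximate
    "10 SCPU -> 1 CNV -> 2 (2 P -> 1 OUT)": (1, 32_379),
    "10 SCPU -> 1 CNV -> 2 (2 P) -> 1 OUT": (68, 20_365),
    "10 SCPU -> 1 CNV -> 4 P -> 1 OUT": (57, 25_300),
}


def late_fraction(bad: int, good: int) -> float:
    """Fraction of fragments that were late, taking bad + good as the total."""
    return bad / (bad + good)


for config, (bad, good) in RESULTS.items():
    print(f"{late_fraction(bad, good):8.3%}  {config}")
```

This gives about 0.22% for the 4 P -> 1 OUT configuration, close to the ~0.23% quoted for no rejection, and about 0.33% for 2 (2 P) -> 1 OUT, while the configurations with two OUT nodes or no OUT node stay at or below roughly 0.003%.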
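The scaling tests can also be read as fragment rates at the converter. This back-of-the-envelope check is not in the original notes, and it assumes every fragment costs the converter about the same amount of work: it compares the last clean point before each 1 MB/s breakpoint (N = 4 with 12KB fragments, N = 8 with 24KB fragments) with the 0.3 MB/s run at N = 10. `fragment_rate` is a hypothetical helper.

```python
def fragment_rate(n_scpu: int, mb_per_s: float, fragment_kb: float) -> float:
    """Aggregate fragments per second reaching the converter from n_scpu senders.

    Uses 1 MB = 1000 KB; the choice cancels out when the cases are compared.
    """
    return n_scpu * mb_per_s * 1000 / fragment_kb


# (label, SCPU count, MB/s per SCPU, fragment size in KB)
CASES = [
    ("1 MB/s, 12 KB, N = 4 (last point before the break)", 4, 1.0, 12),
    ("1 MB/s, 24 KB, N = 8 (last point before the break)", 8, 1.0, 24),
    ("0.3 MB/s, 12 KB, N = 10 (still linear)", 10, 0.3, 12),
]

for label, n, rate, size in CASES:
    print(f"{fragment_rate(n, rate, size):6.1f} fragments/s  {label}")
```

Both 1 MB/s cases come out at about 333 fragments/s although their bandwidths differ by a factor of two, and the 0.3 MB/s case stays at 250 fragments/s. That fits the observation that converter activity, rather than raw bandwidth, drives the late fragments, though three data points make it suggestive at best.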
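The difference between the Fore and IDT receive paths described above can be modelled in a few lines. This is a toy simulation, not driver code: the class names, the integer SKB pool and the `card_lost` counter are hypothetical, and the assumption that an IDT-style card can only reassemble into buffers already posted on its free queues is inferred from the free-queue interrupts listed above rather than stated in the report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CopyingDriver:
    """Fore-style receive path (toy model): the card reassembles into its own
    buffer, then the driver allocates an SKB and copies the packet into it."""
    free_skbs: int
    delivered: List[bytes] = field(default_factory=list)
    dropped: int = 0

    def packet_complete(self, card_buffer: bytearray) -> None:
        if self.free_skbs == 0:
            self.dropped += 1                      # no SKB: packet dropped and counted
            return
        self.free_skbs -= 1
        self.delivered.append(bytes(card_buffer))  # the extra memory copy


@dataclass
class WrappingDriver:
    """IDT-style receive path (toy model): SKBs wrapped around local buffers are
    posted to the card in advance, and the card reassembles straight into them."""
    posted: List[bytearray]                        # buffers on the card's free queue
    delivered: List[memoryview] = field(default_factory=list)
    dropped: int = 0                               # exists, but nothing below increments it
    card_lost: int = 0                             # hypothetical: loss visible only at the card

    def packet_arrives(self, payload: bytes) -> None:
        if not self.posted:
            self.card_lost += 1                    # no posted buffer to reassemble into
            return
        buf = self.posted.pop()
        buf[: len(payload)] = payload              # data lands directly in the SKB's memory
        self.delivered.append(memoryview(buf)[: len(payload)])  # handed up without a copy


# Two packets, room for only one in each model.
fore = CopyingDriver(free_skbs=1)
fore.packet_complete(bytearray(b"evt1"))
fore.packet_complete(bytearray(b"evt2"))

idt = WrappingDriver(posted=[bytearray(16)])
idt.packet_arrives(b"evt1")
idt.packet_arrives(b"evt2")

print(fore.dropped, idt.dropped, idt.card_lost)  # 1 0 1
```

The model only restates the report's point in executable form: on the wrapping path no software branch increments `dropped`, so a zero in that counter does not by itself show that nothing was lost.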
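For the real-time l3_converter, one way to enforce the minimum-interval rule is to validate each polling interval before the loop starts, so a misconfigured loop fails loudly instead of silently spinning in real-time mode. The report does not give the minimum's value, so it is a required parameter here; `poll_with_minimum` is a hypothetical name.

```python
import time
from typing import Callable


def poll_with_minimum(check: Callable[[], bool], interval_s: float,
                      max_polls: int, min_interval_s: float) -> bool:
    """Poll check() up to max_polls times, sleeping interval_s between polls.

    Refuses intervals that are not longer than min_interval_s, since (per the
    report) shorter sleeps produce no real delay and the loop would spin.
    """
    if interval_s <= min_interval_s:
        raise ValueError(
            f"poll interval {interval_s} s is not above the {min_interval_s} s minimum"
        )
    for _ in range(max_polls):
        if check():
            return True
        time.sleep(interval_s)
    return False
```

Clamping the interval up to the minimum would also work; raising makes the misconfiguration visible instead of quietly changing the loop's timing.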
Other Stuff
Reports
- Steve has created a working VxWorks port of ILU.
- l3bg.tcl and evb.tcl have been integrated into a new tool called
l3evb.tcl.
- Fixing core dumps in the Level 3 executable necessitated a change
in the AC++ framework. Liz is testing this change.
- The wedge test is intended to send data from the wedge all the
way to consumers, which will display histograms as the data
comes in. Data will also be written out to disk to be read by
AC++. In last Friday's tests, the data was written out, but the
consumers could not read big-endian data (not yet compiled on
PC's). Also, the LRIH bank name was byte-swapped.
- Christoph has put his reformatter package under UPS.
- All the l3_converter reformatters have been rewritten in
object-style C and made part of the reformatter package.
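The byte-swapped LRIH bank name is what a four-character name looks like when the word holding it is read with the wrong byte order. A minimal illustration, assuming (the report does not say) that the name is stored as one 32-bit word written by a big-endian machine and read on a little-endian PC; `read_name` is a hypothetical helper.

```python
import struct


def read_name(word_bytes: bytes, byte_order: str) -> str:
    """Interpret 4 bytes as a 32-bit word in the given order ('>' big, '<' little)
    and return its characters most-significant byte first."""
    (word,) = struct.unpack(byte_order + "I", word_bytes)
    return struct.pack(">I", word).decode("ascii")


on_disk = b"LRIH"                  # name as written by a big-endian producer
print(read_name(on_disk, ">"))     # LRIH: read with the matching byte order
print(read_name(on_disk, "<"))     # HIRL: read as little-endian
```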
Actions
- Ilya and Sasha will start on scanner manager control via ILU.
- Jeff will make a new test L3FarmUser.
- Christoph will correct the bank-name byte order for the wedge test.
Jeff Tseng / MIT /
jtseng@fnal.gov