From wolbers@fnal.gov Mon Mar  8 18:41:36 1999
Date: Sun, 06 Dec 1998 11:43:25 -0600
From: Stephen Wolbers <wolbers@fnal.gov>
To: Jim Fromm <fromm@doubleday>
Cc: chenyc@fnal.gov
Subject: Re: network throughput testing

Jim,

     I printed and read your mail message recently.  I think this is an
excellent start on understanding the throughput of the system.  I
thought of a few more measurements which I think would be useful,
especially to understand the total capacity of the Gigabit links.  

     1. Measure throughput with one I/O node talking to worker nodes.
        I would want to do this with 1,2,3,4,...14 worker nodes, i.e.
        up to 14 transfers going on simultaneously.  It should be the
        case that each transfer will use about 100 Mbits/sec and that
        the total will increase until something is saturated.  It 
        will be interesting to see what gets saturated and when.
    
     2. The same could be done with multiple I/O nodes talking to 
        multiple workers.  Something like:

                I/O      Worker
                 1         4
                 2         4
                 3         4
                 4         2

         This would have each I/O node sending to 4 workers at once, 
         except for the 4th I/O, which would send to only 2 workers.
         It measures in some sense the maximum that can be moved from
         I/O nodes to worker nodes simultaneously.

     3.  One could have one I/O node sending to 7 worker nodes and
         7 worker nodes sending to another I/O node, all simultaneously.
         This again tests a situation that is a maximum amount of 
         work the farm would be asked to do.

     4.  I think that some of the tests I listed in 2 and 3 could be 
         done for varying combinations of worker and I/O nodes and 
         either one or two transfers per worker node.  The idea is
         to characterize as much as possible the performance that is
         achievable with this set of hardware.
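
A rough way to state what tests 1 and 2 should show if nothing but the
link limits the transfers is sketched below.  It is only an idealized
model, not a measurement: the 100 Mbits/sec per transfer comes from
item 1 above, while the 1000 Mbits/sec ceiling per I/O node is an
assumed nominal Gigabit figure and the function names are made up for
illustration.  Wherever the measured totals fall below this curve,
something other than the link is saturating first.

```python
# Idealized saturation model for tests 1 and 2 (a sketch, not a measurement).
PER_TRANSFER_MBITS = 100.0   # from item 1: about 100 Mbits/sec per transfer
GIGABIT_LINK_MBITS = 1000.0  # assumption: nominal Gigabit ceiling per I/O node


def expected_total(n_transfers, per_transfer=PER_TRANSFER_MBITS,
                   cap=GIGABIT_LINK_MBITS):
    """Aggregate rate if the I/O node's link is the only limit."""
    return min(n_transfers * per_transfer, cap)


def first_saturated(n_max=14):
    """Smallest number of simultaneous transfers at which the model hits the cap."""
    for n in range(1, n_max + 1):
        if expected_total(n) >= GIGABIT_LINK_MBITS:
            return n
    return None


# Test 2 layout from the table above: I/O node -> number of workers it sends to.
test2_layout = {1: 4, 2: 4, 3: 4, 4: 2}

if __name__ == "__main__":
    for n in range(1, 15):
        print(n, "transfers ->", expected_total(n), "Mbits/sec expected")
    print("simultaneous transfers in test 2:", sum(test2_layout.values()))
```

Under these assumptions the model flattens out at 10 simultaneous
transfers, so the 11 to 14 worker cases are the ones most likely to
show what actually saturates.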

      I will be out of town Dec 8-11, but will be around the week after
that.  We can talk then about various possibilities and what is easy to
test.  You made an excellent start and most of the measurements up to
now are quite sensible.

Steve 

Jim Fromm wrote:
> 
> Enclosed is some benchmark testing that I did on the farm using netperf.
> There are some very strange things going on....
> 
> TEST I:
> Message size being sent is the same as the socket buffer size.  The first test
> just tested TCP.  I did UDP, but I don't understand the results.  The test
> results don't show anything surprising on the TCP side: a buffer size of 2^16
> is optimal.  Also as expected, if you use different send/receive buffer sizes
> you get poorer results.  Although not unexpected because I've seen other
> results, why is the rate only 1/5 of capacity?
> 
> FLUCTUATING BUFFER SIZES.  Message Size == Buffer Sizes
> PROTOCOL: TCP
> 
> IO --> IO
> 
> Message Size/
> Recv/Send Buffer size           TCP Rate
> 
> 2^16                            226.51
> 2^15                            183.63
> 2^14                            126.76
> 2^13                             83.80
> 2^12                             42.47
> 
> IO --> Worker
> 2^16                            93.78
> 2^15                            94.23
> 2^14                            94.19
> 2^13                            42.94
> 2^12                            36.26
> 
> Worker-->Worker
> 
> 2^16                            94.06
> 2^15                            94.60
> 2^14                            88.54
> 2^13                            60.39
> 2^12                            30.53
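
The TEST I sweep above (message size equal to both socket buffer sizes)
could be scripted roughly as follows.  This is a sketch that only builds
the command lines; it does not run them.  It assumes the usual netperf
options (-H for the remote host, -t for the test name, -l for the test
length, and after "--" the test-specific -m, -s and -S for message size
and local/remote socket buffer sizes).  The 10 second test length and the
helper names are assumptions, and "fnpcc" is used only as an example
target.

```python
def netperf_cmd(host, size, test="TCP_STREAM", seconds=10):
    """Build one netperf command with message size == both socket buffer sizes."""
    return [
        "netperf", "-H", host, "-t", test, "-l", str(seconds),
        "--",                 # test-specific options follow
        "-m", str(size),      # send message size in bytes
        "-s", str(size),      # local socket buffer size
        "-S", str(size),      # remote socket buffer size
    ]


def sweep(host, exponents=range(16, 11, -1)):
    """Commands for sizes 2^16 down to 2^12, matching the rows of TEST I."""
    return [netperf_cmd(host, 2 ** e) for e in exponents]


if __name__ == "__main__":
    for cmd in sweep("fnpcc"):  # example host name only
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run() on a machine where netperf
and a matching netserver are installed.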
> 
> ---------------------------------------------------------------
> TEST II:
> 
> Constant buffer size (2^16), fluctuating message sizes.
> 
> This test attempted to see if a degradation in performance occurred as the size
> of the message decreased.
> 
> First, from IO node to IO (fnpcb to fnpcc).  TCP shows rates similar to the
> above test for message sizes down to about 2^8, then TCP overhead starts to
> take over and the rates dip.  Nothing surprising.  What is really weird is the
> UDP rates.  In all cases, we got about 0 receive throughput.  I didn't trust
> netperf at first, but after doing some other tests (ttcp), looking through
> netperf code, and having some others try the tests too (D.Holmgren), I think
> it really is reporting correctly.  What the heck's going on?  The send rate is
> just the rate at which messages are sent; the recv rate is the rate at which
> they are received on the remote host.  Notice the huge jump in send rates
> from 2^9 to 2^8.  I isolated it even further, and found that sending messages
> of size 375 resulted in send rates in the 150 range, but with a message of
> size 376 the rate plummets to 0.03.  What is magical about 376?  Also, if the
> UDP rates are so bad, how are we able to use NFS, which seems to give
> reasonable rates?
> This will have to be investigated further.  Note that a large discrepancy
> between send and receive rates means that packets are being lost.  In the 2^8
> and 2^7 cases, virtually all packets are lost.
> 
> I then ran the UDP test in reverse, from worker to IO, and got reasonable
> numbers, with the possible exception of that spike in send rate at 2^8 causing
> 1/2 the packets to be lost.
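
The loss implied by a send/recv pair can be estimated as sketched below.
It assumes every message in a run has the same size, so that the ratio of
receive rate to send rate equals the fraction of messages delivered; the
function name is made up for illustration, and the sample figures are
taken from the UDP tables in this message.

```python
def loss_fraction(send_rate, recv_rate):
    """Fraction of sent data that never arrived, from netperf send/recv rates."""
    if send_rate <= 0:
        raise ValueError("send rate must be positive")
    # Clamp at zero in case rounding makes recv appear slightly above send.
    return max(0.0, 1.0 - recv_rate / send_rate)


if __name__ == "__main__":
    samples = {
        "IO->IO 2^8": (169.95, 0.03),
        "Worker->IO 2^8": (146.32, 78.67),
        "Worker->Worker 2^11": (94.26, 94.26),
    }
    for label, (send, recv) in samples.items():
        print(f"{label}: {loss_fraction(send, recv):.1%} lost")
```

On these figures the IO->IO case at 2^8 loses essentially everything,
while the Worker->IO case at 2^8 loses a little under half.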
> 
> IO->IO
> 
> Message Size                    TCP Rate        UDP Rate
>                                                 (send/recv)
> 2^15                            227.77
> 2^14                            224.12          0.08/0.05
> 2^13                            226.25          0.07/0.05
> 2^12                            227.75          0.09/0.07
> 2^11                            224.35          0.08/0.05
> 2^10                            229.28          0.07/0.05
> 2^9                             264.99          0.07/0.06
> 2^8                             200.44          169.95/0.03
> 2^7                             124.27          88.27/0.01
> 
> IO -> Worker
> 2^15                            93.80
> 2^14                            94.13           0.07/0.05
> 2^13                            93.76           0.08/0.08
> 2^12                            94.04           0.11/0.08
> 2^11                            93.75           0.11/0.08
> 2^10                            93.60           0.29/0.27
> 2^9                             93.19           0.08/0.05
> 2^8                             94.01           170.80/0.04
> 2^7                             93.51           91.45/0.03
> 
> Worker -> IO (UDP only)
> 2^15                                            96.03/95.51
> 2^14                                            95.79/94.79
> 2^13                                            95.83/95.34
> 2^12                                            95.74/95.26
> 2^11                                            94.28/94.28
> 2^10                                            93.91/92.88
> 2^9                                             88.53/88.02
> 2^8                                             146.32/78.67
> 2^7                                             41.26/41.16
> 
> Worker to Worker looks good - TCP rates remain near max and UDP gets good
> rates down to a message size of 2^8, where we see our send spike again, and
> then the send and recv rates fall off at 2^7.
> 
> Worker->Worker
> 2^15                            94.08
> 2^14                            94.09           96.05/96.05
> 2^13                            94.09           95.81/95.81
> 2^12                            94.10           95.72/95.72
> 2^11                            94.08           94.26/94.26
> 2^10                            93.92           93.88/93.88
> 2^9                             93.99           88.53/88.53
> 2^8                             94.00           144.74/42.81
> 2^7                             93.97           40.19/39.99
> 
> ---------------------------------------------------------------
> 
> The other test I did was a concurrency test, running several netperf
> instances concurrently and observing the rate.  I tested up to 6, and saw a
> constant throughput for TCP.
> 
> I also wanted to test the amount of CPU being used by the networking, but
> could not get it to work.  I got the following error message in netperf:
> create_looper: file creation; errno 17
> 
> --
> 
> -------------------------------------------------------------
> 
> Jim Fromm                         Fermi National Accelerator Laboratory
> fromm@fnal.fnal.gov               P.O. Box 500
> 630-840-8483                      MS 369
>                                   Batavia, IL 60510
> 
> -------------------------------------------------------------
> "I think Little League is wonderful. It keeps the kids out of the house" -
> Yogi Berra
> 

