From djholm@fndaub Fri Mar 12 17:14:31 1999
Date: Fri, 08 Jan 1999 18:38:42 -0600 (CST)
From: Don Holmgren
To: linux-users@fnal.gov
Subject: Measurements of NFS performance on Linux cluster

Hi -

I was surprised at the terrible performance quoted at yesterday's
strategy meeting for NFS performance, as well as the recommendation
of abandoning NFS for clusters. I've typically seen very acceptable
results on the several Linux clusters I've administrated and/or used
(pcfarms, rip, and the CDF Level 3 prototype).

So, I did some quick measurements on the RIP cluster. The nodes were
rip8, the NIS and NFS server, and rip7. Both are 400 MHz P-II nodes,
connected with fast ethernet via a Foundry switch.

Writing a 100 MB file:

    rip7:/raid$ time dd if=100MB.file of=~/100MB.copy bs=8k
    12800+0 records in
    12800+0 records out
    0.02user 2.87system 0:29.54elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (299major+12minor)pagefaults 0swaps

Here /raid is a 2-way stripe on rip7, and my home area is mounted from
rip8 (i.e., where ~/100MB.copy goes). I took care to have the
100MB.file in rip7's cache. Also, note that I used a block size
matching the read/write block size specified in the NFS mount.

    NFS writing rate (big file) = 3.38 MB/sec

The local writing rate to the home area on rip8 was about 6 MB/sec.

For a more rigorous test, I copied the linux kernel tree from rip7 to
the NFS mounted home area on rip8:

    rip7:/usr/src$ cd linux
    rip7:/usr/src/linux$ du -s
    36113 .
    rip7:/usr/src/linux$ find . -print | wc
    2773 2773 68532
    rip7:/usr/src/linux$ time cp -d -r . ~/linux/.
    0.07user 2.96system 0:44.81elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (10056major+70minor)pagefaults 0swaps

So, the copy of 2773 files with a total length of 36113K was achieved
at a rate of 0.79 MB/sec.

I believe that these are very acceptable rates and disagree with the
recommendation to not use NFS.
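
[Editorial sketch] The two rates quoted above are simply data size
divided by elapsed time. The short Python sketch below redoes that
arithmetic from the dd and du figures in the message. The helper name
rate_mb_per_sec is hypothetical, and it assumes 1 MB = 1024 KB; under
that assumption it reproduces the quoted rates to within rounding.

```python
def rate_mb_per_sec(size_kb: float, elapsed_sec: float) -> float:
    """Average transfer rate in MB/sec, assuming 1 MB = 1024 KB."""
    return size_kb / 1024 / elapsed_sec


# Big file: dd reported 12800 records of 8 KB each, 29.54 s elapsed.
big_file = rate_mb_per_sec(12800 * 8, 29.54)

# Kernel tree: du -s reported 36113 KB, cp took 44.81 s elapsed.
kernel_tree = rate_mb_per_sec(36113, 44.81)

print(f"big file:    {big_file:.2f} MB/sec")
print(f"kernel tree: {kernel_tree:.2f} MB/sec")
```
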
I am aware of very poor NFS writing performance from Linux nodes to
other UNIX's, particularly OSF/1. However, all Linux clusters in my
experience work well with NFS.

There was also mention in the slides of NIS being immature and not
recommended. Again, in my experience with all Linux clusters NIS has
been trouble-free. Further, on some heterogeneous clusters (SDSS,
FNDAUx, HPPC) with IRIX NIS servers, I've not seen any problems with
Linux NIS clients. So I can recommend NIS for Linux clusters as well.

Don Holmgren

From dane@fnal.gov Fri Mar 12 17:14:55 1999
Date: Fri, 08 Jan 1999 19:28:16 -0600 (EST)
From: Dane Skow
To: Don Holmgren
Cc: linux-users@fnal.gov
Subject: Re: Measurements of NFS performance on Linux cluster

On Fri, 8 Jan 1999, Don Holmgren wrote:

> Hi -
>
> I was surprised at the terrible performance quoted at yesterday's
> strategy meeting for NFS performance, as well as the recommendation
> of abandoning NFS for clusters. I've typically seen very acceptable
> results on the several Linux clusters I've administrated and/or used
> (pcfarms, rip, and the CDF Level 3 prototype).

There was no recommendation to "abandon NFS for clusters" (though you
are not the only one who seems to have come away with that
impression). We were trying to be very open about current thinking
and stick by the statement that NFS performance for traditional
clustering with Linux is a point of concern. We will have to look at
alternatives. I am VERY reluctant to deploy linux servers in single
point of failure configurations until we have significantly more
experience. This mandates a heterogeneous approach.

Since this was not a public meeting, much of this (public) mailing
list is not aware of the presentation to which you refer. Suffice it
to say that CD held a meeting to discuss present focus of Linux
support efforts within the division and possible options for new
initiatives. It is clear that undertaking all possible projects would
quickly swamp the available staff.
There will be a Unix Users Meeting (incorporating -and probably
dominated by- the Linux Users) in the very near future where the
general public will get an opportunity to present opinions/requests.

>
> So, I did some quick measurements on the RIP cluster. The nodes were
> rip8, the NIS and NFS server, and rip7. Both are 400 MHz P-II nodes,
> connected with fast ethernet via a Foundry switch.

Thanks for the work. Published data is always a good thing.

> (snipped text here)
>
> I believe that these are very acceptable rates and disagree with the
> recommendation to not use NFS. I am aware of very poor NFS writing
> performance from Linux nodes to other UNIX's, particularly OSF/1. However,
> all Linux clusters in my experience work well with NFS.
>

This is consistent with my statement/position above.

> There was also mention in the slides of NIS being immature and not recommended.
> Again, in my experience with all Linux clusters NIS has been trouble-free.
> Further, on some heterogeneous clusters (SDSS, FNDAUx, HPPC) with IRIX NIS
> servers, I've not seen any problems with Linux NIS clients. So I can
> recommend NIS for Linux clusters as well.

This is useful information. I understand that experience is not
universal, but it is reasonable to request that the problem cases be
discussed here as well...

Dane Skow, Computing Division Operating Systems Department Head

> Don Holmgren
>

From djholm@fndaub Fri Mar 12 17:15:24 1999
Date: Tue, 12 Jan 1999 18:35:38 -0600 (CST)
From: Don Holmgren
To: linux-users@fnal.gov
Subject: Re: Measurements of NFS performance on Linux cluster

As a result of discussions at yesterday's PCFARMS meeting, I've done
some further NFS testing, this time between a Linux box
(sdsslnx.fnal.gov), and an SGI Challenge (sdss.fnal.gov).

sdsslnx and sdss are on the same 10 Mbps ethernet segment.
Netperf numbers have a bit of noise because of traffic, but they
average about (for TCP transfers):

    sdss <--> sdsslnx 7.0 Mbps = 0.88 MB/sec

and range between 6.73 and 7.39 Mbps (6 tests).

sdsslnx is part of the SDSS cluster, and via NIS it gets passwords,
groups, and automounter maps. A striped developer's disk is NFS
mounted from sdss; the NFS options in the automounter map do not set
rsize/wsize values, which typically should be 8192 for optimum
performance. Perhaps the following results would be somewhat better
with this optimization.

First, large file writing performance - as before, a 100 MB file is
written:

    sdsslnx.fnal.gov:[59]% time dd if=/dev/zero of=100MB.file bs=1024k count=100
    100+0 records in
    100+0 records out
    0.010u 7.550s 2:16.41 5.5% 0+0k 0+0io 85pf+0w

This works out to 0.73 MB/sec = 5.9 Mbps, which is 84% of the
available network bandwidth (the 7.0 Mbps netperf average). For
comparison, native writing speed on sdss to the striped disk gives a
rate of 4.79 MB/sec.

Second, small files. As before, I copy the linux kernel source tree:

    sdsslnx.fnal.gov:[61]% cd /usr/src/linux
    sdsslnx.fnal.gov:[62]% find . -print | wc
    2738 2738 67647
    sdsslnx.fnal.gov:[63]% du -s
    35394 .
    sdsslnx.fnal.gov:[64]% pwd
    /usr/src/linux-2.0.35
    sdsslnx.fnal.gov:[65]% time cp -r -d . /usrdevel/s1/djholm/linux/.
    0.200u 6.010s 2:06.77 4.8% 0+0k 0+0io 9833pf+0w

So, 2738 files spanning 34.56 MB are written at an average rate of
0.27 MB/sec = 2.18 Mbps, or 31% of the available network bandwidth.

Again, as on linux to linux NFS connections, these seem to be
acceptable numbers. I don't have access to a linux box connected via
a fast ethernet or better connection to an IRIX system. I would be
very interested in seeing these tests repeated on such a setup.

Don Holmgren

From chenyc@fnal.gov Fri Mar 12 17:15:43 1999
Date: Fri, 15 Jan 1999 03:15:49 -0600
From: Yenchu Chen
To: Don Holmgren
Subject: Re: Measurements of NFS performance on Linux cluster

Hi Don,

Thanks for the information. That is very useful.

Best regards, Yen-Chu Chen
chenyc@fnal.gov
(630) 840-8871 (experiment)
(886)-(2) 2789-9681 (Inst. of Phys., Academia Sinica)

From djholm@fndaub Fri Mar 12 17:15:55 1999
Date: Wed, 13 Jan 1999 12:35:04 -0600 (CST)
From: Don Holmgren
To: Yenchu Chen
Cc: Don Holmgren
Subject: Re: Measurements of NFS performance on Linux cluster

>
> Hi Don,
>
> Thank you very much for the information. At the meeting, I was
> surprised that people said that NFS was bad even for Linux server and
> Linux client. I am glad that you make it clear.
>
> I am thinking to use a cluster of PC's for physics analysis. I am not
> sure though that if this is too premature. I will need big disk space
> several hundred GB and a few tape drives. Do you think the network disk
> is a good solution to have big disk space instead of using local SCSI or
> fiber channel disk drive?
>
> Your recommendations/suggestions are more than welcome.
>
> Best regards, Yen-Chu Chen
> chenyc@fnal.gov
> (630) 840-8871 (experiment)
> (886)-(2) 2789-9681 (Inst. of Phys., Academia Sinica)
>
>

Hi Yen-Chu -

Sorry about the delay - I remembered this AM that I'd forgotten to
respond.

Local SCSI disk will always be the fastest (local fiber channel disk
is really SCSI underneath the sheets - so you'd get the same speed
from a fiber channel disk, but pay a lot more for the interface for
the computer). Typical high performance disks will now let you
read/write at 12 MB/sec or more, and with striping you can push this
to 30 to 40 MB/sec.

However, lots of local SCSI disk on each node of a cluster is very
expensive, harder to administrate, and inconvenient if you'd like to
share data files. Most clusters will need to share disks. If you use
NFS to share a big set of disks, you'll cut the read/write performance
down to a few MB/sec at best (based on the quick tests I've run and
sent to the mailing list). If that speed is acceptable, I think NFS
is a good solution. It also lets different flavors of UNIX
interoperate.

We have seen a couple of anomalies with NFS on Linux, mostly related
to inconsistent information from 'ls' commands. If you copy a file to
an NFS disk, and immediately (w/in a second or so) execute 'ls', you
sometimes don't see the file, or you see incorrect dates or lengths.
Seconds later the correct information is returned (I think this is
called "attribute caching", and it's a feature which can be adjusted
or disabled). I've heard that in E871 something similar is done to
check that data files have been moved (an "rcp" is done in a script
which then does an "ls" to check success - this won't work all the
time unless attribute caching is adjusted).

Fibre channel disk looks very attractive, and it will probably be
widely used in a couple of years. With FC, machines in a cluster can
all be locally attached to the same set of fibre channel drives - so,
all nodes would see high data rates. The problem is that currently
there's no software which allows different machines to simultaneously
mount the same file system off of a fibre channel disk and not get
confused because of disk caching (when a node changes a file or a
directory, that may not be written to the disk for a few seconds).
Once shared filesystems are implemented for fibre channel, I think
that FC will be the preferred solution.

Don H.
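
[Editorial sketch] A check like the rcp-then-ls script mentioned above
can be made tolerant of stale attributes by polling for a while rather
than looking once. The Python sketch below is only an illustration
and is not the E871 script: the function name wait_for_file and its
parameters are hypothetical, and it assumes the expected size in bytes
is already known from the source copy. Since a client may keep
returning cached attributes until they expire, the timeout would need
to be longer than the attribute cache lifetime in effect on the mount.

```python
import os
import time


def wait_for_file(path, expected_size, timeout=10.0, interval=0.5):
    """Poll until `path` exists with `expected_size` bytes, or give up.

    Returns True if the file was seen with the expected size before
    `timeout` seconds elapsed, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.stat(path).st_size == expected_size:
                return True
        except FileNotFoundError:
            # Not visible yet; the directory listing may still be stale.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A transfer script would call this after the copy completes and treat a
False return as "not yet confirmed" rather than as proof of failure.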