Date: Fri, 17 Dec 1999 12:17:51 -0500 (EST)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: Matthew Dillon <dillon@apollo.backplane.com>, anderson@cs.duke.edu, Poul-Henning Kamp <phk@critter.freebsd.dk>, freebsd-current@FreeBSD.ORG
Subject: Re: Serious server-side NFS problem
Message-ID: <14426.25577.295630.812426@grasshopper.cs.duke.edu>
In-Reply-To: <19991216205554.A20410@panzer.kdm.org>
References: <199912160758.BAA87332@celery.dragondata.com> <199912160801.AAA50074@apollo.backplane.com> <14425.33053.359447.429215@grasshopper.cs.duke.edu> <199912170328.TAA57721@apollo.backplane.com> <19991216205554.A20410@panzer.kdm.org>

Kenneth D. Merry writes:
 >
 > Another advantage with gigabit ethernet is that if you can do jumbo frames,
 > you can fit an entire 8K NFS packet in one frame.
 >
 > I'd like to see NFS numbers from two 21264 Alphas with GigE cards, zero
 > copy, checksum offloading and a big striped array on one end at least.  I

Well.. maybe this will work for you ;-)

Two 21264 Alphas (500MHz XP1000S), 640MB RAM, Myrinet/Trapeze using
64-bit Myrinet cards, 8K cluster mbufs, UDP checksums disabled (we can
do checksum offloading at the receiver only).  We have a 56K MTU.

Using this setup, *without* zero copy, we get roughly 140MB/sec out of
TCP:

% netperf -Hbroil-my
TCP STREAM TEST to broil-my : histogram
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

524288 524288 524288    10.01    1135.20

And about 900Mb/sec (112MB/sec) out of UDP using an 8k message size:

% netperf -Hbroil-my -tUDP_STREAM -- -m 8192
UDP UNIDIRECTIONAL SEND TEST to broil-my : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

 57344    8192   10.00      165619      0    1084.94
 65535           10.00      137338            899.68

I have exported a local disk on broil-my and created a 512MB file
(zot).  Both machines have 640MB of RAM and the test file is fully
cached on the server.

When reading the file from the client, I have found the best I can do
is roughly 57MB/sec:

# mount_nfs -a 3 -r 16384 boil-my:/var/tmp /mnt
# dd if=/mnt/zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 9.658521 secs (55585209 bytes/sec)
# umount /mnt
# mount_nfs -a 3 -r 32768 boil-my:/var/tmp /mnt
# dd if=/mnt/zot of=/dev/null bs=64k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 9.513517 secs (56432433 bytes/sec)

Empirically, it seems that -a 3 performs better than -a 2 or -a 4.
Also, the bandwidth seems to max out with a 16k read size.  Increasing
much beyond that doesn't seem to help.

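For anyone who wants to redo the unit conversions, here is a minimal
Python sketch.  It only re-derives MB/sec figures from the netperf and
dd output above and expresses the NFS read rate as a fraction of the
raw UDP rate.  The constant and function names are made up for
illustration; nothing here is measured independently.

```python
# Figures copied verbatim from the netperf and dd output in this message.
TCP_MBIT = 1135.20   # netperf TCP stream, 10^6 bits/sec
UDP_MBIT = 899.68    # netperf UDP stream, receive side, 10^6 bits/sec
# dd results in bytes/sec, keyed by the mount_nfs -r read size used.
NFS_BYTES_PER_SEC = {16384: 55585209, 32768: 56432433}


def mbit_to_mbyte(mbit_per_sec):
    """Convert 10^6 bits/sec to 10^6 bytes/sec."""
    return mbit_per_sec / 8.0


def summarize():
    """Return (read size, NFS MB/sec, percent of raw UDP rate) tuples."""
    udp_mbyte = mbit_to_mbyte(UDP_MBIT)
    rows = []
    for rsize, bytes_per_sec in sorted(NFS_BYTES_PER_SEC.items()):
        nfs_mbyte = bytes_per_sec / 1e6
        rows.append((rsize, nfs_mbyte, 100.0 * nfs_mbyte / udp_mbyte))
    return rows


if __name__ == "__main__":
    print("TCP: %.1f MB/sec" % mbit_to_mbyte(TCP_MBIT))
    print("UDP: %.1f MB/sec" % mbit_to_mbyte(UDP_MBIT))
    for rsize, mbyte, pct in summarize():
        print("NFS -r %d: %.1f MB/sec (%.1f%% of UDP)" % (rsize, mbyte, pct))
```

By this arithmetic the NFS reads come in at about half of what raw
UDP manages over the same link.
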
Varying the number of nfsiods between 2, 4, and 20 doesn't seem to
matter much.

Running iprobe on the client (http://www.cs.duke.edu/ari/iprobe.html)
shows us that we are spending:

- 29.4% in bcopy.  This doesn't change much whether I enable or
  disable vfs_ioopt.  I suspect that this is from bcopy'ing data out
  of mbufs, not from crossing the user/kernel boundary.  In either
  case, there's not much that can be done to reduce this in a generic
  manner.

- 5.6% in tsleep (contention between nfsiods?)

The "top" functions/components are:

Name                        Count   Pct   Pct
----                        -----   ---   ---
kernel                       4128  90.0
--------
  bcopy_samealign_lp         1347  32.6  29.4
  procrunnable                279   6.8   6.1
  tsleep                      256   6.2   5.6
  Lidle2                      195   4.7   4.3
  m_freem                      89   2.2   1.9
  soreceive                    73   1.8   1.6
  lockmgr                      63   1.5   1.4
  brelse                       60   1.5   1.3
  vm_page_free_toq             55   1.3   1.2
  ovbcopy                      51   1.2   1.1
  wakeup                       43   1.0   0.9
  acquire                      42   1.0   0.9
  bcopy_da_lp                  42   1.0   0.9
  nfs_request                  41   1.0   0.9
  ip_input                     40   1.0   0.9
  biodone                      39   0.9   0.9
  nfs_readrpc                  38   0.9   0.8
  vm_page_alloc                36   0.9   0.8
  <...>
----------
/modules/tpz.ko               435   9.5

tpz.ko is the Myrinet device driver.  For each kernel function, the
first Pct column is its share of kernel samples and the second is its
share of all samples.  Overall, the system spent 90% of its time in
the static kernel, 9.5% in the device driver, and 0.5% in userland.

The server is also close to maxed out.  I can provide an iprobe
breakdown for it as well, and/or complete breakdowns for the client
and server.

Cheers,

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University                         Email: gallatin@cs.duke.edu
Department of Computer Science          Phone: (919) 660-6590


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message