Date: Mon, 3 Feb 2014 19:03:17 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: J David <j.david.lists@gmail.com>
Cc: freebsd-net@freebsd.org, Garrett Wollman <wollman@freebsd.org>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <320778540.2326389.1391472197902.JavaMail.root@uoguelph.ca>
In-Reply-To: <CABXB=RRFfparjXm7_f6aaWHHbpUBoDWOLsjTyWdZmyKx3d2zAw@mail.gmail.com>
J David wrote:
> On Sat, Feb 1, 2014 at 10:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> > Btw, if you do want to test with O_DIRECT ("-I"), you should enable
> > direct io in the client.
> >   sysctl vfs.nfs.nfs_directio_enable=1
> >
> > I just noticed that it is disabled by default. This means that your
> > "-I" was essentially being ignored by the FreeBSD client.
>
> Ouch. Yes, that appears to be correct.
>
> > It also explains why Linux isn't doing a read before write, since
> > that wouldn't happen for direct I/O. You should test Linux without
> > "-I" and see if it still doesn't do the read before write, including
> > a "-r 2k" to avoid the "just happens to be a page size" case.
>
> With O_DIRECT, the Linux client reads only during the read tests.
> Without O_DIRECT, the Linux client does *no reads at all*, not even
> for the read tests. It caches the whole file and returns
> commensurately irrelevant/silly performance numbers (e.g. 7.2GiB/sec
> random reads).
>
It looks like the "-U" option can be used to get iozone to
unmount/remount the file system and avoid hits on the buffer cache.
(Alternately, a single run after doing a manual dismount/mount might
help.)

> Setting the sysctl on the FreeBSD client does stop it from doing the
> excess reads. Ironically, this actually makes O_DIRECT improve
> performance for all the workloads punished by that behavior.
>
> It also creates a fairly consistent pattern indicating performance
> being bottlenecked by the FreeBSD NFS server.
>
> Here is a sample of test results, showing both throughput and IOPS
> achieved:
>
> https://imageshack.com/i/4jiljhp
>
> In this chart, the 64k test is run 4 times: once with FreeBSD as both
> client and server (64k), once with Linux as the client and FreeBSD as
> the server (L/F 64k), once with FreeBSD as the client and Linux as the
> server (F/L 64k, hands down the best NFS combo), and once with Linux
> as the client and server (L/L 64k).
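Concretely, the cache-avoiding run I have in mind would look something
like the sketch below. The mount point, file path, and sizes are only
placeholders; "-I" asks iozone for O_DIRECT, "-U" makes it
unmount/remount the named mount point between tests, and "-r 2k" avoids
the page-size coincidence:

```shell
# Enable direct I/O in the FreeBSD client first, so iozone's -I is
# actually honored rather than silently ignored:
sysctl vfs.nfs.nfs_directio_enable=1

# Hypothetical run against a placeholder NFS mount at /mnt/nfs:
#   -i 0 = write/rewrite, -i 1 = read/reread, -i 2 = random read/write
#   -s 512m = test file size (placeholder), -r 2k = record size
#   -U /mnt/nfs = unmount/remount between tests to defeat the buffer cache
iozone -I -r 2k -s 512m -i 0 -i 1 -i 2 -U /mnt/nfs -f /mnt/nfs/testfile
```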
> For reference, the native performance of the md0 filesystem is also
> included.
>
> The TL;DR version of this chart is that the FreeBSD NFS server is the
> primary bottleneck; it is not being held back by the network or the
> underlying disk. Ideally, it would be nice to see the 64k column for
> FreeBSD client / FreeBSD server as high as or higher than the column
> for FreeBSD client / Linux server. (Also, the bottleneck of the F/L
> 64k test appears to be CPU on the FreeBSD client.)
>
> The more detailed findings are:
>
> 1) The Linux client runs at about half the IOPS of the FreeBSD client
> regardless of server type. The gut-level suspicion is that it must be
> doing twice as many NFS operations per write. (Possibly commit?)
>
> 2) The FreeBSD NFS server seems capped at around 6300 IOPS. This is
> neither a limit of the network (at least 28k IOPS) nor the filesystem
> (about 40k IOPS).
>
> 3) When O_DIRECT is not used (not shown), the excess read operations
> pull from the same 6300 IOPS bucket, and that's what kills small
> writes.
>
> 4) It's possible that the sharp drop-off visible at the 64k/64k test
> is a result of doubling the number of packets traversing the
> TSO-capable network.
> Here's a representative top from the server while the test is
> running, showing all the nfsd kernel threads being utilized, and
> spare RAM and CPU:
>
> last pid: 14996;  load averages: 0.58, 0.17, 0.10   up 5+01:42:13  04:02:24
> 255 processes: 4 running, 223 sleeping, 28 waiting
> CPU:  0.0% user,  0.0% nice, 55.9% system,  9.2% interrupt, 34.9% idle
> Mem: 2063M Active, 109M Inact, 1247M Wired, 1111M Buf, 4492M Free
> ARC: 930K Total, 41K MFU, 702K MRU, 16K Anon, 27K Header, 143K Other
> Swap: 8192M Total, 8192M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>    11 root      155 ki31     0K    32K RUN     1 120.8H 65.87% idle{idle: cpu1}
>    11 root      155 ki31     0K    32K RUN     0 120.6H 48.63% idle{idle: cpu0}
>    12 root      -92    -     0K   448K WAIT    0  13:41 14.55% intr{irq268: virtio_p}
>  1001 root       -8    -     0K    16K mdwait  0  19:37 12.99% md0
>    13 root       -8    -     0K    48K -       0  10:53  6.05% geom{g_down}
>    12 root      -92    -     0K   448K WAIT    1   3:13  4.64% intr{irq269: virtio_p}
>   859 root       -4    0  9912K  1824K ufs     1   2:00  3.08% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  0   2:11  2.83% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  0   2:04  2.64% nfsd{nfsd: service}
>   859 root       -4    0  9912K  1824K ufs     1   2:00  2.29% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  1   6:08  2.20% nfsd{nfsd: service}
>   859 root       -4    0  9912K  1824K ufs     1   5:40  2.20% nfsd{nfsd: master}
>   859 root       -4    0  9912K  1824K ufs     1   2:00  1.95% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  0   2:50  1.90% nfsd{nfsd: service}
>   859 root       -4    0  9912K  1824K ufs     1   2:47  1.66% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K RUN     0   2:13  1.66% nfsd{nfsd: service}
>    13 root       -8    -     0K    48K -       0   1:55  1.46% geom{g_up}
>   859 root       -8    0  9912K  1824K rpcsvc  0   2:39  1.42% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  0   5:18  1.32% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  0   1:55  1.12% nfsd{nfsd: service}
>   859 root       -8    0  9912K  1824K rpcsvc  0   2:00  0.98% nfsd{nfsd: service}
>   859 root       -4    0  9912K  1824K ufs     1   2:01  0.73% nfsd{nfsd: service}
>   859 root       -4    0  9912K  1824K ufs     1   5:56  0.49% nfsd{nfsd: service}
>
> All of this tends to exonerate the client. So what would be the next
> step to track down the cause of poor performance on the server-side?
>
Also, as an alternative to setting the 2 sysctls for the TCP DRC cache,
you can simply disable it by setting the sysctl:
  vfs.nfsd.cachetcp=0
Many would argue that doing a DRC for TCP isn't necessary (it wasn't
done in most NFS servers, including the old NFS server in FreeBSD). I
have no idea if the Linux server does a DRC for TCP.

And make sure you've increased your nfsd thread count. You can put a
line like this in your /etc/rc.conf to do that:
  nfs_server_flags="-u -t -n 64"
(sets it to 64, which should be plenty for a single client.)

rick

> Thanks!
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
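Spelled out, those two server-side tweaks would look something like the
sketch below (this assumes the stock new NFS server on FreeBSD 9.x; the
sysctl and rc.conf names are the ones mentioned above):

```shell
# Disable the duplicate request cache (DRC) for TCP entirely, as an
# alternative to tuning the two DRC sysctls; takes effect immediately:
sysctl vfs.nfsd.cachetcp=0

# Add the same line to /etc/sysctl.conf if you want it to persist.

# Bump the nfsd thread count via /etc/rc.conf:
#   nfs_server_flags="-u -t -n 64"
# (-u/-t serve UDP and TCP, -n 64 runs 64 threads)

# Restart the server so the new flags take effect:
service nfsd restart
```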