From owner-freebsd-net@FreeBSD.ORG Tue Feb 4 00:03:26 2014
Date: Mon, 3 Feb 2014 19:03:17 -0500 (EST)
From: Rick Macklem
To: J David
Cc: freebsd-net@freebsd.org, Garrett Wollman
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

J David wrote:
> On Sat, Feb 1, 2014 at 10:53 PM, Rick Macklem wrote:
> > Btw, if you do want to test with O_DIRECT ("-I"), you should enable
> > direct io in the client.
> >   sysctl vfs.nfs.nfs_directio_enable=1
> >
> > I just noticed that it is disabled by default. This means that your
> > "-I" was essentially being ignored by the FreeBSD client.
>
> Ouch. Yes, that appears to be correct.
>
> > It also explains why Linux isn't doing a read before write, since
> > that wouldn't happen for direct I/O. You should test Linux without
> > "-I" and see if it still doesn't do the read before write, including
> > a "-r 2k" to avoid the "just happens to be a page size" case.
>
> With O_DIRECT, the Linux client reads only during the read tests.
> Without O_DIRECT, the Linux client does *no reads at all*, not even
> for the read tests. It caches the whole file and returns
> commensurately irrelevant/silly performance numbers (e.g. 7.2GiB/sec
> random reads).
>
It looks like the "-U" option can be used to get iozone to unmount/remount
the file system and avoid hits on the buffer cache. (Alternately, a single
run after doing a manual dismount/mount might help.)

> Setting the sysctl on the FreeBSD client does stop it from doing the
> excess reads. Ironically this actually makes O_DIRECT improve
> performance for all the workloads punished by that behavior.
>
> It also creates a fairly consistent pattern indicating performance
> being bottlenecked by the FreeBSD NFS server.
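
In case a concrete example helps, here is roughly what I meant above,
written out as commands (just a sketch; the mount point, test file and
sizes below are made-up placeholders, so adjust them for your setup and
check the iozone man page for the exact "-U" semantics; I think the mount
point also needs an /etc/fstab entry so iozone can remount it):

  # enable direct I/O in the FreeBSD NFS client (put the same line,
  # without the "sysctl", in /etc/sysctl.conf to make it persistent)
  sysctl vfs.nfs.nfs_directio_enable=1

  # iozone run that unmounts/remounts the file system between tests,
  # so the client's buffer cache can't satisfy the reads
  iozone -I -r 2k -s 512m -f /mnt/nfs/iozone.tmp -U /mnt/nfs
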
> Here is a sample of test results, showing both throughput and IOPS
> achieved:
>
> https://imageshack.com/i/4jiljhp
>
> In this chart, the 64k test is run 4 times, once with FreeBSD as both
> client and server (64k), once with Linux as the client and FreeBSD as
> the server (L/F 64k), once with FreeBSD as the client and Linux as the
> server (F/L 64k, hands down the best NFS combo), and once with Linux
> as the client and server (L/L 64k). For reference, the native
> performance of the md0 filesystem is also included.
>
> The TLDR version of this chart is that the FreeBSD NFS server is the
> primary bottleneck; it is not being held back by the network or the
> underlying disk. Ideally, it would be nice to see the 64k column for
> FreeBSD client / FreeBSD server as high as or higher than the column
> for FreeBSD client / Linux server. (Also, the bottleneck of the F/L
> 64k test appears to be CPU on the FreeBSD client.)
>
> The more detailed findings are:
>
> 1) The Linux client runs at about half the IOPS of the FreeBSD client
> regardless of server type. The gut-level suspicion is that it must be
> doing twice as many NFS operations per write. (Possibly commit?)
>
> 2) The FreeBSD NFS server seems capped at around 6300 IOPS. This is
> neither a limit of the network (at least 28k IOPS) nor the filesystem
> (about 40k IOPS).
>
> 3) When O_DIRECT is not used (not shown), the excess read operations
> pull from the same 6300 IOPS bucket, and that's what kills small
> writes.
>
> 4) It's possible that the sharp drop-off visible at the 64k/64k test
> is a result of doubling the number of packets traversing the
> TSO-capable network.
>
> Here's a representative top from the server while the test is running,
> showing all the nfsd kernel threads being utilized, and spare RAM and
> CPU:
>
> last pid: 14996;  load averages: 0.58, 0.17, 0.10   up 5+01:42:13  04:02:24
> 255 processes: 4 running, 223 sleeping, 28 waiting
> CPU:  0.0% user,  0.0% nice, 55.9% system,  9.2% interrupt, 34.9% idle
> Mem: 2063M Active, 109M Inact, 1247M Wired, 1111M Buf, 4492M Free
> ARC: 930K Total, 41K MFU, 702K MRU, 16K Anon, 27K Header, 143K Other
> Swap: 8192M Total, 8192M Free
>
>   PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>    11 root     155 ki31     0K    32K RUN     1 120.8H 65.87% idle{idle: cpu1}
>    11 root     155 ki31     0K    32K RUN     0 120.6H 48.63% idle{idle: cpu0}
>    12 root     -92    -     0K   448K WAIT    0  13:41 14.55% intr{irq268: virtio_p}
>  1001 root      -8    -     0K    16K mdwait  0  19:37 12.99% md0
>    13 root      -8    -     0K    48K -       0  10:53  6.05% geom{g_down}
>    12 root     -92    -     0K   448K WAIT    1   3:13  4.64% intr{irq269: virtio_p}
>   859 root      -4    0  9912K  1824K ufs     1   2:00  3.08% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  0   2:11  2.83% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  0   2:04  2.64% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs     1   2:00  2.29% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  1   6:08  2.20% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs     1   5:40  2.20% nfsd{nfsd: master}
>   859 root      -4    0  9912K  1824K ufs     1   2:00  1.95% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  0   2:50  1.90% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs     1   2:47  1.66% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K RUN     0   2:13  1.66% nfsd{nfsd: service}
>    13 root      -8    -     0K    48K -       0   1:55  1.46% geom{g_up}
>   859 root      -8    0  9912K  1824K rpcsvc  0   2:39  1.42% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  0   5:18  1.32% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  0   1:55  1.12% nfsd{nfsd: service}
>   859 root      -8    0  9912K  1824K rpcsvc  0   2:00  0.98% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs     1   2:01  0.73% nfsd{nfsd: service}
>   859 root      -4    0  9912K  1824K ufs     1   5:56  0.49% nfsd{nfsd: service}
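
Btw, w.r.t. the Linux client suspicion in 1) above: you should be able to
see whether it really is doing extra operations (Commits or whatever) by
comparing the server's per-operation counts before and after a run. A rough
sketch (run as root on the server; see nfsstat(1) for the details):

  nfsstat -z        # zero the NFS counters
  # ... run one write test from the client ...
  nfsstat -e -s     # dump the server's per-RPC counts

If the Linux client is adding a Commit per write, the Commit count should
make that pretty obvious.
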
> All of this tends to exonerate the client. So what would be the next
> step to track down the cause of poor performance on the server side?
>
Also, as an alternative to setting the two sysctls for the TCP DRC
(duplicate request cache), you can simply disable it by setting the
sysctl:
  vfs.nfsd.cachetcp=0
Many would argue that doing a DRC for TCP isn't necessary (it wasn't done
in most NFS servers, including the old NFS server in FreeBSD). I have no
idea if the Linux server does a DRC for TCP.

And make sure you've increased your nfsd thread count. You can put a line
like this in your /etc/rc.conf to do that:
  nfs_server_flags="-u -t -n 64"
(This sets it to 64, which should be plenty for a single client.)

rick

> Thanks!
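
ps: In case it helps, the quick way to apply the above on the server would
be something like the following (a sketch, not tested on your setup):

  # in /etc/rc.conf
  nfs_server_flags="-u -t -n 64"

  # disable the DRC for TCP on the running system
  sysctl vfs.nfsd.cachetcp=0

  # restart the server so the new thread count takes effect
  service nfsd restart

Then re-run the 64k tests and see if the ~6300 IOPS ceiling moves.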