From owner-freebsd-net@FreeBSD.ORG Fri Jan 31 23:17:02 2014
Date: Fri, 31 Jan 2014 18:16:45 -0500 (EST)
From: Rick Macklem
To: J David
Cc: freebsd-net@freebsd.org, Garrett Wollman
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <1622306213.1079665.1391210205488.JavaMail.root@uoguelph.ca>
List-Id: Networking and TCP/IP with FreeBSD

J David wrote:
> On Fri, Jan 31, 2014 at 1:18 AM, wrote:
> > This is almost entirely wrong in its description of the non-offload
> > case.
>
> Yes, you're quite right; I confused myself. GSO works a little
> differently, but FreeBSD doesn't use that.
>
> > The whole mess is then passed on to the hardware for
> > offload, if it fits.
>
> That's the point: NFS is creating a situation where it never fits. It
> can't shove 65k into 64k, so it ends up looping back through the whole
> output routine again for a tiny tail of data, and then the same for
> the input routine on the other side. Arguably that makes rsize/wsize
> 65536 negligibly different from rsize/wsize 32768 in the long run,
> because the average data output per pass is about the same (64k + 1k
> vs. 33k + 33k). Except, of course, in the case where almost all files
> are between 32k and 60k.
>
You can certainly try "-o rsize=61440,wsize=61440" (assuming a 4K page
size) for the mount, if you'd like. There is a bug (a one-line patch that
I keep forgetting to put in) where, if you choose an rsize/wsize that is
not an exact multiple of PAGE_SIZE, mmap'd files can get garbage from the
partially valid pages. However, I'm pretty sure you are safe so long as
you specify exact multiples of PAGE_SIZE. The default size is the size
recommended by the NFS server, capped at MAXBSIZE. (Btw, Solaris 10
recommends 256K and allows 1Mbyte; FreeBSD recommends and allows
MAXBSIZE.)
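To make those rules concrete, here is a rough C sketch (not the actual
sys/fs/nfsclient code; the 4K PAGE_SIZE and 64K MAXBSIZE are assumed
values for a stock amd64 9.x install): take the server's preferred size
or an explicit rsize=/wsize=, cap it at MAXBSIZE, and trim it down to an
exact multiple of PAGE_SIZE, which is roughly what the missing one-line
fix would guarantee and what you should do by hand in the meantime.

/*
 * Rough sketch, not the actual sys/fs/nfsclient code: how the transfer
 * size works out under the rules described above.  PAGE_SIZE and
 * MAXBSIZE are assumed values for a stock amd64 9.x system.
 */
#include <stdio.h>

#define NFS_PAGE_SIZE 4096u     /* assumed 4K pages, as in the text above */
#define NFS_MAXBSIZE  65536u    /* MAXBSIZE on 9.x */

static unsigned int
pick_iosize(unsigned int server_pref, unsigned int user_req)
{
    /* Default is whatever the server recommends ... */
    unsigned int sz = server_pref;

    /* ... unless -o rsize=/wsize= overrides it ... */
    if (user_req != 0)
        sz = user_req;

    /* ... capped at MAXBSIZE ... */
    if (sz > NFS_MAXBSIZE)
        sz = NFS_MAXBSIZE;

    /*
     * ... and trimmed to an exact multiple of PAGE_SIZE, which is what
     * the one-line fix mentioned above would do (61440 = 15 * 4096).
     */
    sz -= sz % NFS_PAGE_SIZE;
    return (sz);
}

int
main(void)
{
    printf("default:      %u\n", pick_iosize(65536, 0));      /* 65536 */
    printf("rsize=61440:  %u\n", pick_iosize(65536, 61440));  /* 61440 */
    printf("rsize=61000:  %u\n", pick_iosize(65536, 61000));  /* trimmed to 57344 */
    return (0);
}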
I'll admit I'm not convinced that the reduced overhead of using 61440
outweighs the fact that the server file systems use block sizes that are
always a power of 2. Without good evidence that 61440 is better, I
wouldn't want the server recommending it. (And I don't know how NFS would
know that it is sending on a TSO-enabled interface.)

rick

> Please don't get me wrong, I'm not suggesting there's anything more
> than a small CPU reduction to be obtained by changing this. Which is
> not nothing if the client is CPU-limited due to the other work it's
> doing, but it's not much. To get real speedups from NFS would require
> a change to the punishing read-before-write behavior, which is pretty
> clearly not going to happen.
>
> > RPC responses will only get smushed together if
> > tcp_output() wasn't able to schedule the transmit immediately, and
> > if the network is working properly, that will only happen if there's
> > more than one client-side-receive-window's-worth of data to be
> > transmitted.
>
> This is something I have seen live in tcpdump, but then I have had so
> many problems with NFS and congestion control that the "network is
> working properly" condition probably isn't satisfied. Hopefully the
> jumbo cluster changes will resolve that once and for all.
>
> Thanks!
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
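For anyone following the "65k into 64k" point quoted above, here is a
back-of-the-envelope C sketch of the arithmetic. The 65535-byte ceiling
corresponds to IP_MAXPACKET, the usual cap on one TSO burst; the
128-byte RPC/NFS header overhead is an illustrative assumption, not a
figure taken from the FreeBSD NFS/TCP code.

/*
 * Back-of-the-envelope version of the "65k into 64k" arithmetic quoted
 * above.  The 65535-byte ceiling is IP_MAXPACKET (the usual cap on one
 * TSO burst); the 128-byte RPC/NFS header overhead is an assumed round
 * number, not a value taken from the FreeBSD code.
 */
#include <stdio.h>

#define TSO_MAX  65535u     /* upper bound on one TSO burst */
#define RPC_HDR    128u     /* assumed RPC + NFS write header overhead */

static void
show(unsigned int wsize)
{
    unsigned int rpc = wsize + RPC_HDR;     /* one NFS write RPC on the wire */

    if (rpc <= TSO_MAX)
        printf("wsize %5u -> %5u bytes: fits in a single TSO burst\n",
            wsize, rpc);
    else
        printf("wsize %5u -> %5u bytes: one full burst plus a %u-byte tail\n",
            wsize, rpc, rpc - TSO_MAX);
}

int
main(void)
{
    show(65536);    /* 64K payload: always spills a small tail */
    show(61440);    /* 60K payload: fits in one pass */
    show(32768);    /* 32K payload: fits, but needs twice as many RPCs */
    return (0);
}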