From owner-freebsd-fs@FreeBSD.ORG Wed Mar 19 00:07:02 2014
Date: Tue, 18 Mar 2014 20:06:52 -0400 (EDT)
From: Rick Macklem
To: araujo@FreeBSD.org
Cc: FreeBSD Filesystems , Alexander Motin
Subject: Re: review/test: NFS patch to use pagesize mbuf clusters
Message-ID: <459657309.24706896.1395187612496.JavaMail.root@uoguelph.ca>
List-Id: Filesystems

Marcelo Araujo wrote:
> Hello Rick,
>
> I have a couple of machines with 10G interfaces capable of TSO.
> What kind of result are you expecting?
> Is it a speed-up in reads?
>
Well, if NFS is working well on these systems, I would hope you don't
see any regression. If your TSO-enabled interfaces can handle more
than 32 transmit segments (there is usually a #define constant in the
driver with something like TX_SEGMAX in its name), and this is >= 34,
you should see very little effect.

Even if your network interface is one of the ones limited to 32
transmit segments, the driver usually fixes the list via a call to
m_defrag(). Although this involves a bunch of bcopy()'ng, you still
might not see any easily measured performance improvement, assuming
m_defrag() is getting the job done. (Network latency and disk latency
in the server will predominate, I suspect. A server built entirely
using SSDs might be a different story?)

Thanks for doing testing, since a lack of a regression is what I care
about most. (I am hoping this resolves cases where users have had to
disable TSO to make NFS work ok for them.)

rick

> I'm going to run some tests today, but against 9.1-RELEASE, which my
> servers are running.
>
> Best Regards,
>
> 2014-03-18 9:26 GMT+08:00 Rick Macklem < rmacklem@uoguelph.ca >:
>
> Hi,
>
> Several of the TSO-capable network interfaces have a limit of
> 32 mbufs in the transmit mbuf chain (the drivers call these transmit
> segments, which I admit I find confusing).
>
> For a 64K read/readdir reply or 64K write request, NFS passes
> a list of 34 mbufs down to TCP. TCP will split the list, since
> it is slightly more than 64K bytes, but that split will normally
> be a copy by reference of the last mbuf cluster. As such, normally
> the network interface will get a list of 34 mbufs.
>
> For TSO-enabled interfaces that are limited to 32 mbufs in the
> list, the usual workaround in the driver is to copy { real copy,
> not copy by reference } the list to 32 mbuf clusters via m_defrag().
> (A few drivers use m_collapse(), which is less likely to succeed.)
>
> As a workaround to this problem, the attached patch modifies NFS
> to use larger pagesize clusters, so that the 64K RPC message is
> in 18 mbufs (assuming a 4K pagesize).
>
> Testing on my slow hardware, which does not have TSO capability,
> shows it to be performance neutral, but I believe avoiding the
> overhead of copying via m_defrag() { and possible failures
> resulting in the message never being transmitted } makes this
> patch worth doing.
>
> As such, I'd like to request review and/or testing of this patch
> by anyone who can do so.
>
> Thanks in advance for your help, rick
> ps: If you don't get the attachment, just email me and I'll
> send you a copy.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to " freebsd-fs-unsubscribe@freebsd.org "
>
> --
> Marcelo Araujo
> araujo@FreeBSD.org