From: Alexander Motin
Date: Tue, 18 Mar 2014 10:39:57 +0200
To: Rick Macklem, FreeBSD Filesystems
Subject: Re: review/test: NFS patch to use pagesize mbuf clusters

Hi.

On 18.03.2014 03:26, Rick Macklem wrote:
> Several of the TSO capable network interfaces have a limit of
> 32 mbufs in the transmit mbuf chain (the drivers call these transmit
> segments, which I admit I find confusing).
>
> For a 64K read/readdir reply or 64K write request, NFS passes
> a list of 34 mbufs down to TCP. TCP will split the list, since
> it is slightly more than 64K bytes, but that split will normally
> be a copy by reference of the last mbuf cluster. As such, normally
> the network interface will get a list of 34 mbufs.
>
> For TSO enabled interfaces that are limited to 32 mbufs in the
> list, the usual workaround in the driver is to copy { real copy,
> not copy by reference } the list to 32 mbuf clusters via m_defrag().
> (A few drivers use m_collapse() which is less likely to succeed.)
>
> As a workaround to this problem, the attached patch modifies NFS
> to use larger pagesize clusters, so that the 64K RPC message is
> in 18 mbufs (assuming a 4K pagesize).
> Testing on my slow hardware which does not have TSO capability
> shows it to be performance neutral, but I believe avoiding the
> overhead of copying via m_defrag() { and possible failures
> resulting in the message never being transmitted } makes this
> patch worth doing.
>
> As such, I'd like to request review and/or testing of this patch
> by anyone who can do so.

First, I tried to find a suitable NIC to test: cxgb/cxgbe have a limit
of 36 segments and so are probably unaffected, ixgb allows 100 and igb
64; only on em did I find a limit of 32.

I ran several profiles on the em NIC with and without the patch. I can
confirm that without the patch m_defrag() is indeed called, while with
the patch it no longer is (a rough sketch of that driver-side fallback
is below). However, the profiler shows that only a very small amount
of time (a few percent, or even a fraction of a percent) is spent
there. I cannot measure the effect (my Core i7 desktop test system sits
at only about 5% CPU load while serving a full 1Gbps of NFS over the
em), though I cannot say for sure that there is no effect on some
low-end system.

I am also not very sure about replacing M_WAITOK with M_NOWAIT.
Instead of waiting a bit while the VM finds a cluster, NFSMCLGET() will
return a single plain mbuf; as a result, the chain of 2K clusters could
end up replaced by a chain of 256-byte mbufs rather than by 4K clusters
(see the allocation sketch below).

-- 
Alexander Motin
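
For anyone following along, here is a rough sketch of the driver-side
workaround Rick describes. It is not taken from any particular driver:
the names hw_encap(), chain_count() and HW_TSO_MAXSEGS are invented for
the illustration, and the arithmetic in the comment (32 or 16 data
clusters plus, presumably, two more mbufs for the RPC header) is just
one reading of the 34/18 counts quoted above.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/mbuf.h>

#define HW_TSO_MAXSEGS  32      /* hypothetical per-device TSO segment limit */

/*
 * 64K of payload in 2K (MCLBYTES) clusters:     32 clusters + 2 = 34 mbufs
 * 64K of payload in 4K (MJUMPAGESIZE) clusters: 16 clusters + 2 = 18 mbufs
 */

/* Count the mbufs in a chain. */
static int
chain_count(struct mbuf *m)
{
        int n;

        for (n = 0; m != NULL; m = m->m_next)
                n++;
        return (n);
}

/*
 * Transmit-path fragment: if the chain has more mbufs than the hardware
 * accepts for one TSO transmission, copy it (a real copy) into as few
 * 2K clusters as possible with m_defrag(); drop the packet if that fails.
 */
static int
hw_encap(struct mbuf **m_head)
{
        struct mbuf *m;

        if (chain_count(*m_head) > HW_TSO_MAXSEGS) {
                m = m_defrag(*m_head, M_NOWAIT);
                if (m == NULL) {
                        m_freem(*m_head);
                        *m_head = NULL;
                        return (ENOBUFS);
                }
                *m_head = m;
        }
        /* ...hand *m_head to the descriptor/DMA setup here... */
        return (0);
}

As I understand it, m_collapse() differs in that it only tries to
squeeze data into free space in the existing mbufs rather than copying
everything into a fresh chain, which is why it is less likely to get
under the limit.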
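
A rough illustration of the M_WAITOK/M_NOWAIT concern, for comparison.
This is not the NFSMCLGET() macro from the patch, only the general
shape of the allocation; the function name nfsm_getcluster_sketch() is
invented for the example.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/* "how" is M_WAITOK or M_NOWAIT. */
static struct mbuf *
nfsm_getcluster_sketch(int how)
{
        struct mbuf *m;

        /* Ask for a page-size (4K) cluster. */
        m = m_getjcl(how, MT_DATA, 0, MJUMPAGESIZE);
        if (m != NULL)
                return (m);

        /*
         * With M_NOWAIT the allocation can fail under transient memory
         * pressure; nothing sleeps waiting for the VM to supply a
         * cluster.  The fallback is a bare mbuf with only MLEN bytes of
         * storage, so a 64K reply can degrade into a long chain of
         * small (256-byte) mbufs instead of 4K -- or even 2K -- clusters.
         */
        return (m_get(M_WAITOK, MT_DATA));
}

With M_WAITOK the m_getjcl() call would instead sleep briefly until the
VM could supply a cluster, which is the trade-off being questioned
above.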