From: Alexander Motin
Date: Tue, 18 Mar 2014 10:39:57 +0200
To: Rick Macklem, FreeBSD Filesystems
Subject: Re: review/test: NFS patch to use pagesize mbuf clusters

Hi.

On 18.03.2014 03:26, Rick Macklem wrote:
> Several of the TSO capable network interfaces have a limit of
> 32 mbufs in the transmit mbuf chain (the drivers call these transmit
> segments, which I admit I find confusing).
>
> For a 64K read/readdir reply or 64K write request, NFS passes
> a list of 34 mbufs down to TCP. TCP will split the list, since
> it is slightly more than 64K bytes, but that split will normally
> be a copy by reference of the last mbuf cluster. As such, normally
> the network interface will get a list of 34 mbufs.
>
> For TSO enabled interfaces that are limited to 32 mbufs in the
> list, the usual workaround in the driver is to copy { real copy,
> not copy by reference } the list to 32 mbuf clusters via m_defrag().
> (A few drivers use m_collapse() which is less likely to succeed.)
>
> As a workaround to this problem, the attached patch modifies NFS
> to use larger pagesize clusters, so that the 64K RPC message is
> in 18 mbufs (assuming a 4K pagesize).
> Testing on my slow hardware which does not have TSO capability
> shows it to be performance neutral, but I believe avoiding the
> overhead of copying via m_defrag() { and possible failures
> resulting in the message never being transmitted } makes this
> patch worth doing.
>
> As such, I'd like to request review and/or testing of this patch
> by anyone who can do so.

First, I tried to find a suitable NIC to test: cxgb/cxgbe have a limit
of 36 segments and so are probably unaffected, ixgb allows 100 and igb
64; only on em did I find a limit of 32.

I ran several profiles on the em NIC with and without the patch. I can
confirm that without the patch m_defrag() is indeed called, while with
the patch it no longer is (a rough sketch of that driver-side fallback
is below). However, the profiler shows that only a very small amount
of time (a few percent, or even a fraction of a percent) is spent
there. I cannot measure the effect (my Core i7 desktop test system sits
at only about 5% CPU load while serving a full 1Gbps of NFS over the
em), though I cannot say for sure that there is no effect on some
low-end system.

I am also not very sure about replacing M_WAITOK with M_NOWAIT.
Instead of waiting a bit while the VM finds a cluster, NFSMCLGET() will
return a single plain mbuf; as a result, the chain of 2K clusters could
end up replaced by a chain of 256-byte mbufs rather than by 4K clusters
(see the allocation sketch below).

-- 
Alexander Motin
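
For anyone following along, here is a rough sketch of the driver-side
workaround Rick describes. It is not taken from any particular driver:
the names hw_encap(), chain_count() and HW_TSO_MAXSEGS are invented for
the illustration, and the arithmetic in the comment (32 or 16 data
clusters plus, presumably, two more mbufs for the RPC header) is just
one reading of the 34/18 counts quoted above.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/mbuf.h>

#define HW_TSO_MAXSEGS  32      /* hypothetical per-device TSO segment limit */

/*
 * 64K of payload in 2K (MCLBYTES) clusters:     32 clusters + 2 = 34 mbufs
 * 64K of payload in 4K (MJUMPAGESIZE) clusters: 16 clusters + 2 = 18 mbufs
 */

/* Count the mbufs in a chain. */
static int
chain_count(struct mbuf *m)
{
        int n;

        for (n = 0; m != NULL; m = m->m_next)
                n++;
        return (n);
}

/*
 * Transmit-path fragment: if the chain has more mbufs than the hardware
 * accepts for one TSO transmission, copy it (a real copy) into as few
 * 2K clusters as possible with m_defrag(); drop the packet if that fails.
 */
static int
hw_encap(struct mbuf **m_head)
{
        struct mbuf *m;

        if (chain_count(*m_head) > HW_TSO_MAXSEGS) {
                m = m_defrag(*m_head, M_NOWAIT);
                if (m == NULL) {
                        m_freem(*m_head);
                        *m_head = NULL;
                        return (ENOBUFS);
                }
                *m_head = m;
        }
        /* ...hand *m_head to the descriptor/DMA setup here... */
        return (0);
}

As I understand it, m_collapse() differs in that it only tries to
squeeze data into free space in the existing mbufs rather than copying
everything into a fresh chain, which is why it is less likely to get
under the limit.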
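
A rough illustration of the M_WAITOK/M_NOWAIT concern, for comparison.
This is not the NFSMCLGET() macro from the patch, only the general
shape of the allocation; the function name nfsm_getcluster_sketch() is
invented for the example.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/* "how" is M_WAITOK or M_NOWAIT. */
static struct mbuf *
nfsm_getcluster_sketch(int how)
{
        struct mbuf *m;

        /* Ask for a page-size (4K) cluster. */
        m = m_getjcl(how, MT_DATA, 0, MJUMPAGESIZE);
        if (m != NULL)
                return (m);

        /*
         * With M_NOWAIT the allocation can fail under transient memory
         * pressure; nothing sleeps waiting for the VM to supply a
         * cluster.  The fallback is a bare mbuf with only MLEN bytes of
         * storage, so a 64K reply can degrade into a long chain of
         * small (256-byte) mbufs instead of 4K -- or even 2K -- clusters.
         */
        return (m_get(M_WAITOK, MT_DATA));
}

With M_WAITOK the m_getjcl() call would instead sleep briefly until the
VM could supply a cluster, which is the trade-off being questioned
above.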