From: Yonghyeon PYUN <pyunyh@gmail.com>
Date: Mon, 27 Jan 2014 14:50:47 +0900
To: Rick Macklem
Cc: freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> Adam McDougall wrote:
> > Also try rsize=32768,wsize=32768 in your mount options; it made a
> > huge difference for me. I've noticed slow file transfers on NFS in 9
> > and finally did some searching a couple of months ago. Someone
> > suggested it, and they were on to something.
> >
> I have a "hunch" that might explain why 64K NFS reads/writes perform
> poorly in some network environments.
> A 64K NFS read reply/write request consists of a list of 34 mbufs when
> passed to TCP via sosend(), with a total data length of around 65680
> bytes. Looking at a couple of drivers (virtio and ixgbe), they seem to
> expect no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I
> think (I don't have anything that does TSO to confirm this) that NFS
> will pass a list that is longer (34 plus a TCP/IP header).
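
A quick back-of-the-envelope check on that count (my numbers, assuming
the data lands in standard 2K clusters, i.e. MCLBYTES == 2048, rather
than anything I've actually traced):

    65536 bytes of data / 2048 bytes per cluster  = 32 cluster mbufs
    + RPC/NFS headers (~144 bytes) in ~2 mbufs    = 34 mbufs, ~65680 bytes

and tcp_output() prepends one more mbuf for the TCP/IP header, which is
what would push the chain past a 32-33 entry scatter/gather limit.
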
> At a glance, it appears that the drivers call m_defrag() or
> m_collapse() when the mbuf list won't fit in their scatter table (32
> or 33 elements) and, if this fails, just silently drop the data
> without sending it.
> If I'm right, there would be considerable overhead from
> m_defrag()/m_collapse(), and near disaster if they fail to fix the
> problem and the data is silently dropped instead of xmited.
>

I think the actual number of DMA segments allocated for the mbuf chain
is determined by bus_dma(9); bus_dma(9) will coalesce the current
segment with the previous one where possible.

I'm not sure whether you're referring to ixgbe(4) or ix(4), but I see
that ix(4) limits the total length of all segments to 65535 bytes, so
it has no room for the ethernet/VLAN header of the mbuf chain. The
driver should be fixed so that it can transmit a full 64KB datagram.

I also think the use of m_defrag(9) in the TSO path is suboptimal. All
TSO capable controllers are able to handle multiple TX buffers, so the
driver should use m_collapse(9), which just compacts the chain until it
fits, rather than copying the entire chain with m_defrag(9).

> Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE
> clusters, so the mbuf count drops from 34 to 18.
>

Could we make it conditional on size?

> If anyone has a TSO scatter/gather enabled net interface and can test
> this patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is
> enabled and see what effect it has, that would be appreciated.
>
> Btw, thanks go to Garrett Wollman for suggesting the change to
> MJUMPAGESIZE clusters.
>
> rick
> ps: If the attachment doesn't make it through and you want the patch,
> just email me and I'll send you a copy.
>
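
For reference, the encap path Rick describes follows roughly this shape
in the drivers I've looked at. This is a simplified sketch, not code
from any real driver; foo_encap(), foo_softc, and FOO_MAXTXSEGS are
illustrative names standing in for a driver's own:

#define FOO_MAXTXSEGS	32	/* the driver's scatter/gather limit */

static int
foo_encap(struct foo_softc *sc, bus_dmamap_t map, struct mbuf **m_head)
{
	bus_dma_segment_t segs[FOO_MAXTXSEGS];
	struct mbuf *m;
	int error, nsegs;

	error = bus_dmamap_load_mbuf_sg(sc->foo_tx_tag, map, *m_head,
	    segs, &nsegs, BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		/*
		 * Too many segments for the scatter table.  m_defrag()
		 * copies the whole chain into fresh clusters;
		 * m_collapse(*m_head, M_NOWAIT, FOO_MAXTXSEGS) would
		 * merely compact the existing chain, which is the
		 * cheaper fix suggested above.
		 */
		m = m_defrag(*m_head, M_NOWAIT);
		if (m == NULL) {
			/*
			 * The silent drop: the data is freed without
			 * ever going onto the wire.
			 */
			m_freem(*m_head);
			*m_head = NULL;
			return (ENOBUFS);
		}
		*m_head = m;
		error = bus_dmamap_load_mbuf_sg(sc->foo_tx_tag, map,
		    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
	}
	return (error);
}

When the m_defrag() fallback fails, TCP never learns that the segment
was dropped and only recovers via a retransmit timeout, which would go
a long way toward explaining the terrible throughput in the subject.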