From: Yonghyeon PYUN <pyunyh@gmail.com>
Date: Mon, 27 Jan 2014 14:50:47 +0900
To: Rick Macklem
Cc: freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> Adam McDougall wrote:
> > Also try rsize=32768,wsize=32768 in your mount options; it made a
> > huge difference for me. I've noticed slow file transfers on NFS in 9
> > and finally did some searching a couple of months ago. Someone
> > suggested it, and they were on to something.
> >
> I have a "hunch" that might explain why 64K NFS reads/writes perform
> poorly in some network environments.
> A 64K NFS read reply/write request consists of a list of 34 mbufs when
> passed to TCP via sosend(), with a total data length of around 65680
> bytes. Looking at a couple of drivers (virtio and ixgbe), they seem to
> expect no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I
> think (I don't have anything that does TSO to confirm this) that NFS
> will pass a list that is longer (34 plus a TCP/IP header).
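
A quick back-of-the-envelope check on that count (my numbers, assuming
the data lands in standard 2K clusters, i.e. MCLBYTES == 2048, rather
than anything I've actually traced):

    65536 bytes of data / 2048 bytes per cluster  = 32 cluster mbufs
    + RPC/NFS headers (~144 bytes) in ~2 mbufs    = 34 mbufs, ~65680 bytes

and tcp_output() prepends one more mbuf for the TCP/IP header, which is
what would push the chain past a 32-33 entry scatter/gather limit.
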
> At a glance, it appears that the drivers call m_defrag() or
> m_collapse() when the mbuf list won't fit in their scatter table (32
> or 33 elements) and, if this fails, just silently drop the data
> without sending it.
> If I'm right, there would be considerable overhead from
> m_defrag()/m_collapse(), and near disaster if they fail to fix the
> problem and the data is silently dropped instead of xmited.
>

I think the actual number of DMA segments allocated for the mbuf chain
is determined by bus_dma(9); bus_dma(9) will coalesce the current
segment with the previous one where possible.

I'm not sure whether you're referring to ixgbe(4) or ix(4), but I see
that ix(4) limits the total length of all segments to 65535 bytes, so
it has no room for the ethernet/VLAN header of the mbuf chain. The
driver should be fixed so that it can transmit a full 64KB datagram.

I also think the use of m_defrag(9) in the TSO path is suboptimal. All
TSO capable controllers are able to handle multiple TX buffers, so the
driver should use m_collapse(9), which just compacts the chain until it
fits, rather than copying the entire chain with m_defrag(9).

> Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE
> clusters, so the mbuf count drops from 34 to 18.
>

Could we make it conditional on size?

> If anyone has a TSO scatter/gather enabled net interface and can test
> this patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is
> enabled and see what effect it has, that would be appreciated.
>
> Btw, thanks go to Garrett Wollman for suggesting the change to
> MJUMPAGESIZE clusters.
>
> rick
> ps: If the attachment doesn't make it through and you want the patch,
> just email me and I'll send you a copy.
>
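
For reference, the encap path Rick describes follows roughly this shape
in the drivers I've looked at. This is a simplified sketch, not code
from any real driver; foo_encap(), foo_softc, and FOO_MAXTXSEGS are
illustrative names standing in for a driver's own:

#define FOO_MAXTXSEGS	32	/* the driver's scatter/gather limit */

static int
foo_encap(struct foo_softc *sc, bus_dmamap_t map, struct mbuf **m_head)
{
	bus_dma_segment_t segs[FOO_MAXTXSEGS];
	struct mbuf *m;
	int error, nsegs;

	error = bus_dmamap_load_mbuf_sg(sc->foo_tx_tag, map, *m_head,
	    segs, &nsegs, BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		/*
		 * Too many segments for the scatter table.  m_defrag()
		 * copies the whole chain into fresh clusters;
		 * m_collapse(*m_head, M_NOWAIT, FOO_MAXTXSEGS) would
		 * merely compact the existing chain, which is the
		 * cheaper fix suggested above.
		 */
		m = m_defrag(*m_head, M_NOWAIT);
		if (m == NULL) {
			/*
			 * The silent drop: the data is freed without
			 * ever going onto the wire.
			 */
			m_freem(*m_head);
			*m_head = NULL;
			return (ENOBUFS);
		}
		*m_head = m;
		error = bus_dmamap_load_mbuf_sg(sc->foo_tx_tag, map,
		    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
	}
	return (error);
}

When the m_defrag() fallback fails, TCP never learns that the segment
was dropped and only recovers via a retransmit timeout, which would go
a long way toward explaining the terrible throughput in the subject.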