From: Andre Oppermann <andre@freebsd.org>
Date: Sun, 10 Mar 2013 23:29:54 +0100
To: Rick Macklem
Cc: freebsd-net@freebsd.org, Garrett Wollman
Subject: Re: Limits on jumbo mbuf cluster allocation
Message-ID: <513D0962.5080606@freebsd.org>
In-Reply-To: <1841214504.3736248.1362882169721.JavaMail.root@erie.cs.uoguelph.ca>

On 10.03.2013 03:22, Rick Macklem wrote:
> Garrett Wollman wrote:
>> Also, it occurs to me that this strategy is subject to livelock.  To
>> put backpressure on the clients, it is far better to get them to stop
>> sending (by advertising a small receive window) than to accept their
>> traffic but queue it for a long time.  By the time the NFS code gets
>> an RPC, the system has already invested so much into it that it should
>> be processed as quickly as possible, and this strategy essentially
>> guarantees[1] that, once those 2 MB socket buffers start to fill up,
>> they will stay filled, sending latency through the roof.  If nfsd
>> didn't override the usual socket-buffer sizing mechanisms, then
>> sysadmins could limit the buffers to ensure a stable response time.
>>
>> The bandwidth-delay product in our network is somewhere between 12.5
>> kB and 125 kB, depending on how the client is connected and what sort
>> of latency they experience.  The usual theory would suggest that
>> socket buffers should be no more than twice that -- i.e., about 256
>> kB.
>>
> Well, the code that uses sb_max_adj wasn't written by me (I just cloned
> it for the new server).  In the author's defence, I believe SB_MAX was
> 256K when it was written.  It was 256K in 2011.  I think sb_max_adj was
> used because soreserve() fails for a larger value and the code doesn't
> check for such a failure.  (Yeah, it should be fixed so that it checks
> for a failure return from soreserve().  I did so for the client some
> time ago. ;-)

We have had TCP sockbuf size autotuning for some time now, so explicitly
setting the size shouldn't be necessary anymore.

> Just grep for sb_max_adj.  You'll see it sets a variable called "siz".
> Make "siz" whatever you want (256K sounds like a good guess).  Just
> make sure it isn't > sb_max_adj.
>
> The I/O sizes are limited to MAXBSIZE, which is currently 64KB, although
> I'd like to increase that to 128KB someday soon.  (As you note below,
> the largest RPC is slightly bigger than that.)
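
As a rough sketch of the soreserve() failure check mentioned above
(illustrative only, not the actual nfsd code; the helper name and the
256K fallback are made up for the example, and the extern mirrors the
sb_max_adj variable the existing code already uses):

  #include <sys/param.h>
  #include <sys/systm.h>
  #include <sys/socket.h>
  #include <sys/socketvar.h>

  extern u_long sb_max_adj;	/* adjusted kern.ipc.maxsockbuf limit */

  /*
   * Reserve send/receive buffer space for an nfsd socket.  Cap the
   * request at sb_max_adj and, instead of ignoring a soreserve()
   * failure, retry once with a conservative 256K.
   */
  static int
  nfsrv_reserve_sockbuf(struct socket *so, u_long siz)
  {
          int error;

          if (siz > sb_max_adj)
                  siz = sb_max_adj;
          error = soreserve(so, siz, siz);
          if (error != 0)
                  error = soreserve(so, 256 * 1024, 256 * 1024);
          return (error);
  }

The point is only that the return value gets checked and propagated, so a
failed reservation doesn't silently leave the socket with whatever buffer
size it had before.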
>
> Btw, net.inet.tcp.{send/recv}buf_max are both 2Mbytes, just like sb_max,
> so those don't seem useful in this case?

These are just the limits for auto-tuning.

> I'm no TCP guy, so suggestions w.r.t. how big soreserve() should be set
> are welcome.

I'd have to look more at the NFS code to see what exactly is going on and
what the most likely settings are going to be.  Won't promise any ETA
though.

-- 
Andre
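
For reference, a small userland sketch (an illustration only, not code
from the thread; it assumes sendbuf_max/recvbuf_max are plain integer
sysctls, as they are on stock FreeBSD) that prints the autotuning limits
discussed above:

  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <stdio.h>

  int
  main(void)
  {
          int val;
          size_t len;

          /* Upper bound for the automatically tuned send buffer. */
          len = sizeof(val);
          if (sysctlbyname("net.inet.tcp.sendbuf_max", &val, &len, NULL, 0) == 0)
                  printf("net.inet.tcp.sendbuf_max: %d\n", val);

          /* Upper bound for the automatically tuned receive buffer. */
          len = sizeof(val);
          if (sysctlbyname("net.inet.tcp.recvbuf_max", &val, &len, NULL, 0) == 0)
                  printf("net.inet.tcp.recvbuf_max: %d\n", val);

          return (0);
  }

On a default install both print 2097152, matching the 2Mbytes quoted above.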