From owner-freebsd-hackers  Fri Nov 30 18:32:41 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from ussenterprise.ufp.org (ussenterprise.ufp.org [208.185.30.210])
	by hub.freebsd.org (Postfix) with ESMTP id A9D2337B405
	for <freebsd-hackers@FreeBSD.ORG>; Fri, 30 Nov 2001 18:32:36 -0800 (PST)
Received: (from bicknell@localhost)
	by ussenterprise.ufp.org (8.11.1/8.11.1) id fB12WY604716;
	Fri, 30 Nov 2001 21:32:34 -0500 (EST)
	(envelope-from bicknell)
Date: Fri, 30 Nov 2001 21:32:34 -0500
From: Leo Bicknell <bicknell@ufp.org>
To: Luigi Rizzo <rizzo@aciri.org>
Cc: Mike Silbersack <silby@silby.com>,
	Alfred Perlstein <bright@mu.org>, freebsd-hackers@FreeBSD.ORG
Subject: Re: TCP Performance Graphs
Message-ID: <20011130213234.A4327@ussenterprise.ufp.org>
Mail-Followup-To: Luigi Rizzo <rizzo@aciri.org>,
	Mike Silbersack <silby@silby.com>, Alfred Perlstein <bright@mu.org>,
	freebsd-hackers@FreeBSD.ORG
References: <20011130171418.B96592@ussenterprise.ufp.org> <Pine.BSF.4.30.0111301717290.10049-100000@niwun.pair.com> <20011130173033.G33041@iguana.aciri.org> <20011130203905.A2944@ussenterprise.ufp.org> <20011130174816.H33041@iguana.aciri.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20011130174816.H33041@iguana.aciri.org>; from rizzo@aciri.org on Fri, Nov 30, 2001 at 05:48:16PM -0800
Organization: United Federation of Planets
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

On Fri, Nov 30, 2001 at 05:48:16PM -0800, Luigi Rizzo wrote:
> On Fri, Nov 30, 2001 at 08:39:05PM -0500, Leo Bicknell wrote:
> > Note that if we implement a 'fair share' buffering scheme we would
> > never get a failure, which would be a good thing.  Unfortunately
> > fair share is relatively complicated.
> 
> i don't get this. There is no relation among the max number
> of mbufs and their potential consumers, such as network interfaces,
> sockets, dummynet pipes, and others. And so it is unavoidable
> that even giving 1 mbuf each, you'll eventually fail an allocation.

Well, this is true.  If the number of sockets exceeds the number
of MBUF's you will run out, no matter how well you allocate them.
That is a corner case that should be handled delicately, no doubt,
but one much less likely to happen.  If each client were limited to
one, or even two, MBUF's, total throughput would be so slow that the
admin of the box would notice.  That, added to the fact that there
are thousands of MBUF's by default, makes it nearly impossible that
the "ignorant sysadmin" (aka the desktop "it should just work" user)
would run into this case.

So, I will rephrase.  I think a fair-share scheme would solve this
for at least five nines (99.999%) of the cases.

> But note that what you say about bad failures is not really true.
> Many pieces of the kernel now are pretty robust in the face of
> failures -- certainly dummynet pipes, and the "sis" and "dc" drivers

What I mean by 'bad failures' is not so much that the box would
crash or otherwise completely break itself.  Rather, my experience
with exhausting MBUF's is that you can get a sort of "capture"
situation, where one or more busy connections can essentially starve
out inactive connections.  Those inactive connections may well
be your ssh session where you're trying to fix it.  Today, network
performance when MBUF's are exhausted is erratic at best, and at
worst completely stopped for a large number of processes on the
system.

The nasty QoS word popped up when we talked about this before: a
QoS scheme could ensure some connections got MBUF's, or, even if
there were more connections than MBUF's, ensure that connections
got two at a time in a 'round robin' fashion or some other scheme
to keep everything moving.
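
To make the 'round robin' idea concrete, here is a minimal sketch in
plain userland C.  It is not kernel code and not a proposed patch;
the function name fair_share() and its arguments are made up for
illustration.  It hands out a fixed pool one buffer at a time, so a
busy connection cannot take everything before a quiet one gets its
turn.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration only, not FreeBSD kernel code.
 *
 * Divide `pool` buffers among `n` connections.  Buffers are granted
 * one at a time in round-robin order, and a connection never gets
 * more than it asked for.  The result is a max-min fair split: small
 * demands are met in full, and large demands share what is left.
 */
static void
fair_share(unsigned pool, const unsigned *demand, unsigned *grant, size_t n)
{
	size_t i;
	int progress = 1;

	for (i = 0; i < n; i++)
		grant[i] = 0;

	/* Keep making passes until the pool is empty or nobody wants more. */
	while (pool > 0 && progress) {
		progress = 0;
		for (i = 0; i < n && pool > 0; i++) {
			if (grant[i] < demand[i]) {
				grant[i]++;
				pool--;
				progress = 1;
			}
		}
	}
}
```

With a pool of 10 and demands of 100, 2, 0 and 50, the grants come
out as 4, 2, 0 and 4: the connection that only wants two buffers (the
ssh session, say) gets both, no matter how greedy its neighbors are.
When there are more connections than buffers, a real scheme would
also have to rotate the starting point between rounds so the same
connections are not always served first; the sketch leaves that out.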

If I could redesign buffering (from a TCP point of view) from the
ground up I would:

 - Make the buffer size dynamic.  Perhaps not at interrupt time, but
   in a "unified vm" the network should be able to take resources
   if it is active.

 - Make the buffers dynamically track individual connections.

 - Implement a fair-share mechanism.

 - Provide instrumentation to track when connections are slowed
   for lack of MBUF's.

 - Provide tuning parameters and maybe QoS parameters to be able
   to manage total buffer usage, individual connection buffer
   usage, and connection priorities.
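
As a rough sketch of how the per-connection tracking,
instrumentation and tuning items above could fit together, again in
plain userland C: every name here (struct buf_pool, struct conn_bufs,
conn_buf_get(), conn_buf_put()) is hypothetical and none of it exists
in the FreeBSD tree.  Priorities and dynamic resizing are left out to
keep it short.

```c
#include <assert.h>

/* Hypothetical illustration only, not FreeBSD kernel code. */

struct buf_pool {
	unsigned	total;	/* tunable: buffers in the whole pool */
	unsigned	in_use;	/* buffers currently handed out */
	unsigned long	stalls;	/* refused requests, system-wide */
};

struct conn_bufs {
	unsigned	limit;	/* tunable: cap for this connection */
	unsigned	in_use;	/* buffers this connection holds */
	unsigned long	stalls;	/* times this connection was refused */
};

/*
 * Try to take one buffer for connection `c`.  Returns 1 on success.
 * Returns 0, and bumps both stall counters, if the connection is at
 * its cap or the pool is empty.
 */
static int
conn_buf_get(struct buf_pool *p, struct conn_bufs *c)
{
	if (c->in_use >= c->limit || p->in_use >= p->total) {
		c->stalls++;
		p->stalls++;
		return (0);
	}
	c->in_use++;
	p->in_use++;
	return (1);
}

/* Return one buffer held by connection `c` to the pool. */
static void
conn_buf_put(struct buf_pool *p, struct conn_bufs *c)
{
	assert(c->in_use > 0 && p->in_use > 0);
	c->in_use--;
	p->in_use--;
}
```

The per-connection cap keeps one busy connection from draining the
pool, and the stall counters are the sort of thing a netstat-style
tool could report so the admin can see which connections are being
slowed for lack of buffers.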

-- 
       Leo Bicknell - bicknell@ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/
Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
