From owner-freebsd-hackers  Sun May  5 18:41: 6 2002
Message-ID: <3CD5DF0E.481BBFA4@mindspring.com>
Date: Sun, 05 May 2002 18:40:30 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Anthony Schneider <aschneid@mail.slc.edu>
Cc: Patrick Thomas <root@utility.clubscholarship.com>,
	freebsd-hackers@FreeBSD.ORG
Subject: Re: what causes a userland to stop, but allows kernel to continue ?
References: <20020505162455.K86733-100000@utility.clubscholarship.com> <20020505211731.A1386@mail.slc.edu>

Anthony Schneider wrote:
> Livelock, maybe?  Is there some sort of internal kernel semaphore table which
> might be getting filled up or something?  I'd also like to find out more about
> this, but sadly, the machine is a remote one and I can't drop into ddb as
> suggested...
> Thanks you all very much.  Hope this information is of use.
> -Anthony.

More likely, you have run out of some non-renewable resource,
such as mbufs, and are stuck in a deadlock (a "deadly embrace"):
with no mbufs left, the system can neither send responses nor
receive the acknowledgements that would free up the mbufs
currently held for TCP sessions in progress.

The easiest way to see this is to periodically record vmstat -m
and netstat -m output to a disk file, and sync, so that the data
is safely on disk by the time you must reset.
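A minimal sketch of such a recorder follows.  The function name,
the header format, and the example log path are my own choices for
illustration, not anything standard; only the vmstat -m and
netstat -m commands come from the advice above.  Each snapshot is
appended and then forced to disk with fsync, so the last record
before a hang should survive the reset.

```python
import os
import subprocess
import time

# The two commands suggested above; everything else here is illustrative.
COMMANDS = (["vmstat", "-m"], ["netstat", "-m"])


def record_snapshot(log_path, commands=COMMANDS):
    """Append the output of each command to log_path and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        for argv in commands:
            try:
                result = subprocess.run(
                    argv,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                )
                output = result.stdout
            except OSError as exc:
                # Keep logging even if a command cannot be started.
                output = "failed to run: %s\n" % exc
            log.write("=== %s %s ===\n" % (stamp, " ".join(argv)))
            log.write(output)
        # Flush Python's buffer, then ask the kernel to write the file out.
        log.flush()
        os.fsync(log.fileno())
```

To use it, call record_snapshot("/var/tmp/resource.log") from a
loop with a time.sleep(60) between calls, or from cron.  The path
and the interval are placeholders; pick whatever suits the machine.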

Then plot the information over time, up to the point of the failure,
and you will likely see the problem in gory detail.

If it is something like mbuf starvation, then you should clamp the
total number of sockets that are permitted to be open: divide the
number of mbufs available by half the maximum window size, then
subtract 10% for a reserve.
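As a worked example of that rule of thumb, here is a small sketch.
It rests on assumptions the rule itself does not spell out: that
the window size is given in bytes, that each buffer holds 2048
bytes (the usual mbuf cluster size), and that the 10% reserve is
taken off the buffer pool before dividing.  The function and
parameter names are hypothetical.

```python
def max_open_sockets(total_buffers, max_window_bytes,
                     buffer_bytes=2048, reserve_percent=10):
    """Estimate a socket cap from the buffer pool and the TCP window size."""
    # Half the maximum window, expressed in whole buffers (rounded up).
    half_window = max_window_bytes // 2
    buffers_per_socket = max(1, -(-half_window // buffer_bytes))
    # Hold back the reserve before dividing the pool among sockets.
    usable = total_buffers - (total_buffers * reserve_percent) // 100
    return usable // buffers_per_socket


# 8192 buffers, 64 KB windows: 16 buffers per socket, 7373 usable buffers.
print(max_open_sockets(8192, 65536))  # prints 460
```

Treat the result as a starting point for the limit, not an exact
figure; real traffic rarely fills every window at once.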


In general, the "tuning" page is broken; a number of the things it
suggests tuning via sysctl at run time are not actually tunable at
run time, only at boot time.  Setting them at run time does lift
the top-end limits, but it does not reserve the resources needed
to meet those limits, as would have happened had the same values
been in effect at boot time.

In particular, increasing the number of open files permitted by
modifying "maxfiles" via sysctl at runtime will not add to the
prereserved amount of tcpcb's, inpcb's, or socket structures,
so you could still end up starving for one of these objects,
or for the mbufs needed to support them, at runtime.
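One rough way to spot this kind of mismatch is to compare the file
limit against the socket limit.  The sketch below assumes the
sysctl names kern.maxfiles and kern.ipc.maxsockets and reads them
with "sysctl -n"; the helper names are hypothetical.  A file limit
above the socket limit is not an error in itself, since not every
file is a socket, but on a socket-heavy server a large gap suggests
the socket side was never raised to match.

```python
import subprocess


def read_sysctl(name):
    """Return an integer sysctl value by running `sysctl -n NAME`."""
    out = subprocess.check_output(["sysctl", "-n", name])
    return int(out.decode().strip())


def limits_mismatch(read=read_sysctl):
    """Return (mismatch, maxfiles, maxsockets) for the current limits."""
    maxfiles = read("kern.maxfiles")
    maxsockets = read("kern.ipc.maxsockets")
    # True when descriptors could outnumber the sockets actually provisioned.
    return maxfiles > maxsockets, maxfiles, maxsockets
```

Calling limits_mismatch() on the affected machine gives the three
values; the read parameter exists only so the check can be tried
with canned numbers elsewhere.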

It pays to understand the code before fiddling the numbers.  ;^).

-- Terry
