From: Terry Lambert
Date: Sun, 05 May 2002 18:40:30 -0700
To: Anthony Schneider
Cc: Patrick Thomas, freebsd-hackers@FreeBSD.ORG
Subject: Re: what causes a userland to stop, but allows kernel to continue ?

Anthony Schneider wrote:
> Livelock, maybe? Is there some sort of internal kernel semaphore table
> which might be getting filled up or something? I'd also like to find
> out more about this, but sadly, the machine is a remote one and I
> can't drop into ddb as suggested...
> Thank you all very much. Hope this information is of use.
> -Anthony.

More likely, you have run out of some non-renewable resource, such as
mbufs, and are in the midst of a deadly embrace deadlock (e.g. as a
result of having no mbufs with which to send responses or receive the
acknowledgements that would free up the mbufs currently held for TCP
sessions in progress, etc.).
The easiest way to see this is to periodically record vmstat -m and
netstat -m output to a disk file, and sync, so that the data is safely
on disk at the moment you are forced to reset the machine. Then plot
the information over time, up to the point of the failure, and you
will likely see the problem in gory detail.

If it is something like mbuf starvation, then you should clamp the
total number of sockets that are permitted to be open at the number of
mbufs available divided by half the maximum window size (measured in
mbufs), minus 10% for a reserve.

In general, the "tuning" page is broken; a number of the things it
suggests tuning via sysctl at run time are not actually tunable at run
time, only at boot time. Changing them at run time raises the upper
limits, but it does not reserve enough of the underlying resources to
meet those limits, as setting them at boot time would have.

In particular, increasing the number of open files permitted by
modifying "maxfiles" via sysctl at run time will not add to the
pre-reserved pool of tcpcbs, inpcbs, or socket structures. Any of these
can leave you starving at run time for one of those objects, or for the
mbufs needed to support them.

It pays to understand the code before fiddling the numbers. ;^)

-- Terry
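A minimal sketch of the periodic recording suggested above, in Python for illustration. The function names record_snapshot and record_forever, the "=== <timestamp> <command> (<status>)" separator lines, and the 60-second default interval are hypothetical choices; the only commands it runs by default are vmstat -m and netstat -m. It fsyncs the log after every snapshot and then calls os.sync(), so whatever was written before a hard reset should already be on disk.

```python
import os
import subprocess
import time
from datetime import datetime, timezone

# The two commands suggested in the message; substitute or add others freely.
DEFAULT_COMMANDS = (("vmstat", "-m"), ("netstat", "-m"))


def record_snapshot(log_path, commands=DEFAULT_COMMANDS, timeout=30):
    """Append one timestamped copy of each command's output to log_path and
    push it to disk, so it survives a hard reset of the machine."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a") as log:
        for cmd in commands:
            try:
                result = subprocess.run(list(cmd), capture_output=True,
                                        text=True, timeout=timeout)
                output = result.stdout + result.stderr
                status = f"exit {result.returncode}"
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A wedged or missing command must not stop the logging.
                output, status = "", f"failed: {exc}"
            # Hypothetical separator line, so snapshots can be split later.
            log.write(f"=== {stamp} {' '.join(cmd)} ({status})\n")
            if output and not output.endswith("\n"):
                output += "\n"
            log.write(output)
        log.flush()
        os.fsync(log.fileno())  # force this file's data to stable storage
    if hasattr(os, "sync"):
        os.sync()  # the equivalent of sync(8) for everything else


def record_forever(log_path, interval_seconds=60, commands=DEFAULT_COMMANDS):
    """Take a snapshot every interval_seconds until the process is killed."""
    while True:
        record_snapshot(log_path, commands)
        time.sleep(interval_seconds)
```

For example, record_forever("/var/log/mbuf.log") could be left running in a detached session on the remote machine; the log outlives the reset and can be read afterwards.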
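To plot the netstat -m part of such a log over time, something like the following could pull out the mbuf counters. It assumes a log in which each snapshot starts with a line of the form "=== <timestamp> <command> (<status>)", and it assumes the FreeBSD 4.x wording of netstat -m shown in the comments; both are assumptions, so check them against your own output. mbuf_series and text_plot are hypothetical helpers. Under mbuf starvation one would expect the "current" count to pin at "max" while "requests for memory denied" keeps growing.

```python
import re

# Assumed FreeBSD 4.x "netstat -m" lines, for example:
#   386/640/26624 mbufs in use (current/peak/max):
#   0 requests for memory denied
# Other releases word these differently; adjust the patterns to match.
MBUFS_RE = re.compile(r"^\s*(\d+)/(\d+)/(\d+) mbufs in use", re.MULTILINE)
DENIED_RE = re.compile(r"^\s*(\d+) requests for memory denied", re.MULTILINE)


def mbuf_series(log_text):
    """Return (timestamp, current, peak, max, denied) for every "netstat -m"
    snapshot in a log whose entries start with "=== <timestamp> <command>"."""
    series = []
    for chunk in re.split(r"^=== ", log_text, flags=re.MULTILINE):
        header, _, body = chunk.partition("\n")
        stamp, _, command = header.partition(" ")
        if not command.startswith("netstat -m"):
            continue
        in_use = MBUFS_RE.search(body)
        if in_use is None:
            continue  # the command failed or printed an unexpected format
        denied = DENIED_RE.search(body)
        current, peak, limit = (int(n) for n in in_use.groups())
        series.append((stamp, current, peak, limit,
                       int(denied.group(1)) if denied else None))
    return series


def text_plot(series, width=40):
    """Crude text plot: one bar per snapshot, scaled to the mbuf limit."""
    for stamp, current, _peak, limit, denied in series:
        bar = "#" * (current * width // limit) if limit else ""
        print(f"{stamp} |{bar:<{width}}| {current}/{limit} denied={denied}")
```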
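The socket clamp rule of thumb, written out as arithmetic. max_sockets is a hypothetical helper. Its bytes_per_mbuf default of 2048 assumes socket data sits in 2 KB mbuf clusters (MCLBYTES on i386); pass 256 (MSIZE) instead if you are budgeting plain mbufs, and pass the size of whichever pool you are budgeting as nmbufs. Integer arithmetic keeps the result a whole number of sockets, rounded down.

```python
def max_sockets(nmbufs, max_window_bytes, bytes_per_mbuf=2048, reserve_percent=10):
    """Socket cap per the rule of thumb: mbufs available divided by half the
    maximum window (in mbufs), minus a reserve (10% by default)."""
    if nmbufs < 0 or max_window_bytes <= 0 or bytes_per_mbuf <= 0:
        raise ValueError("counts and sizes must be positive")
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Half the maximum window, rounded up to whole buffers.
    half_window = -(-max_window_bytes // (2 * bytes_per_mbuf))
    # floor(nmbufs * (100 - reserve) / 100 / half_window), all in integers.
    return nmbufs * (100 - reserve_percent) // (100 * half_window)
```

With 32768 clusters and a 64 KB maximum window, half a window is 16 clusters, 32768 / 16 = 2048 sockets, and keeping 10% in reserve leaves max_sockets(32768, 65536) == 1843.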
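The boot-time versus run-time distinction can be shown with a toy model. BootSizedZone is hypothetical and is not FreeBSD kernel code; it only mimics the behaviour described in the message, where the backing store for an object type is set aside once at boot from the limit in effect then, and a later sysctl change moves the limit without growing that reservation.

```python
class BootSizedZone:
    """Toy model (hypothetical, not FreeBSD code) of an allocation zone whose
    backing store is reserved once, at boot, from the limit in effect then."""

    def __init__(self, boot_limit):
        self.limit = boot_limit      # the ceiling a sysctl would report
        self.reserved = boot_limit   # what was actually set aside at boot
        self.in_use = 0

    def set_limit(self, new_limit):
        """A run-time sysctl write: moves the ceiling, reserves nothing new."""
        self.limit = new_limit

    def alloc(self):
        if self.in_use >= self.limit:
            raise RuntimeError("administrative limit reached")
        if self.in_use >= self.reserved:
            # Starvation below the advertised limit.
            raise MemoryError("boot-time reservation exhausted")
        self.in_use += 1

    def free(self):
        if self.in_use == 0:
            raise ValueError("nothing allocated")
        self.in_use -= 1
```

Booting with a limit of 3 and raising it to 10 at "run time" still lets only 3 allocations succeed; the fourth raises MemoryError even though the limit says there is room.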