From: Thomas Herrlin
Date: Wed, 10 Jan 2007 20:21:07 +0100
To: freebsd-stable@freebsd.org
Cc: "Bruce A. Mah"
Subject: Re: FreeBSD 6.2-RC2 Available - networking zoneli freeze problem still exist.
Message-ID: <45A53CA3.7070302@atlantis.maniacs.se>
In-Reply-To: <45A3BB4E.3@freebsd.org>

Bruce A. Mah wrote:
> If memory serves me right, LI Xin wrote:
>> Ken Smith wrote:
>>> On Thu, 2006-12-28 at 16:01 +0100, Thomas Herrlin wrote:
>>>> It still runs networking daemons into a frozen zoneli state on
>>>> heavy/(D)DOS network loads.
>>>> Such processes can't be killed with kill -9, so there is
>>>> no way to recover from it (think of a frozen sshd on a very
>>>> remote/headless server).
>>>> See the stress test panic called 'Ran out of "128 Bucket" '
>>>> on the 6.2 todo list and my own latest test here:
>>>> http://www.maniacs.se/~junics/temp/vmstat-z.txt
>>>> This test was on a new 6.2-RC2 install with no zone limit tweaks and
>>>> no sbsize limits in /etc/login.conf.
>>>> I just made a VM disk image with replication instructions; however,
>>>> Peter Holm has replicated it with his own tools, so I have not
>>>> bothered with it until now.
>>> That problem is being worked on but won't be fixed for 6.2-REL.
>>> Depending on how complex the fix winds up being it may be an Errata
>>> candidate when the time comes.
>> Perhaps we should mention some known workarounds in the errata
>> documentation. E.g. raising nmbclusters limit, etc.?
>
> That's a good idea.  Do you have more specifics (e.g. any particular
> nmbclusters value, other workarounds, etc.)?
>
> Thanks,
>
> Bruce.
>

According to my tests, the most reliable way to avoid zoneli is to set an
sbsize limit in /etc/login.conf to a value lower than the size limit of the
mbuf_cluster zone. Note that each cluster is 2048 bytes, so the zone limit
in bytes is the cluster limit multiplied by 2048 (see vmstat -z for details).

Alternatively, set the login.conf sbsize to a fraction of available RAM and
combine this with the 0/unlimited mbuf cluster setting that some recommend.
Combining the two would probably be best, because letting mbufs use
unlimited RAM for networking on its own would cause a panic or freeze
sooner or later anyway. I have not tested this combination yet, as my
system has been running stable for some time now with my current
workarounds.

Problems with the sbsize limit:
In some extreme cases, setting sbsize in login.conf will leave some
processes unable to allocate socket buffers. However, this does not affect
overall system stability, which is my first priority.
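The sbsize arithmetic above can be sketched as follows. This is a minimal Python sketch, not a tool from this thread: `parse_zone_limit`, `sbsize_ceiling` and `sbsize_from_ram` are hypothetical helpers, the parser assumes the FreeBSD 6.x `vmstat -z` row layout `name: size, limit, used, free, requests`, and any limit you feed it should come from your own `vmstat -z` output.

```python
# Hypothetical helpers: derive an sbsize ceiling (in bytes) from the
# mbuf_cluster zone limit. Assumes vmstat -z rows look like
# "name: size, limit, used, free, requests" (FreeBSD 6.x layout).

CLUSTER_BYTES = 2048  # bytes per mbuf cluster, as stated in the mail


def parse_zone_limit(vmstat_z_output: str, zone: str = "mbuf_cluster") -> int:
    """Return the LIMIT column for `zone`; 0 means the zone is unlimited."""
    for line in vmstat_z_output.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == zone:
            fields = [f.strip() for f in rest.split(",")]
            return int(fields[1])  # fields: size, limit, used, free, requests
    raise ValueError(f"zone {zone!r} not found")


def sbsize_ceiling(cluster_limit: int, cluster_bytes: int = CLUSTER_BYTES) -> int:
    """Total bytes the cluster zone can hold; sbsize should stay below this."""
    return cluster_limit * cluster_bytes


def sbsize_from_ram(physmem_bytes: int, fraction: float) -> int:
    """Alternative cap: a fraction of RAM, for use with an unlimited (0) zone."""
    return int(physmem_bytes * fraction)
```

For an illustrative (not default or recommended) limit of 25600 clusters, the zone holds 25600 × 2048 = 52,428,800 bytes (50 MB), so the sbsize capability for the relevant class in /etc/login.conf would be set below that, followed by `cap_mkdb /etc/login.conf` to rebuild the login class database.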
I have also thrown together a small executable that attempts a local
connection to sshd and performs the preliminary SSH handshake; it can be
used with the watchdogd -e parameter to reboot the box when sshd stops
responding. This is mainly for headless/remote servers that MUST NOT have
their sshd frozen.

You can also read my mail to the freebsd-current list with the subject
"Re: zonelimit livelock, some possable workarounds".

/Thomas Herrlin
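The sshd health check described above can be approximated as below. This is not the author's executable: it is a minimal Python sketch that connects to an assumed default of 127.0.0.1:22 and only reads the server's SSH identification string (the line beginning with `SSH-`), without going further into the handshake. `check_sshd`, `is_ssh_banner` and `main` are illustrative names.

```python
import socket


def is_ssh_banner(data: bytes) -> bool:
    """An SSH server's identification string starts with "SSH-"."""
    return data.startswith(b"SSH-")


def check_sshd(host: str = "127.0.0.1", port: int = 22, timeout: float = 10.0) -> bool:
    """True if a TCP connect succeeds and an SSH banner arrives within `timeout`.

    Assumes the first bytes sent are the identification string; this checks
    only the version exchange, not a full key exchange.
    """
    try:
        # create_connection applies `timeout` to connect and later recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(256)
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
    return is_ssh_banner(banner)


def main() -> int:
    # Exit status 0 means healthy; non-zero tells the caller the test failed.
    return 0 if check_sshd() else 1
```

A script wrapping this would end with `sys.exit(main())` under an `if __name__ == "__main__":` guard so that its exit status can serve as the watchdogd -e test, for example via `watchdogd_flags` in /etc/rc.conf. The timeout keeps a frozen sshd from stalling the check indefinitely.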