From: Royce Williams
Date: Fri, 01 Aug 2008 08:09:59 -0800
To: freebsd-stable@FreeBSD.org
Subject: Re: 6.3-RELEASE-p3 recurring panics on multiple SM PDSMi+
Message-ID: <48933557.8010904@alaska.net>
References: <488638DA.7010005@alaska.net> <20080723053404.GA46617@eos.sc1.parodius.com> <4886D1E9.1090905@alaska.net>
In-Reply-To: <4886D1E9.1090905@alaska.net>
List-Id: Production branch of FreeBSD source code

Royce Williams wrote, on 7/22/2008 10:38 PM:
> Jeremy Chadwick wrote, on 7/22/2008 9:34 PM:
>> On Tue, Jul 22, 2008 at 11:45:30AM -0800, Royce Williams wrote:
>>> We have 10 SuperMicro PDSMi+ 5015M-MTs that are panic'ing every few
>>> days. This started shortly after upgrade from 6.2-RELEASE to
>>> 6.3-RELEASE with freebsd-update.
>> We use the same hardware (board and chassis), and have no such problems
>> running both RELENG_6 and RELENG_7.
>>
>> I don't think your issue is specific to the board or chassis. Kris's
>> explanation makes a lot more sense. :-)
>
> Jeremy/Kris/Clifton -
>
> Looks like we have consensus. :-) Thanks, all of you, for your
> helpful insight.
>
> I've bumped vm.kmem_size up to 400M on half of the affected boxes,
> leaving the other half as a control group. I'll report back once I
> have something to report.

After having bumped up to 400M, a few boxes panic'd again yesterday.
I caught a core, and it is "kmem_map too small", just as Kris suspected:

Jul 31 15:38:05 [redacted] savecore: reboot after panic: kmem_malloc(4096): kmem_map too small: 419430400 total allocated

The docs state that 400M should be plenty for systems up to 6G, but
Kris said earlier in this thread that it's better to say 'increase
until the pain stops'. :-)

Accordingly, I have some follow-up questions; hopefully, this will be
useful to others.

- What is a reasonable increment? (I'm trying 448M next.)
- What are the practical and hard maximums?
- I suspect that it's worth trying to make kmem 'as big as I need, but
  no bigger', so that non-kernel memory is also maximized?
- In a larger sense, if 400M is probably big enough for 6G systems,
  and these are 4G systems, should I be suspicious that 400M isn't
  cutting it?

In other words, is there a point at which I should be looking for
obvious places where the kernel is eating too much memory and reducing
them, rather than feeding it more?

For example, I recall now that a network guy in my group did some
sysctl tuning relating to networking on these systems, and I see from
man tuning(7) that a number of these tweaks (obviously) can cause
increased kernel consumption.
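
A rough sketch of the arithmetic behind the questions above. It checks
that the 419430400 in the panic message is exactly 400M, shows what the
448M step is in bytes, and estimates a ceiling for mbuf cluster memory
from the kern.ipc.nmbclusters value in /etc/sysctl.conf on these boxes
(65535). The helper names are made up for illustration. The 2048-byte
cluster size is an assumption (the usual MCLBYTES value; check the
running kernel), and the ceiling assumes every cluster is allocated,
which is a worst case, not a measurement.

```shell
#!/bin/sh
# Illustrative helpers only; the names are hypothetical, not FreeBSD tools.

# Convert a size in megabytes (1M = 1024*1024 bytes) to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Convert bytes to whole megabytes (rounds down).
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

# Worst-case bytes held by mbuf clusters: cluster count * cluster size.
# The cluster size is an argument because 2048 is an assumption here.
cluster_ceiling_bytes() {
    echo $(( $1 * $2 ))
}

echo "400M in bytes:      $(mb_to_bytes 400)"
echo "448M in bytes:      $(mb_to_bytes 448)"
echo "panic figure in MB: $(bytes_to_mb 419430400)"
echo "cluster ceiling:    $(cluster_ceiling_bytes 65535 2048) bytes"
```

With these inputs the cluster ceiling comes to just under 128M. If
clusters are charged against kmem_map (an assumption worth verifying
for 6.3), that limit alone could account for a sizable share of a 400M
kmem when fully used. Actual usage is better read from live counters
such as netstat -m and vmstat -z than from this estimate.
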

$ egrep -v '^#|^$' /etc/sysctl.conf | sort
kern.corefile=/var/cores/%U/%N-%P.core
kern.ipc.maxsockbuf=8388608
kern.ipc.maxsockets=32768
kern.ipc.nmbclusters=65535
kern.ipc.somaxconn=4096
kern.maxfiles=262144
kern.maxfilesperproc=65535
net.inet.ip.portrange.first=8192
net.inet.ip.portrange.hifirst=8192
net.inet.ip.portrange.hilast=65535
net.inet.ip.portrange.last=65535
net.inet.ipf.fr_tcpclosed=60
net.inet.ipf.fr_tcpclosewait=120
net.inet.ipf.fr_tcphalfclosed=300
net.inet.ipf.fr_udptimeout=120
net.inet.tcp.delayed_ack=0
net.inet.tcp.inflight.enable=0
net.inet.tcp.msl=10000
net.inet.tcp.mssdflt=1460
net.inet.tcp.recvspace=65536
net.inet.tcp.rfc1323=1
net.inet.tcp.sendspace=65536
net.inet.udp.maxdgram=57344
net.inet.udp.recvspace=65535
vfs.nfs.iodmax=32
vfs.nfs.iodmin=8

My apologies for not including this sooner. I didn't think of it until
yesterday, primarily because it had been fine under 6.2. In
retrospect, that was bad reasoning.

Royce

--
Royce D. Williams - http://royce.ws/
Reason is a very light rider, and easily shook off. - Jonathan Swift