From: John Baldwin
To: freebsd-smp@freebsd.org
Cc: Charles Ulrich
Date: Mon, 9 Oct 2006 21:20:12 -0400
Subject: Re: FreeBSD 6.1 Instability
Message-Id: <200610092120.12570.jhb@freebsd.org>
In-Reply-To: <200610051544.03861.charles@idealso.com>

On Thursday 05 October 2006 15:44, Charles Ulrich wrote:
> Greetings,
>
> We have been running FreeBSD on our mail servers for about as long as I can
> remember. Recently, we decided to go SMP to handle increased mail load. After
> assembling the hardware, installing the OS and software, and restoring all of
> our data, we noticed in testing that our first machine began hanging
> semi-regularly when it began processing lots of mail. Disabling SMP
> eliminated the hangs completely. We tried it all again on completely
> different hardware with exactly the same result. Our conclusion: something's
> buggy in SMP.
>
> Here are the symptoms. The machine hangs, and becomes completely
> unresponsive. It looks like a deadlock. It will sometimes respond to the
> power button and shut down (without being able to first sync and unmount
> filesystems), and sometimes the power button event gets caught in the
> deadlock. Since it's not actually a crash, there is no core dump or other
> debugging information. In the most recent situation, it hung at different
> points every time I tried to compile ezm3, after successfully compiling other
> packages.
>
> We're system administrators, not kernel hackers, so this is a plea for help. I
> wouldn't know where to start, but I'm hoping someone can point me in the
> right direction. We're also willing to give a (trustworthy) FreeBSD developer
> root access to the test machine since it's just sitting idle right now. If
> you need to crash it, that's fine. We'll have people during normal business
> hours who know how to push a reset button.
>
> Thanks for your time.

Compile a debug kernel and include 'DDB' in the kernel. When it hangs, break
into the debugger and type 'panic' to have it panic the machine and write out a
crash dump.
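
For reference, a minimal sketch of the kernel config additions this implies.
It assumes a custom config file copied from GENERIC; the name FOO and the i386
path are placeholders, and the option names should be checked against the NOTES
file for your release:

```
# Additions to /usr/src/sys/i386/conf/FOO
# (assumption: i386 tree; FOO is a placeholder config name)
makeoptions	DEBUG=-g	# build kernel.debug with gdb(1) symbols
options 	KDB		# kernel debugger framework
options 	DDB		# interactive DDB backend
```

On the system console, Ctrl+Alt+Esc is the usual key sequence for dropping into
DDB. A dump device also has to be configured (dumpdev in /etc/rc.conf) for the
panic to leave a vmcore in /var/crash.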

Once you have the crash dump, download http://www.FreeBSD.org/~jhb/gdb/gdb6
and do this:

$ kgdb /usr/obj/usr/src/sys/FOO/kernel.debug /var/crash/vmcore.X

(where FOO is your kernel config file and X is the right vmcore file)

Then do this:

(gdb) source /path/to/gdb6
(gdb) ps
...

And reply with the output from the 'ps' command.

-- 
John Baldwin