From: Eirik Øverby
Date: Wed, 18 Aug 2004 07:31:01 +0200
To: Julian Elischer
Cc: stable@freebsd.org
Subject: Re: How to find the cause of a hang

On 14. Aug 2004, at 22:50, Julian Elischer wrote:

> Eirik Øverby wrote:
>> Hi all,
>> I'm currently experiencing frequent (about once per week) hangs of a
>> server that is about 1500 kilometers away from me.
>> I have a serial cable on the box, and using minicom on the
>> neighbor box I am now in the kernel debugger - but I'm at a complete
>> loss as to what to do to figure out what is, in fact, wrong.
>> Calling panic or boot doesn't work - it just stops at "syncing
>> disks..." and never actually reboots. I suspect something fishy going
>> on with disk I/O, but I can't be certain of that.
>> The box responds to ping - until I call panic or boot - but no other
>> services are working.
>
> try to capture a stack trace: "tr"

Looks like the box is "idle".

> if you have KTR enabled do "show ktr"

This is 4.x... as my follow-up msg on current@ indicated (yes, I posted
to the wrong list initially ;)

> do "ps"

An insane number of cron processes.. Is it trying to run scheduled jobs
and failing because of the hang?

> do show pcpu
> show witness
> show locks

No workie on 4.x...

> if you have a dump device defined..
> call doadump

No dump device here (if disk is the problem, it would be of no use I
guess), and I get "undefined symbol" anyway.

> then to reboot..
> "call cpu_reset"

Now that one is handy. ;)

> The dump will appear after the next boot in /var/crash
> if it's not big enough for a complete ram dump, symlink it to
> somewhere where there is enough room.

(See above)

> when you have all that.. let us know :-)

I have what I have. See http://anduin.net/~ltning/debug.cap (it is too
long to include in a mail).

Got any clues for me?

Thanks,
/Eirik

>> What can I do? I'm now at the db> prompt ... Help :)
>> /Eirik
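
For anyone wanting to put a number on "an insane number of cron processes" from a saved serial capture, here is a minimal Python sketch. It is not part of the original exchange. It assumes the DDB "ps" listing has one process per line, starting with a numeric pid and ending with the command name; the sample rows and the function name tally_commands are made up for illustration and are not taken from debug.cap.

```python
from collections import Counter


def tally_commands(ps_text):
    """Count processes per command name in captured DDB `ps` output.

    Assumption: each process row starts with a numeric pid and ends
    with the command name. Everything else (prompt lines, the column
    header, pager lines) is skipped.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines and any line whose first field is not a pid.
        if not fields or not fields[0].isdigit():
            continue
        counts[fields[-1]] += 1
    return counts


# Hypothetical sample rows, invented for this sketch.
sample = """\
db> ps
  pid   proc     addr    uid  ppid  pgrp  flag stat wmesg   wchan   cmd
  412 d0a1c000 d0b2e000    0   101   101 000004  3  biord c1234567 cron
  411 d0a1c200 d0b2f000    0   101   101 000004  3  biord c1234567 cron
  410 d0a1c400 d0b30000    0   101   101 000004  3  biord c1234567 cron
   98 d0a1c600 d0b31000    0     1    98 000084  3 select c0345678 sendmail
    1 d0a1c800 d0b32000    0     0     1 004084  3   wait d0a1c800 init
"""

if __name__ == "__main__":
    for cmd, n in tally_commands(sample).most_common():
        print(n, cmd)
```

To use it on a real capture, read the file's text and pass it to tally_commands in place of sample. If many processes share the same wait channel as well, that would be one more hint about where things are stuck, but the sketch only counts command names.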