Message-ID: <443EE652.1020806@freebsd.org>
Date: Thu, 13 Apr 2006 17:01:22 -0700
From: Paul Saab
To: matthew@digitalstratum.com
Cc: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD Crash without Errors, Warnings, or Panics
References: <443E95C1.4030404@digitalstratum.com>
In-Reply-To: <443E95C1.4030404@digitalstratum.com>

There are serious race conditions with amr in 6.0 that can cause serious
hangs. I suggest you take the amr driver from RELENG_6 and try that.

Matthew Hagerty wrote:
> Greetings,
>
> I'm running 6.0-RELEASE-p5 on a Toshiba built server: dual Xeon Intel
> motherboard with a LSILogic MegaRAID (amr0) controller. This machine
> has been running for about 2 years now, and was very stable until I
> updated from 5.3 to 5.4, and now 6.0.
> The crashing seems to be totally random and I have had it crash in as
> little as 12 hours and as long as 143 days.
>
> When the box goes down it does so in a strange way. First, it still
> responds to network probes like ping (usually), however, all console
> access is ignored. Also, some network ports still respond, like a
> telnet to port 22 to test SSH will yield an SSH banner, but trying to
> connect with SSH just hangs. Sometimes this is also true of the SMTP
> server, but not always. This also makes it impossible for me to use
> CARP to swap to the recently purchased spare machine, since the
> network interface is generally still responding so CARP does not
> detect a problem.
>
> My biggest problem with this is that there are *never* any console
> messages or log entries in any logs, no warnings about disk failure,
> buffer exhaustion, system failures, etc.. The machine simply seems to
> stop responding and the only way to correct the problem is a hard reboot.
>
> A strange thing did happen yesterday though, I believe I caught the
> box on the verge of failure. I was SSH'd in and did a ps to check
> things out. There were about 100 of these entries:
>
> 55050 ?? D 0:00.00 postmaster: ipa ipa ::1(63061) startup (postgres)
>
> The box runs a web-based app and connects to a local Postgres DB which
> seemed to be unable to start new connections being requested by the
> PHP scripts. At any rate, I stopped Apache and then tried to stop
> Postgres which resulted in (or just happened to coincide with) the box
> locking up and no longer responding to my SSH commands or attempts to
> reconnect with SSH. I hardly think this is a Postgres problem, but
> even if it was, a userland app should *not* be able to bring down a
> box...
>
> Can anyone shed some light on this, give me some options to try? What
> happened to kernel panics and such when there were serious errors
> going on?
> The only glimmer of information I have is that *one* time there was an
> error on the console about there not being any RAID controller
> available. I did purchase a spare controller and I'm about to swap it
> out and see if it helps, but for some reason I doubt it. If a
> controller like that was failing, I would certainly hope to see some
> serious error messages or panics going on.
>
> I have been running FreeBSD since version 1.01 and have never had a
> box so unstable in the last 12 or so years, especially one that is
> supposed to be "server" quality instead of the make-shift ones I put
> together with desktop hardware. And last, I'm getting sick of my
> Linux admin friends telling me "told you so! should have run
> Linux...", please give me something to stick in their pie holes!
>
> Thanks,
> Matthew
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to
> "freebsd-hackers-unsubscribe@freebsd.org"
>