From: Paul Saab
Date: Thu, 13 Apr 2006 20:50:57 -0700
To: matthew@digitalstratum.com
Cc: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD Crash without Errors, Warnings, or Panics

The amr driver was not MPSAFE in 5.4 (I think), so you would not have run into these problems. You should be able to take the driver from RELENG_6 and use it on a release branch. If it doesn't compile, let me know and I'll generate a tarball or diff that will work. We had major issues with amr at work until Scott Long and I (mostly Scott) ironed out its stability issues.
You should either run RELENG_6 or take the driver from RELENG_6 and use that. You'll also get the added benefit of being able to use the Linux management tools (megarc) to see the status of your RAID.

Matthew Hagerty wrote:
> Did these serious race conditions exist in 5.4 also? This is not good
> news, and it would be nice if there were some place to find out
> which hardware drivers are considered stable for production servers. I
> always assumed that this was the function of the supported hardware
> page. Is this not the case?
>
> Can I pull just a certain driver from stable and use it with a release
> branch (not even sure how I would do that)? Or will there be
> dependency problems? I suppose I could run on stable until the driver
> is fixed in a release branch, but I need this box up and online, and
> I've always read that the stable branch is not the place for
> production servers.
>
> Is there any place I can read about the status of, and work being done
> on, the amr driver?
>
> Thanks,
> Matthew
>
>
> Paul Saab wrote:
>> There are serious race conditions with amr in 6.0 that can cause
>> serious hangs. I suggest you take the amr driver from RELENG_6 and
>> try that.
>>
>> Matthew Hagerty wrote:
>>> Greetings,
>>>
>>> I'm running 6.0-RELEASE-p5 on a Toshiba-built server: dual Xeon
>>> Intel motherboard with an LSILogic MegaRAID (amr0) controller. This
>>> machine has been running for about 2 years now, and was very stable
>>> until I updated from 5.3 to 5.4, and now 6.0. The crashes seem to
>>> be totally random; I have had it crash after as little as 12 hours
>>> and as long as 143 days.
>>>
>>> When the box goes down it does so in a strange way. First, it still
>>> responds to network probes like ping (usually); however, all console
>>> access is ignored. Also, some network ports still respond: a
>>> telnet to port 22 to test SSH will yield an SSH banner, but trying
>>> to connect with SSH just hangs.
>>> Sometimes this is also true of the
>>> SMTP server, but not always. This also makes it impossible for me
>>> to use CARP to swap to the recently purchased spare machine, since
>>> the network interface is generally still responding, so CARP does not
>>> detect a problem.
>>>
>>> My biggest problem with this is that there are *never* any console
>>> messages or log entries in any logs: no warnings about disk failure,
>>> buffer exhaustion, system failures, etc. The machine simply seems
>>> to stop responding, and the only way to correct the problem is a hard
>>> reboot.
>>>
>>> A strange thing did happen yesterday, though; I believe I caught the
>>> box on the verge of failure. I was SSH'd in and did a ps to check
>>> things out. There were about 100 of these entries:
>>>
>>> 55050 ?? D 0:00.00 postmaster: ipa ipa ::1(63061) startup (postgres)
>>>
>>> The box runs a web-based app and connects to a local Postgres DB,
>>> which seemed to be unable to start the new connections being requested
>>> by the PHP scripts. At any rate, I stopped Apache and then tried to
>>> stop Postgres, which resulted in (or just happened to coincide with)
>>> the box locking up and no longer responding to my SSH commands or
>>> attempts to reconnect with SSH. I hardly think this is a Postgres
>>> problem, but even if it were, a userland app should *not* be able to
>>> bring down a box...
>>>
>>> Can anyone shed some light on this or give me some options to try?
>>> What happened to kernel panics and such when there were serious
>>> errors going on? The only glimmer of information I have is that
>>> *one* time there was an error on the console about there not being
>>> any RAID controller available. I did purchase a spare controller
>>> and I'm about to swap it out and see if it helps, but for some
>>> reason I doubt it. If a controller like that were failing, I would
>>> certainly hope to see some serious error messages or panics.
>>>
>>> I have been running FreeBSD since version 1.01 and have never had a
>>> box this unstable in the last 12 or so years, especially one that is
>>> supposed to be "server" quality instead of the makeshift ones I put
>>> together with desktop hardware. And last, I'm getting sick of my
>>> Linux admin friends telling me "told you so! should have run
>>> Linux...", so please give me something to stick in their pie holes!
>>>
>>> Thanks,
>>> Matthew
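The ps snapshot in the original report (about 100 postmaster processes in state D, which ps(1) documents as disk or other short-term uninterruptible wait) fits the picture Paul describes: the network stack kept answering ping and CARP while I/O through the amr controller had stalled. As a minimal sketch, assuming FreeBSD's default ps column layout (PID, TT, STAT, TIME, COMMAND), the Python below counts such processes. The function names `stuck_in_disk_wait` and `looks_hung` and the threshold of 50 are hypothetical illustrations, not a tested hang detector or anything proposed in the thread.

```python
import re
from typing import Iterable, List, Tuple

# Assumed BSD-style `ps` layout: PID, TT, STAT, TIME, COMMAND, e.g.
#   55050  ??  D   0:00.00 postmaster: ipa ipa ::1(63061) startup (postgres)
PS_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")


def stuck_in_disk_wait(ps_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, command) for each process whose STAT begins with 'D'."""
    stuck: List[Tuple[int, str]] = []
    for line in ps_lines:
        match = PS_LINE.match(line)
        if match is None:
            continue  # skip the header row and anything unparsable
        pid, _tty, stat, _time, command = match.groups()
        if stat.startswith("D"):  # D = disk (or other uninterruptible) wait
            stuck.append((int(pid), command))
    return stuck


def looks_hung(ps_lines: Iterable[str], threshold: int = 50) -> bool:
    """Hypothetical heuristic: flag the host when many processes sit in D state."""
    return len(stuck_in_disk_wait(ps_lines)) >= threshold


if __name__ == "__main__":
    sample = [
        "  PID  TT  STAT      TIME COMMAND",
        "55050  ??  D      0:00.00 postmaster: ipa ipa ::1(63061) startup (postgres)",
        "  812  ??  Ss     0:01.20 /usr/sbin/sshd",
    ]
    print(stuck_in_disk_wait(sample))
    print(looks_hung(sample, threshold=1))
```

The lines could come from `ps ax` output captured periodically. How, or whether, such a check should feed into CARP failover is an open assumption here; it only shows that an I/O-level symptom is visible even while network probes still succeed.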