From: Tony Byrne
Organization: ByrneHQ
Date: Wed, 29 Dec 2004 15:53:44 +0000
To: Scott Long
cc: freebsd-stable@freebsd.org
Subject: Re[2]: MegaRAID 'Bad Slot' Kernel message and crash.
Message-ID: <1694776352.20041229155344@byrnehq.com>
In-Reply-To: <41D2C32C.7090803@freebsd.org>
References: <187186864.20041229111855@byrnehq.com> <41D2C32C.7090803@freebsd.org>

Hello Scott,

Wednesday, December 29, 2004, 2:46:04 PM, you wrote:

SL> I've been seeing this problem recently too.  I believe that there is
SL> some sort of timing bug/race in the driver, but I haven't been able to
SL> figure it out yet.  It also seems to be related to panic from the block
SL> layer that point to commands being completed twice.
SL> To be clear with your observations, are you saying that 4.10-RELEASE
SL> is behaving the same or differently than 4.10-STABLE?

We tried 5.3 just after RELEASE, if I recall correctly, but had updated
our sources and rebuilt world before running our tests. Under 5.3 we
wedged the controller a number of times in the space of 3 days, each
time with a "bad slot" kernel message. Once we decided that 5.3 was not
going to cut it for us, we downgraded to 4.10-STABLE (circa 16th Nov)
and re-ran our tests; this time we couldn't wedge the system. The
server has been in production on 4.10-STABLE for about a month, and
yesterday was the first "bad slot" wedge we've seen.

I'd hate to think that we can now look forward to a monthly trip to the
hosting facility to hard reset the box :-(

Regards,

Tony.

-- 
Tony Byrne