From owner-freebsd-stable@FreeBSD.ORG  Wed Dec 29 15:53:53 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9138516A4CE; Wed, 29 Dec 2004 15:53:53 +0000 (GMT)
Received: from schubert.byrnehq.com (dsl-33-12.dsl.netsource.ie
	[213.79.33.12])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id B038543D1F; Wed, 29 Dec 2004 15:53:52 +0000 (GMT)
	(envelope-from freebsd-current@byrnehq.com)
Received: from localhost (mauer.directski.com. [212.147.140.194])
	by schubert.byrnehq.com (8.13.1/8.13.1) with ESMTP id iBTFrsGO016408;
	Wed, 29 Dec 2004 15:53:55 GMT
	(envelope-from freebsd-current@byrnehq.com)
Date: Wed, 29 Dec 2004 15:53:44 +0000
From: Tony Byrne <freebsd-current@byrnehq.com>
Organization: ByrneHQ
X-Priority: 3 (Normal)
Message-ID: <1694776352.20041229155344@byrnehq.com>
To: Scott Long <scottl@freebsd.org>
In-Reply-To: <41D2C32C.7090803@freebsd.org>
References: <187186864.20041229111855@byrnehq.com>
 <41D2C32C.7090803@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ByrneHQ-SA-Hits: 1.455
X-Scanned-By: MIMEDefang 2.49 on 192.168.10.254
cc: freebsd-stable@freebsd.org
Subject: Re[2]: MegaRAID 'Bad Slot' Kernel message and crash.
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Tony Byrne <freebsd@byrnehq.com>
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Dec 2004 15:53:53 -0000

Hello Scott,

Wednesday, December 29, 2004, 2:46:04 PM, you wrote:

SL> I've been seeing this problem recently too.  I believe that there is
SL> some sort of timing bug/race in the driver, but I haven't been able to
SL> figure it out yet.  It also seems to be related to panic from the block
SL> layer that point to commands being completed twice.  To be clear with
SL> your observations, are you saying that 4.10-RELEASE is behaving the same
SL> or differently than 4.10-STABLE?

We tried 5.3 just after RELEASE, if I recall correctly, but had updated our
sources and rebuilt world before running our tests.  Under 5.3 we wedged
the controller several times in the space of 3 days, each time with a
"bad slot" kernel message.

Once we decided that 5.3 was not going to cut it for us, we
downgraded to 4.10-STABLE (circa 16th Nov) and re-ran our tests;
this time we couldn't wedge the system.  The server has been in
production on 4.10-STABLE for about a month, and yesterday was the
first "bad slot" wedge we've seen.  I'd hate to think that we can now
look forward to a monthly trip to the hosting facility to hard reset the
box :-(

Regards,

Tony.

-- 
Tony Byrne