Date:      Thu, 13 Jan 2005 17:35:16 -0800 (PST)
From:      Doug White <dwhite@gumbysoft.com>
To:        Tony Byrne <tonyb@byrnehq.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: MegaRAID 'Bad Slot' Kernel message and crash.
Message-ID:  <20050113172415.E13904@carver.gumbysoft.com>
In-Reply-To: <1433078378.20050111134014@byrnehq.com>
References:  <1433078378.20050111134014@byrnehq.com>

On Tue, 11 Jan 2005, Tony Byrne wrote:

> Basically, after some amount of uptime the kernel will emit a "amr0:
> Bad slot x completed" message and pretty soon after this the box goes into a
> partially unresponsive state forcing us to reboot it.  So far the only
> thing triggering the problem is the nightly jobs, where the amount of
> IO is higher than during the day.

scottl has been able to reproduce this on a U320 controller he has. I only
have U160 equipment and can't get the transaction rate high enough to
reproduce the issue.  The driver needs KTR instrumentation so we can see
where the bad slot is coming from.  The "bad slot" message appears when the
controller returns a completion for a command that had already completed.
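
To make the failure condition concrete, here is a minimal userland sketch of
the kind of check that produces such a message.  This is not the amr driver's
actual code; the names (slot_table, slot_submit, slot_complete, NSLOTS) and
the slot count are hypothetical, chosen only to illustrate a completion
arriving for a slot that has no outstanding command.

```c
#include <stdbool.h>
#include <stdio.h>

#define NSLOTS 8	/* hypothetical number of command slots */

enum slot_state { SLOT_FREE, SLOT_BUSY };

/* Hypothetical bookkeeping: one state per command slot. */
struct slot_table {
	enum slot_state	state[NSLOTS];
	unsigned	bad_completions;	/* count of unexpected completions */
};

/* Mark a slot as holding an outstanding command; fails if it is in use. */
static bool
slot_submit(struct slot_table *t, int slot)
{
	if (slot < 0 || slot >= NSLOTS || t->state[slot] != SLOT_FREE)
		return (false);
	t->state[slot] = SLOT_BUSY;
	return (true);
}

/*
 * Handle a completion reported for a slot.  A completion for a slot that
 * is not busy (for example, one that already completed) is the "bad slot"
 * case: it is counted and reported, and the slot state is left untouched.
 */
static bool
slot_complete(struct slot_table *t, int slot)
{
	if (slot < 0 || slot >= NSLOTS || t->state[slot] != SLOT_BUSY) {
		t->bad_completions++;
		fprintf(stderr, "bad slot %d completed\n", slot);
		return (false);
	}
	t->state[slot] = SLOT_FREE;
	return (true);
}
```

In this sketch, submitting slot 3 and completing it once succeeds; a second
completion for slot 3 without a new submit takes the "bad slot" path.  A check
like this only shows that a duplicate completion happened, not why, which is
the reason tracing is needed to find the source.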

The amr driver has several other issues and is in dire need of an
overhaul. Unfortunately LSI has not been forthcoming with documentation,
so Scott and I are pretty much scratching our heads without knowing where
to go.

The problem is present in 5.x and HEAD, at least.  I can't comment on 4.x.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org


