Date: Sun, 22 Apr 2001 09:34:49 -0500
From: ryan beasley <ryanb@goddamnbastard.org>
To: Mike Smith <msmith@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: AMI MegaRAID (428 series; Enterprise 1200?) + 4-STABLE (2001.12.14) -> hard lock w/o ability to dump
Message-ID: <20010422093448.A12688@bjorn.goddamnbastard.org>
In-Reply-To: <200104202056.f3KKuKf02398@mass.dis.org>; from msmith@freebsd.org on Fri, Apr 20, 2001 at 01:56:20PM -0700
References: <20010420120801.B9227@bjorn.goddamnbastard.org> <200104202056.f3KKuKf02398@mass.dis.org>

On Fri, Apr 20, 2001 at 01:56:20PM -0700, Mike Smith wrote:
>
> It does sound like you have a filesystem lock cascade going on, which
> would be explained by a lost I/O. There are other possibilities though.
>
At this point, I'm open to the idea of divine intervention. (Well, not
really, but .... ;)
>
> No, you're on the wrong track here; the driver doesn't have a dump
> routine, so you can't dump to it under any circumstances.
>
Thanks much for the correction. That'll definitely school me for not
looking at a shred of sources before posting. I was never aware that
such drivers (without dump routines) existed.

> You need to update your firmware first; UF82-166 is what you want. It's
> possible that this isn't the problem, but I can't offer you any help until
> you do upgrade, as I don't have a defect listing for AMI's firmware.

I upgraded from uc77 to uf82 Friday evening ~6pm CST. I talked to AMI
support shortly afterwards and found out that uc77 wasn't one of their
specific firmware revs; that release must've been a Dell release. (AMI
support guy said that AMI firmware revs, at least for this card, only
start as "us" or "uf", not "uc".)

... anywho, it's now at

amr0: <AMI MegaRAID> port 0xe480-0xe4ff irq 18 at device 10.0 on pci2
amr0: <Series 428> Firmware UF82, BIOS 1.66, 128MB RAM

I woke up to a page ~5am CST today to find it in the same locked state.
I called a panic and the machine was back in a few minutes. As it
stands now, I'm looking at the possibilities of the RAID controller
itself or the RAM installed on it; I'm hoping the driver's OK. (There
would probably be a lot more posts about this if said driver wasn't.)

It's just odd, coming from the
I-don't-really-know-that-much-about-controllers-and-their-related-drivers
community, that no errors about a SCSI timeout are sent to the console,
considering that the kernel itself is still flying high. (I must
re-emphasize my lack of understanding of the kernel and I/O; too many
fires to fight -> not much time for research.)

Current plan: I'm going to take a look inside the machine Monday and
look at the possibility of pulling out the CD-ROM drive and replacing it
with a decent 4G disk attached to an on-board Adaptec controller. I'm
hoping that this will give me the chance to generate some useful crash
dumps. <grin>

/me thanks both Mike and this list for collective time stolen for
reading/thinking/responding purposes.

- ryan
