Date:      Sun, 22 Apr 2001 09:34:49 -0500
From:      ryan beasley <ryanb@goddamnbastard.org>
To:        Mike Smith <msmith@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: AMI MegaRAID (428 series; Enterprise 1200?) + 4-STABLE (2001.12.14) -> hard lock w/o ability to dump
Message-ID:  <20010422093448.A12688@bjorn.goddamnbastard.org>
In-Reply-To: <200104202056.f3KKuKf02398@mass.dis.org>; from msmith@freebsd.org on Fri, Apr 20, 2001 at 01:56:20PM -0700
References:  <20010420120801.B9227@bjorn.goddamnbastard.org> <200104202056.f3KKuKf02398@mass.dis.org>

On Fri, Apr 20, 2001 at 01:56:20PM -0700, Mike Smith wrote:
> 
> It does sound like you have a filesystem lock cascade going on, which 
> would be explained by a lost I/O.  There are other possibilities though.
> 

At this point, I'm open to the idea of divine intervention.  (Well, not
really, but .... ;)

> 
> No, you're on the wrong track here; the driver doesn't have a dump 
> routine, so you can't dump to it under any circumstances.
> 

Thanks much for the correction.  That'll definitely school me for not
looking at a shred of sources before posting.  I was never aware that
such drivers (without dump routines) existed.

> You need to update your firmware first; UF82-166 is what you want.  It's
> possible that this isn't the problem, but I can't offer you any help until
> you do upgrade, as I don't have a defect listing for AMI's firmware.

I upgraded from uc77 to uf82 Friday evening ~6pm CST.  I talked to AMI
support shortly afterwards and found out that uc77 wasn't one of their
specific firmware revs; that release must've been a Dell release.  (AMI
support guy said that AMI firmware revs, at least for this card, only
start as "us" or "uf", not "uc".)

... anywho, it's now at
	amr0: <AMI MegaRAID> port 0xe480-0xe4ff irq 18 at device 10.0 on pci2
	amr0: <Series 428> Firmware UF82, BIOS 1.66, 128MB RAM

I woke up to a page ~5am CST today to find it in the same locked state.
I called a panic and the machine was back in a few minutes.  As it
stands now, I'm looking at the possibilities of the RAID controller
itself or the RAM installed on it; I'm hoping the driver's OK.  (There
would probably be a lot more posts about this if said driver weren't.)

It's just odd, coming from the I-don't-really-know-that-much-about-
controllers-and-their-related-drivers community, that no SCSI timeout
errors are sent to the console, considering that the kernel itself is
still flying high.  (I must re-emphasize my lack of understanding of the
kernel and I/O; too many fires to fight -> not much time for research.)

Current plan: I'm going to take a look inside the machine Monday and
look at the possibility of pulling out the CD-ROM drive and replacing it
with a decent 4G disk attached to an on-board Adaptec controller.  I'm
hoping that this will give me the chance to generate some useful crash
dumps.  <grin>

/me thanks both Mike and this list for collective time stolen for
    reading/thinking/responding purposes.

  - ryan




