From owner-freebsd-scsi  Fri Mar 16 15:52:39 2001
Delivered-To: freebsd-scsi@freebsd.org
Received: from mail.wolves.k12.mo.us (mail.wolves.k12.mo.us [207.160.214.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id BB3C237B718; Fri, 16 Mar 2001 15:52:34 -0800 (PST)
	(envelope-from cdillon@wolves.k12.mo.us)
Received: from mail.wolves.k12.mo.us (cdillon@mail.wolves.k12.mo.us [207.160.214.1])
	by mail.wolves.k12.mo.us (8.9.3/8.9.3) with ESMTP id RAA28052;
	Fri, 16 Mar 2001 17:52:32 -0600 (CST)
	(envelope-from cdillon@wolves.k12.mo.us)
Date: Fri, 16 Mar 2001 17:52:32 -0600 (CST)
From: Chris Dillon <cdillon@wolves.k12.mo.us>
To: James FitzGibbon <jfitz@FreeBSD.ORG>
Cc: <scsi@FreeBSD.ORG>, <msmith@FreeBSD.ORG>
Subject: Re: Mylex eXtremeRAID 2000 timeout/hang
In-Reply-To: <20010316173716.E11769@ehlo.com>
Message-ID: <Pine.BSF.4.32.0103161713530.26609-100000@mail.wolves.k12.mo.us>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Fri, 16 Mar 2001, James FitzGibbon wrote:

> We are trying to install a Mylex eXtreme 2000 card with a Dell
> Powervault 12 drive SCA housing.  The drives in the array are
> numbered 0-5 and 8-13. The backplane of the array is id 15.
>
> During the kernel probe, we see the message
>
> mly0: drive at 03:15 not responding
>
> five times after the "waiting 15 seconds for SCSI devices to spin
> up" message, and then nothing else.  The system doesn't hang, but
> it never goes anywhere from there. This is with F/W 6.00-00 and
> BIOS 6.00-01.

How long did you wait?  I have a similar problem with an AcceleRAID
170 in -STABLE where I have to wait several minutes before it finally
gets around to booting.  I get the following, with about 30 to 40
seconds in between each "error":

Waiting 7 seconds for SCSI devices to settle
mly0: physical device 0:6  sense data received
mly0:   sense key 5  asc 00  ascq 00
mly0:   info 00000000  csi 00000000
mly0: physical device 0:6  sense data received
mly0:   sense key 5  asc 00  ascq 00
mly0:   info 00000000  csi 00000000
mly0: physical device 0:6  sense data received
mly0:   sense key 5  asc 00  ascq 00
mly0:   info 00000000  csi 00000000
mly0: physical device 0:6  sense data received
mly0:   sense key 5  asc 00  ascq 00
mly0:   info 00000000  csi 00000000
mly0: physical device 0:6  sense data received
mly0:   sense key 5  asc 00  ascq 00
mly0:   info 00000000  csi 00000000
da0 at mly0 bus 1 target 0 lun 0
da0: <RAID 1 online > Fixed Direct Access SCSI-3 device
da0: 17480MB (35799040 512 byte sectors: 255H 63S/T 2228C)
Mounting root from ufs:/dev/da0s1a
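
As a sanity check on the da0 line above, the reported size and geometry
can be recomputed from the sector count.  This is only a sketch of the
arithmetic; the function name is made up for illustration, and the
numbers are copied straight from the probe output.

```python
def check_geometry(total_sectors, sector_size, heads, spt):
    """Recompute size in MB and the cylinder count from raw numbers."""
    size_mb = total_sectors * sector_size // (1024 * 1024)
    sectors_per_cyl = heads * spt
    cylinders = total_sectors // sectors_per_cyl
    # Sectors left over that do not fill a whole cylinder.
    leftover = total_sectors - cylinders * sectors_per_cyl
    return size_mb, cylinders, leftover

# Values from: da0: 17480MB (35799040 512 byte sectors: 255H 63S/T 2228C)
print(check_geometry(35799040, 512, 255, 63))
```

The result agrees with the 17480MB and 2228C that the driver printed;
the few thousand sectors left over simply do not fill another cylinder.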

Device 0:6:0 is the enclosure management device (the "backplane", I
guess) in the SuperMicro SuperServer 6040, which I think is available
separately as their CSE-031 drive enclosure.  IIRC, the actual
enclosure management device is the QLogic GEM.
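
For reference, sense key 5 in the SCSI spec is ILLEGAL REQUEST, and
asc/ascq 00/00 means "no additional sense information", which would fit
an enclosure device rejecting a command it doesn't implement.  A minimal
sketch for decoding such lines follows.  It assumes the line format is
exactly as pasted above, with the sense key in decimal and asc/ascq in
hex (I have not checked this against the driver source); the helper
name is hypothetical.

```python
import re

# Standard SCSI sense key names, indexed by the 4-bit sense key value.
SENSE_KEYS = [
    "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
    "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
    "BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
    "EQUAL (obsolete)", "VOLUME OVERFLOW", "MISCOMPARE", "RESERVED",
]

# Assumption: key is printed in decimal, asc and ascq in two-digit hex.
SENSE_LINE = re.compile(
    r"sense key (\d+)\s+asc ([0-9a-fA-F]{2})\s+ascq ([0-9a-fA-F]{2})"
)

def decode_sense_line(line):
    """Return (key name, asc, ascq) or None if the line does not match."""
    m = SENSE_LINE.search(line)
    if m is None:
        return None
    key = int(m.group(1)) & 0x0F
    return SENSE_KEYS[key], int(m.group(2), 16), int(m.group(3), 16)

print(decode_sense_line("mly0:   sense key 5  asc 00  ascq 00"))
```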

I also had a problem after a recent upgrade to 4.3-BETA where if I
left my extra drive in the chassis, I would get the following error
just before the 0:6:0 errors:

(probe32:mly0:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe32:mly0:0:2:0): error code 0
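
The CDB in that message is an ordinary 6-byte INQUIRY: opcode 0x12 with
an allocation length of 0x24 (36 bytes, the standard INQUIRY data
length).  A small sketch that pulls the fields apart is below.  It
assumes the bytes are printed in hex without a prefix, as they appear
here, and the function name is only illustrative.

```python
def decode_inquiry_cdb(text):
    """Decode a 6-byte INQUIRY CDB given as space-separated hex bytes."""
    cdb = [int(tok, 16) for tok in text.split()]
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return {
        "opcode": cdb[0],
        "evpd": bool(cdb[1] & 0x01),   # vital product data request bit
        "page_code": cdb[2],
        "allocation_length": cdb[4],   # bytes the initiator will accept
        "control": cdb[5],
    }

fields = decode_inquiry_cdb("12 0 0 0 24 0")
print(fields["allocation_length"])
```

So the probe was asking for plain standard INQUIRY data (EVPD clear,
page code 0), nothing exotic.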

Device 0:2:0 was a spare drive (not configured as a hot spare, IIRC,
just sitting there completely unconfigured, waiting for me to do
something with it, or sacrifice itself as a warm spare if another
drive died).  Once the previously mentioned device 0:6:0 errors had
gone by, the system would panic immediately:

Fatal trap 18: integer divide fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
instruction pointer     = 0x8:0xc01477f1
stack pointer           = 0x10:0xff806e04
frame pointer           = 0x10:0xff806e1c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          =  <- SMP: XXX
trap number             = 18
panic: integer divide fault
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0

syncing disks...
done
Uptime: 4m23s
mly0: flushing cache...done

On a hunch I removed the unconfigured disk from the drive enclosure
and the system booted just fine... Mike? :-)

P.S.: I'm getting stuff in my dmesg buffer from _previous_ boots...
I've never seen a system do that.  That's how I could cut/paste that
panic. :-) Is that a bug, or a feature?  It's a nice feature (which my
other 4.2-STABLE boxes don't seem to have).  Some of the information
seems to get corrupted (mixed-up might be a better explanation) around
the time of a panic, for example:

[...snip...]
mly0: physical device 0:6  sense data received
mly0:   secuous mode disabled


Fatal trap 18: integer divide fault while in kernel mode
[...snip...]

All new dmesg info is just fine, of course, and all "normal" reboot
sequences (without a powerdown) seem to preserve the old dmesg info
perfectly.  Too bad more of my boxes don't exhibit this feature. :-)


-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
   FreeBSD: The fastest and most stable server OS on the planet.
   For IA32 and Alpha architectures. IA64, PPC, and ARM under development.
   http://www.freebsd.org


