Date: Wed, 30 Jun 1999 19:27:26 +0200 (CEST)
From: Wilko Bulte <wilko@yedi.iaf.nl>
To: ken@plutotech.com (Kenneth D. Merry)
Cc: jgreco@ns.sol.net, scsi@FreeBSD.ORG
Subject: Re: FreeBSD panics with Mylex DAC960SX
Message-ID: <199906301727.TAA00581@yedi.iaf.nl>
In-Reply-To: <199906292300.RAA29666@panzer.kdm.org> from "Kenneth D. Merry" at "Jun 29, 1999 5: 0:50 pm"

As Kenneth D. Merry wrote ...
> Joe Greco wrote...
> > Hello,
> >
> > First, cool stuff in 3.X!  Hats off to you guys.
> >
> > I have one minor issue that I am hoping is a simple fix.
> >
> > I'm using Mylex DAC960SX SCSI-to-SCSI RAID controllers on an ASUS P2B-DS
> > motherboard, off of the onboard SCSI controller.  This is a neat gadget
> > that makes a bunch of drives look like a single SCSI target.
> >
> > Now... here's the problem.  The unit takes a while to start up (~60s)
> > from power on, and until it reports "STARTUP COMPLETE", FreeBSD blows
> > chunks when trying to access it.
> >
> > In particular, when the Mylex freaks out and thinks half its disks are
> > dead (duh forgot to power them on), the startup sequence never completes,
> > and FreeBSD will sit there doing boot-panic-boot-panic-etc.  This is not
> > very gracious, and is a bit irritating since the serial console I need to
> > talk to the Mylex is on the box...
> >
> > So, my _real_ issue is the following panic:
>
> [ ... ]
>
> > da1 at ahc0 bus 0 target 1 lun 0
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: A
> > de0: autosense failed: cable problem?
> > swapon: adding /dev/da0s1b as swap device
> > Automatic reboot in progress...
> > /dev/rda0s1a: FILESYSTEM CLEAN^M; SKIPPING CHECK
> > S
> > ^M/dev/rda0s1a:
> > clean, 138968 frFee (296 frags, 1a7334 blocks, 0.2t% fragmentation)a
> > l trap 18: integer divide fault while in kernel mode
> > mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> > instruction pointer     = 0x8:0xf014a681
> > stack pointer           = 0x10:0xfa66b9d8
> > frame pointer           = 0x10:0xfa66ba00
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 18 (fsck)
> > interrupt mask          = <- SMP: XXX
> > trap number             = 18
> > panic: integer divide fault
> > mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> > boot() called on cpu#1
> >
> > syncing disks... done
> > (da1:ahc0:0:1:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0
> > (da1:ahc0:0:1:0): NOT READY
> > Automatic reboot in 15 seconds - press a key on the console to abort
> > Rebooting...
> > cpu_reset called on cpu#1
> > cpu_reset: Stopping other CPUs
> > cpu_reset: Restarting BSP
> > cpu_reset_proxy: Grabbed mp lock for BSP
> > cpu_reset_proxy: Stopped CPU 1
> >
> > I apologize for not reproducing this on a 3.2R box but I assure you that
> > it also panics in fsck on 3.2R in what appears to be an identical manner.
> > The panic does seem to be caused by fsck - I can enter single user mode
> > just fine.
> >
> > My guess is that the integer divide fault results from the device reporting
> > a size of zero (strictly a guess though!).  Normally, size is reported as
> >
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: 138928MB (284524544 512 byte sectors: 255H 63S/T 17710C)
> >
> > but during all of these crash-boots, the third line is
> >
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: A
>
> That should probably read "Attempt to query device size failed ...."
>
> You may be losing characters over the serial console or something.
>
> > If I can provide further information to assist in tracking down this bug,
> > please let me know.
>
> My first guess is that it's happening during the open() routine, for some
> reason.  That's why fsck seems to cause the problem.
>
> You're probably right about the device returning a size of zero.  It isn't
> immediately clear to me why the open routine would cause a panic, *unless*
> the Mylex unit returns good status for the read capacity command, but
> returns a capacity of 0.

Although this is definitely a bogus response, I don't see the point in
panicking the machine. An offensive message on the console, by all means.
A panic?

This remark assumes you are not booting from the raid of course :)

-- 
|   / o / /  _   Arnhem, The Netherlands   - Powered by FreeBSD -
|/|/ / / /( (_) Bulte   WWW : http://www.tcja.nl   http://www.freebsd.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
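
[Editor's illustration] The thread's working theory is that the unit reports
a capacity of zero with good status, and that a later division by that value
triggers the integer divide fault. The following is a minimal sketch of the
behaviour Wilko argues for: reject the bogus values with an error instead of
dividing by them. It is not the FreeBSD da(4) driver code. The names
derive_geometry and struct disk_geom are hypothetical, and the choice of
ENXIO and EINVAL as return codes is an assumption. The sample numbers
(284524544 sectors of 512 bytes, 255 heads, 63 sectors per track) come from
the normal probe output quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the values a probe message would print. */
struct disk_geom {
	uint64_t size_mb;	/* capacity in megabytes (2^20 bytes) */
	uint32_t cylinders;	/* whole cylinders at the given heads/sectors */
};

/*
 * Derive size and cylinder count from a reported sector count and sector
 * size.  Returns 0 on success, ENXIO if the device reported no usable
 * media (zero sectors or zero sector size), or EINVAL if the translation
 * geometry itself is unusable.  No variable divisor is used before it has
 * been checked, so a zero from the device cannot cause a divide fault.
 */
int
derive_geometry(uint32_t sectors, uint32_t secsize, uint32_t heads,
    uint32_t secs_per_track, struct disk_geom *out)
{
	uint64_t secs_per_cyl;

	assert(out != NULL);

	/* A zero capacity is bogus: report it, do not divide by it. */
	if (sectors == 0 || secsize == 0)
		return (ENXIO);

	/* Widen before multiplying so the product cannot wrap. */
	secs_per_cyl = (uint64_t)heads * secs_per_track;
	if (secs_per_cyl == 0)
		return (EINVAL);

	/* Dividing by a constant avoids any dependence on secsize here. */
	out->size_mb = ((uint64_t)sectors * secsize) / (1024 * 1024);
	out->cylinders = (uint32_t)(sectors / secs_per_cyl);
	return (0);
}
```

With the values from the normal probe output, derive_geometry() yields
138928 MB and 17710 cylinders, matching the "da1: 138928MB ... 17710C" line.
With a sector count or sector size of zero it returns ENXIO, which a caller
could turn into a console message and a failed open rather than a panic.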