Date: Wed, 30 Jun 1999 14:09:12 -0500 (CDT)
From: Joe Greco <jgreco@ns.sol.net>
To: wilko@yedi.iaf.nl (Wilko Bulte)
Cc: ken@plutotech.com, scsi@freebsd.org
Subject: Re: FreeBSD panics with Mylex DAC960SX
Message-ID: <199906301909.OAA85863@aurora.sol.net>
In-Reply-To: <199906301727.TAA00581@yedi.iaf.nl> from Wilko Bulte at "Jun 30, 1999 7:27:26 pm"
> > You're probably right about the device returning a size of zero. It isn't
> > immediately clear to me why the open routine would cause a panic, *unless*
> > the Mylex unit returns good status for the read capacity command, but
> > returns a capacity of 0.
>
> Although this definitely a bogus response I don't see the point in panic-ing
> the machine. An offensive message on the console, by all means. A panic?
>
> This remark assumes you are not booting from the raid of course :)

Couldn't boot from it 'til it was ready (which it isn't, which leads to this
entire problem).

Okay, anyways, ddb output. I really have no clue what I'm doing with the
kernel debugger so if I did anything stupid and you need other data, let me
know what to do.

I put the camcontrol statement and then a fsck -p into root's .profile so
that it'd be a bit easier to manage this little show.

changing root device to dda0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST34371W 0484> Fixed Direct Access SCSI-2 device
da0: 40.0MB/s transfers (20.0MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 4148MB (8496884 512 byte sectors: 255H 63S/T 528C)
a0s1a
da1 at ahc0 bus 0 target 1 lun 0
da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device
da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
da1: A
Enter full pathname of shell or RETURN for /bin/sh:
erase ^H, kill ^U, intr ^C
/sbin/camcontrol cmd -n da -u 1 -v -c 25 0 0 0 0 0 0 0 0 0 -i 8 i4 i4
camcontrol: error sending command
(pass1:ahc0:0:1:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(pass1:ahc0:0:1:0): NOT READY
end of camcontrol
/dev/rda0s1a: cFILESYSTEM CLEANk; SKIPPING CHECKpS 2% fragmentationlean, 127256 f1ree (296 frags, c15870 blocks, 0. )

Fatal trap 18: integer divide fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 01000000
instruction pointer	= 0x8:0xf014e637
stack pointer		= 0x10:0xfa6399d8
frame pointer		= 0x10:0xfa639a00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 10 (fsck)
interrupt mask		= <- SMP: XXX
kernel: type 18 trap, code=0
Stopped at      dscheck+0xbb:   idivl   0x18(%ecx),%eax
db> tracede0: autosense failed: cable problem?
/u
dscheck(f67b3ae8,f182dd00) at dscheck+0xbb
dastrategy(f67b3ae8,0,fa639a01,f181fc00,f181fccc) at dastrategy+0x56
dsinit(f01f746b,20d0c,f01205d4,fa639c90,f181fccc) at dsinit+0x52
dsopen(f01f746b,20d0c,2000,0,f181fccc) at dsopen+0x8e
daopen(20d0c,1,2000,fa61b4c0,0) at daopen+0x2a1
spec_open(fa639e2c,fa639e00,f01ae21d,fa639e2c,fa639ea0) at spec_open+0x161
spec_vnoperate(fa639e2c,fa639ea0,f01712ca,fa639e2c,0) at spec_vnoperate+0x15
ufs_vnoperatespec(fa639e2c,0,fa639f94,fa61b4c0,f016879e) at ufs_vnoperatespec+0x15
vn_open(fa639f00,1,140,fa61b4c0,f021527c) at vn_open+0x3e2
open(fa61b4c0,fa639f94,8097140,1,804ac68) at open+0xad
syscall(27,27,804ac68,1,efbfdbb4) at syscall+0x187
Xint0x80_syscall() at Xint0x80_syscall+0x4c
db>

Now, based on some trace printf's I sprinkled in dscheck, it looks to me like
I get as far as

	if (bp->b_bcount % ssp->dss_secsize)
		goto bad_bcount;

around line #191 of kern/subr_diskslice.c. (you can see the "ckpt1c"
interspersed with other output if you look carefully). It does not hit the
printf() right after that, so I am guessing that ssp->dss_secsize is probably
zero.

Gah, I feel like I'm programming a Windows box. :-)

Okay, I've added a little extra code in there. Now we do

	} else {
		printf("ckpt1c\n");
		if (! ssp->dss_secsize) {
			printf("Whoa!\n");
			printf("ssp->dss_first_bsd_slice=%d\n", ssp->dss_first_bsd_slice);
			printf("ssp->dss_nslices=%d\n", ssp->dss_nslices);
			printf("ssp->dss_oflags=%d\n", ssp->dss_oflags);
			printf("ssp->dss_secmult=%d\n", ssp->dss_secmult);
			printf("ssp->dss_secshift=%d\n", ssp->dss_secshift);
			printf("ssp->dss_secsize=%d\n", ssp->dss_secsize);
			goto bad;
		}
		if (bp->b_bcount % ssp->dss_secsize)
			goto bad_bcount;

Ahh. It doesn't crash now.

changing root device tda0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST34371W 0484> Fixed Direct Access SCSI-2 device
da0: 40.0MB/s transfers (20.0MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 4148MB (8496884 512 byte sectors: 255H 63S/T 528C)
o da0s1a
da1 at ahc0 bus 0 target 1 lun 0
da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device
da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
da1: A
Enter full pathname of shell or RETURN for /bin/sh:
erase ^H, kill ^U, intr ^C
/sbin/camcontrol cmd -n da -u 1 -v -c 25 0 0 0 0 0 0 0 0 0 -i 8 i4 i4
camcontrol: error sending command
(pass1:ahc0:0:1:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(pass1:ahc0:0:1:0): NOT READY
end of camcontrol
/dev/rda0s1a: cFILESYSTEM CLEANk; SKIPPING CHECKpS 2% fragmentationlean, 127256 f1ree (296 frags, c15870 blocks, 0. )
Whoa!
ssp->dss_first_bsd_slice=0
ssp->dss_nslices=2
ssp->dss_oflags=0
ssp->dss_secmult=0
ssp->dss_secshift=-1
ssp->dss_secsize=0
da1: error reading primary partition table reading fsbn 0
Can't open /dev/rda1s1e: Input/output error
/dev/rda1s1e: CAN'T CHECK FILE SYSTEM.
/dev/rda1s1e: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
/dev/rda0s1h: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/rda0s1h: clean, 232316 free (132 frags, 29023 blocks, 0.1% fragmentation)
/dev/rda0s1e: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/rda0s1e: clean, 51021 free (8125 frags, 5362 blocks, 4.1% fragmentation)
/dev/rda0s1f: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/rda0s1f: clean, 112455 free (239 frags, 14027 blocks, 0.2% fragmentation)
/dev/rda0s1g: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/rda0s1g: clean, 1308689 free (321 frags, 163546 blocks, 0.0% fragmentation)
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
	/dev/rda1s1e (/news)
# de0: autosense failed: cable problem?

Now if I wait for just a bit,

# /sbin/camcontrol cmd -n da -u 1 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
284524543 512

Okay, well, I don't know what the hell the correct fix is, but this will
hopefully light a bulb in some SCSI guru's head. The panic, I would think,
has _got_ to be fixed. If anyone has a great suggestion on how I can make
this work properly, that's good too.

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator                             jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI                         414/342-4847

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
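
The two technical points in the message above can be illustrated with a small user-space C sketch. It is not the kernel code and not a proposed fix for kern/subr_diskslice.c: the struct, field, and function names below are hypothetical stand-ins. The first function shows the ordering the workaround relies on (test the sector size for zero before using it as a divisor). The second decodes the two 32-bit values that READ CAPACITY (10) returns, assuming the standard layout of last logical block address followed by block length in bytes.

```c
/* Hypothetical stand-in for the slice bookkeeping; only the field that
 * matters for the divide fault is modelled here. */
struct slice_info {
	int secsize;	/* bytes per sector; 0 if the device was not ready */
};

/*
 * Return 0 if a transfer of bcount bytes is acceptable, -1 otherwise.
 * The zero test comes first: bcount % 0 is undefined behaviour in C and
 * raises an integer divide fault on x86.
 */
static int
check_transfer(const struct slice_info *ssp, long bcount)
{
	if (ssp->secsize <= 0)
		return (-1);	/* sector size unknown: reject, never divide */
	if (bcount % ssp->secsize != 0)
		return (-1);	/* not a whole number of sectors */
	return (0);
}

/*
 * Convert READ CAPACITY (10) data to a size in megabytes (2^20 bytes).
 * last_lba is the address of the final block, so the block count is
 * last_lba + 1.
 */
static unsigned long long
capacity_mb(unsigned long long last_lba, unsigned long long block_len)
{
	return ((last_lba + 1ULL) * block_len / (1024ULL * 1024ULL));
}
```

With a sector size of zero, check_transfer() rejects the request instead of faulting. Feeding the values from the later camcontrol run (284524543 and 512) to capacity_mb() gives 138928, which matches the digits embedded in the inquiry string DAC960SX138928B5.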