Date: Tue, 18 Nov 2003 10:00:23 -0700
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Kristian Vlahovicek <kristian@icgeb.org>, aic7xxx@freebsd.org
Cc: kajan@icgeb.org
Subject: Re: aic7xxx problems
Message-ID: <2211640000.1069174822@aslan.btc.adaptec.com>
In-Reply-To: <1348.140.105.9.78.1069171077.squirrel@www.icgeb.org>
References: <1348.140.105.9.78.1069171077.squirrel@www.icgeb.org>
> the disks are all on the same channel, first we had them as 1 standalone
> (ID0) and 4 in RAID0 and one day the machine just froze, reporting
> errors in the aic7xxx driver (unfortunately no log of that remained).
> The disks were surface tested with the adaptec on-card SCSI utility and
> came out without errors.

The bad block test in the BIOS is, unfortunately, not always conclusive.

> We did the OS upgrade, and installed the 2.4.20-8smp kernel (RH9
> vanilla), the machine boots with aic7xxx dumping the whole lot of
> messages (see below).

Was this supposed to be in your attachment?  The attachment is empty.

> An attempt to upgrade kernel to newer version (still the RH9 updated
> 2.4.20-20.9smp) results in boot process stopping just before the SCSI
> loading. We tried to update the aic7xxx driver to v6.3.0 from Justin's
> website and that one again resulted in complete inability to boot (same
> stop point as above).

Did you do this via RPM or by building the driver from source?
6.3.3 is the latest RPM version on my website.

> Note that both kernels 2.4.20-20.9 and the
> 2.4.20-8 boot in single processor version with both the original aic7xxx
> drivers and the v6.3.0 DO boot.

I will have to see the messages, but you may be experiencing interrupt
routing problems with the newer kernels.  Playing with APIC and ACPI
settings may allow an SMP kernel to boot correctly.

> What we see currently are RAID messages about kicking disks from array
> due to IO errors (they look like hardware errors even though surface
> tests do not give anything!):

Unfortunately, the BIOS scan is rarely conclusive.  Media errors may
only present themselves when the disk is at a higher temperature.  The
single sector reads performed by the BIOS are not a sufficiently high
load to elevate drive temperature to something similar to that of an
active server.
> -----------
> Nov 18 11:50:33 hydra kernel: SCSI disk error : host 0 channel 0 id 1
> lun 0 return code = 8000002 Nov 18 11:50:33 hydra kernel: Info
> fld=0x4005, Current sd08:11: sense key Hardware Error Nov 18 11:50:34
> hydra kernel: I/O error: dev 08:11, sector 26908288 Nov 18 11:50:34
> hydra kernel: raid5: Disk failure on sdb1, disabling device. Operation
> continuing on

Unfortunately, the ASC and ASCQ codes are not given to better qualify
the error, but this is a *hardware* not medium error.  This typically
means that the device believes one of its components has failed.

> 2 devices
> ----------
> Nov 18 15:38:45 hydra kernel: scsi0: ERROR on channel 0, id 8, lun 0,
> CDB: Read (10) 00 03 27 23 0f 00 00 f8 00
> Nov 18 15:38:45 hydra kernel: Info fld=0x327231e, Current sd08:41: sense
> key Medium Error Nov 18 15:38:45 hydra kernel: I/O error: dev 08:41,
> sector 52896472 Nov 18 15:38:45 hydra kernel: raid5: Disk failure on
> sde1, disabling device. Operation continuing on
> 3 devices

This is a typical medium error.

--
Justin
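For reference, the Read (10) CDB in the second log excerpt can be decoded by hand to see where the reported block falls within the request. The following is only a minimal sketch in Python. It assumes the nine bytes printed after "Read (10)" are the CDB bytes that follow the opcode (flags, 4-byte LBA, group, 2-byte transfer length, control), as laid out in the standard READ(10) format. The helper name decode_read10 is hypothetical and is not part of the driver or the kernel.

```python
def decode_read10(tail_hex):
    """Decode the nine bytes that follow the opcode of a READ(10) CDB.

    Assumed layout (standard READ(10)): flags, 4-byte big-endian LBA,
    group number, 2-byte big-endian transfer length, control.
    """
    tail = bytes.fromhex(tail_hex.replace(" ", ""))
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(tail[1:5], "big")    # starting logical block address
    count = int.from_bytes(tail[6:8], "big")  # number of blocks requested
    return lba, count


# Bytes copied from the log line "CDB: Read (10) 00 03 27 23 0f 00 00 f8 00".
lba, count = decode_read10("00 03 27 23 0f 00 00 f8 00")

# "Info fld" value from the same log entry.
info = 0x327231e

print("start LBA:", hex(lba))
print("blocks requested:", count)
print("offset of Info field into request:", info - lba)
```

Under these assumptions the Info field value lies inside the range of blocks covered by the read, which is what one would expect if it identifies the block the drive could not read.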