FreeBSD Mail Archives

Date:      Tue, 18 Nov 2003 10:00:23 -0700
From:      "Justin T. Gibbs" <gibbs@scsiguy.com>
To:        Kristian Vlahovicek <kristian@icgeb.org>, aic7xxx@freebsd.org
Cc:        kajan@icgeb.org
Subject:   Re: aic7xxx problems
Message-ID:  <2211640000.1069174822@aslan.btc.adaptec.com>
In-Reply-To: <1348.140.105.9.78.1069171077.squirrel@www.icgeb.org>
References:  <1348.140.105.9.78.1069171077.squirrel@www.icgeb.org>

> the disks are all on the same channel, first we had them as 1 standalone
> (ID0) and 4 in RAID0 and one day the machine just froze, reporting
> errors in the aic7xxx driver (unfortunately no log of that remained).
> The disks were surface tested with the adaptec on-card SCSI utility and
> came out without errors.

The bad block test in the BIOS is, unfortunately, not always conclusive.

> We did the OS upgrade, and installed the 2.4.20-8smp kernel (RH9
> vanilla), the machine boots with aic7xxx dumping the whole lot of
> messages (see below).

Was this supposed to be in your attachment?  The attachment is empty.

> An attempt to upgrade the kernel to a newer version (still the RH9 updated
> 2.4.20-20.9smp) results in the boot process stopping just before the SCSI
> loading. We tried to update the aic7xxx driver to v6.3.0 from Justin's
> website and that one again resulted in a complete inability to boot (same
> stop point as above).

Did you do this via RPM or by building the driver from source?
6.3.3 is the latest RPM version on my website.

> Note that both kernels (2.4.20-20.9 and 2.4.20-8) DO boot in their
> single-processor versions, with both the original aic7xxx driver and
> v6.3.0.

I will have to see the messages, but you may be experiencing interrupt
routing problems with the newer kernels.  Playing with APIC and ACPI
settings may allow an SMP kernel to boot correctly.

> What we see currently are RAID messages about kicking disks from the array
> due to I/O errors (they look like hardware errors even though surface
> tests do not give anything!):

Unfortunately, the BIOS scan is rarely conclusive.  Media errors may
only present themselves when the disk is at a higher temperature.  The
single sector reads performed by the BIOS are not a sufficiently high
load to elevate drive temperature to something similar to that of an
active server.

> -----------
> Nov 18 11:50:33 hydra kernel: SCSI disk error : host 0 channel 0 id 1
> lun 0 return code = 8000002 Nov 18 11:50:33 hydra kernel: Info
> fld=0x4005, Current sd08:11: sense key Hardware Error Nov 18 11:50:34
> hydra kernel:  I/O error: dev 08:11, sector 26908288 Nov 18 11:50:34
> hydra kernel: raid5: Disk failure on sdb1, disabling device. Operation
> continuing on

Unfortunately, the ASC and ASCQ codes are not given to better qualify
the error, but this is a *hardware* error, not a medium error.  This
typically means that the device believes one of its components has failed.
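
The distinction between the two sense keys can be pulled out of the
logs mechanically.  What follows is only a rough sketch, assuming the
2.4-style "sense key <name>" wording seen in the excerpts in this
thread; the names classify() and HINTS are hypothetical and are not
part of the driver or the kernel.

```python
import re

# Hypothetical lookup table: a short reading of the two sense keys that
# appear in this thread. Other sense keys fall through to "unclassified".
HINTS = {
    "Hardware Error": "the device believes one of its components has failed",
    "Medium Error": "the device could not read or write the media",
}

# Assumes the "sense key <Name> Error" wording used by the 2.4 kernel logs
# quoted in this thread; \s+ tolerates the line wrapping added by mailers.
SENSE_RE = re.compile(r"sense\s+key\s+([A-Z][a-z]+ Error)")


def classify(log_text):
    """Return (sense_key, hint) for the first sense key found, else None."""
    match = SENSE_RE.search(log_text)
    if match is None:
        return None
    key = match.group(1)
    return key, HINTS.get(key, "unclassified")
```

Calling classify() on the "Info fld=0x4005" line above yields the
"Hardware Error" entry, which points at the drive itself rather than
at a bad spot on the platters.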

>  2 devices
> ----------
> Nov 18 15:38:45 hydra kernel: scsi0: ERROR on channel 0, id 8, lun 0, CDB: Read (10) 00 03 27 23 0f  00 00 f8 00
> Nov 18 15:38:45 hydra kernel: Info fld=0x327231e, Current sd08:41: sense key Medium Error
> Nov 18 15:38:45 hydra kernel:  I/O error: dev 08:41, sector 52896472
> Nov 18 15:38:45 hydra kernel: raid5: Disk failure on sde1, disabling device. Operation continuing on 3 devices

This is a typical medium error.
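
As a worked check on the log excerpt above, the CDB bytes and the
Info field can be compared directly.  This is a minimal sketch,
assuming the nine bytes printed after "Read (10)" are CDB bytes 1
through 9 (the opcode itself is shown by name) and that the Info
field holds the block address the drive associates with the error;
decode_read10() and failing_offset() are hypothetical helper names.

```python
def decode_read10(cdb_hex):
    """Decode the bytes printed after 'Read (10)' in the kernel log.

    Assumed layout (opcode byte omitted by the log): one flags byte,
    a 4-byte big-endian starting block, one group byte, a 2-byte
    big-endian block count, and one control byte.
    """
    raw = bytes.fromhex("".join(cdb_hex.split()))
    if len(raw) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    start_block = int.from_bytes(raw[1:5], "big")
    block_count = int.from_bytes(raw[6:8], "big")
    return start_block, block_count


def failing_offset(cdb_hex, info_field):
    """Return how far into the transfer the reported block lies,
    or None if it falls outside the requested range."""
    start_block, block_count = decode_read10(cdb_hex)
    offset = info_field - start_block
    if 0 <= offset < block_count:
        return offset
    return None


start, count = decode_read10("00 03 27 23 0f  00 00 f8 00")
offset = failing_offset("00 03 27 23 0f  00 00 f8 00", 0x327231E)
```

Under those assumptions the read starts at block 0x327230f and covers
248 blocks, and the reported block 0x327231e lies inside that range,
which is consistent with a defect at one specific spot on the media.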

--
Justin

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2211640000.1069174822>
