Date: Tue, 6 Feb 2001 21:54:48 +1300
From: "Mark Ibell" <marki@paradise.net.nz>
To: "3Phase" <Phase3@worldnet.att.net>
Cc: <freebsd-questions@FreeBSD.ORG>
Subject: Re: SCSI parity error
Message-ID: <003b01c0901a$82fcaa00$0101a8c0@evileye>
References: <004301c08ff0$96e0c5d0$0101a8c0@evileye> <04d601c09006$05377d20$4fa0480c@sisyphus2>
----- Original Message -----
From: "3Phase" <Phase3@worldnet.att.net>
To: "Mark Ibell" <marki@paradise.net.nz>
Cc: <freebsd-questions@FreeBSD.ORG>
Sent: Tuesday, February 06, 2001 7:26 PM
Subject: Re: SCSI parity error

>
> ----- Original Message -----
> From: "Mark Ibell" <marki@paradise.net.nz>
> To: <freebsd-questions@freebsd.org>
> Sent: Monday, February 05, 2001 07:55 PM
> Subject: SCSI parity error
>
>
> > Hi,
> >
> > We've just experienced a nasty server crash on a system running
> > 4.1-RELEASE.
> > The drive configuration is 2 x Quantum Atlas 10k2 drives running off an
> > Adaptec 2940U2W controller. The relevant log entries are listed below. Any
> > ideas what could have caused this - both disks appear to check out ok
> > according to the SCSI BIOS 'Verify Media' option.
> >
> > Cheers,
> > Mark
> >
> >
> > (da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166)
> > SCSIRATE(0x93)
> > ahc0:A:6: unknown scsi bus phase 0. Attempting to continue
> > ahc0: WARNING no command for scb 0 (cmdcmplt)
> > QOUTPOS = 195
> > ahc0: WARNING no command for scb 96 (cmdcmplt)
> > QOUTPOS = 196
> > ...
> > ahc0: WARNING no command for scb 6 (cmdcmplt)
> > QOUTPOS = 219
> > (da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb
> > (da1:ahc0:0:6:0): Queuing a BDR SCB
> > (da1:ahc0:0:6:0): Bus Device Reset Message Sent
> > (da1:ahc0:0:6:0): no longer in timeout, status = 34c
> > ahc0: Bus Device Reset on A:6. 1 SCBs aborted
> > (da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa
> > (da0:ahc0:0:5:0): Queuing a BDR SCB
> > (da0:ahc0:0:5:0): Bus Device Reset Message Sent
> > (da0:ahc0:0:5:0): no longer in timeout, status = 34b
> > ahc0: Bus Device Reset on A:5. 7 SCBs aborted
> > ...
>
> Parity usually means hardware. Are they 10k RPM drives?
> Are they separate or are you using them as a virtual volume?
> What was it doing when it crashed, loafing or heavy use?

Yeah, they are 10k RPM drives.
They are used as a vinum stripe with softupdates enabled. It crashed during a full backup, just after the daily cron jobs (~2:10am).

>
> Cheap test:
> Get a radio, find a frequency and listen to the machine.
>
> Give the drives a repetitive task and you should be able to
> 'hear' each sub-system operate when it reads/writes data.
>
> Walk away with the radio.
>
> If you can hear it down the hall it has RF problems.
> If it sounds 'different' sometimes you have a problem but
> error correction is masking it.
>
> Assuming it's been running okay for a while, check the usual
> suspects like loose connections, sockets, terminators, cables,
> heat, and good power. No one tripped over the cord or used it
> as a shin-detector?

It's been running fine (without a single crash) for months, although we have just added an IDE disk to boot off, whereas before we were booting off one of the SCSI disks. Could this possibly have raised the temperature inside the case just enough to wreak havoc?

>
> -3P
>

Thanks,
Mark
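The quoted log interleaves events for both drives on ahc0. As a reading aid (not part of the original thread), the hypothetical Python helper `tally_cam_events` below groups lines by their CAM prefix, assuming the `(periph:sim:bus:target:lun)` layout that FreeBSD's CAM layer prints, e.g. `(da1:ahc0:0:6:0)` for da1 on ahc0, bus 0, target 6, LUN 0. The event categories are simple substring matches chosen for this excerpt, not an exhaustive classification of ahc(4) messages.

```python
import re
from collections import Counter, defaultdict

# Hypothetical parser: CAM prefixes such as "(da1:ahc0:0:6:0):" are assumed to
# encode periph:sim:bus:target:lun, followed by the message text.
CAM_PREFIX = re.compile(r"^\((\w+):(\w+):(\d+):(\d+):(\d+)\):\s*(.*)$")


def tally_cam_events(lines):
    """Group CAM-prefixed log lines by device and count rough event kinds.

    Lines without a CAM prefix (e.g. "ahc0: WARNING ...") are ignored.
    """
    events = defaultdict(Counter)
    for line in lines:
        m = CAM_PREFIX.match(line.strip())
        if not m:
            continue
        periph, sim, _bus, target, _lun, message = m.groups()
        key = f"{periph} ({sim} target {target})"
        # Crude classification by substrings that appear in the quoted log.
        if "parity error" in message:
            kind = "parity error"
        elif "timed out" in message:
            kind = "timeout"
        elif "Bus Device Reset" in message:
            kind = "bus device reset"
        else:
            kind = "other"
        events[key][kind] += 1
    return events


# A few lines copied from the log excerpt quoted in the thread.
log = """\
(da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166)
ahc0:A:6: unknown scsi bus phase 0. Attempting to continue
(da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb
(da1:ahc0:0:6:0): Queuing a BDR SCB
(da1:ahc0:0:6:0): Bus Device Reset Message Sent
(da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa
(da0:ahc0:0:5:0): Queuing a BDR SCB
(da0:ahc0:0:5:0): Bus Device Reset Message Sent
"""

for device, counts in sorted(tally_cam_events(log.splitlines()).items()):
    print(device, dict(counts))
```

Run against this excerpt, the tally makes it easy to see that only da1 (target 6) reported the parity error, while both da0 and da1 later timed out and were sent a Bus Device Reset.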