From owner-freebsd-questions Tue Feb 6 0:56:19 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from deborah.paradise.net.nz (deborah.paradise.net.nz [203.96.152.32])
    by hub.freebsd.org (Postfix) with ESMTP id 4ECFA37B503
    for ; Tue, 6 Feb 2001 00:55:59 -0800 (PST)
Received: from evileye (203-79-68-188.apx0.paradise.net.nz [203.79.68.188])
    by deborah.paradise.net.nz (8.10.1/8.10.1) with SMTP id f168tp517918;
    Tue, 6 Feb 2001 21:55:51 +1300 (NZDT)
Message-ID: <003b01c0901a$82fcaa00$0101a8c0@evileye>
From: "Mark Ibell"
To: "3Phase"
Cc:
References: <004301c08ff0$96e0c5d0$0101a8c0@evileye> <04d601c09006$05377d20$4fa0480c@sisyphus2>
Subject: Re: SCSI parity error
Date: Tue, 6 Feb 2001 21:54:48 +1300
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

----- Original Message -----
From: "3Phase"
To: "Mark Ibell"
Cc:
Sent: Tuesday, February 06, 2001 7:26 PM
Subject: Re: SCSI parity error

>
> ----- Original Message -----
> From: "Mark Ibell"
> To:
> Sent: Monday, February 05, 2001 07:55 PM
> Subject: SCSI parity error
>
>
> > Hi,
> >
> > We've just experienced a nasty server crash on a system running
> > 4.1-RELEASE. The drive configuration is 2 x Quantum Atlas 10k2 drives
> > running off an Adaptec 2940U2W controller. The relevant log entries are
> > listed below. Any ideas what could have caused this? Both disks appear
> > to check out OK according to the SCSI BIOS 'Verify Media' option.
> >
> > Cheers,
> > Mark
> >
> >
> > (da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166)
> > SCSIRATE(0x93)
> > ahc0:A:6: unknown scsi bus phase 0. Attempting to continue
> > ahc0: WARNING no command for scb 0 (cmdcmplt)
> > QOUTPOS = 195
> > ahc0: WARNING no command for scb 96 (cmdcmplt)
> > QOUTPOS = 196
> > ...
> > ahc0: WARNING no command for scb 6 (cmdcmplt)
> > QOUTPOS = 219
> > (da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb
> > (da1:ahc0:0:6:0): Queuing a BDR SCB
> > (da1:ahc0:0:6:0): Bus Device Reset Message Sent
> > (da1:ahc0:0:6:0): no longer in timeout, status = 34c
> > ahc0: Bus Device Reset on A:6. 1 SCBs aborted
> > (da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa
> > (da0:ahc0:0:5:0): Queuing a BDR SCB
> > (da0:ahc0:0:5:0): Bus Device Reset Message Sent
> > (da0:ahc0:0:5:0): no longer in timeout, status = 34b
> > ahc0: Bus Device Reset on A:5. 7 SCBs aborted
> > ...
>
> Parity usually means hardware. Are they 10k RPM drives?
> Are they separate or are you using them as a virtual volume?
> What was it doing when it crashed, loafing or heavy use?

Yeah, they are 10k RPM drives. They are used as a vinum stripe with
softupdates enabled. The crash happened during a full backup, just after
the daily cron jobs (~2:10am).

>
> Cheap test:
> Get a radio, find a frequency and listen to the machine.
>
> Give the drives a repetitive task and you should be able to
> 'hear' each sub-system operate when it reads/writes data.
>
> Walk away with the radio.
>
> If you can hear it down the hall it has RF problems.
> If it sounds 'different' sometimes you have a problem but
> error correction is masking it.
>
> Assuming it's been running okay for a while, check the usual
> suspects like loose connections, sockets, terminators, cables,
> heat, and good power. No one tripped over the cord or used it
> as a shin-detector?

It's been running fine (without a single crash) for months, although we
have just added an IDE disk to boot off, whereas before we were booting
off one of the SCSI drives. Could this possibly have caused the heat inside
the case to rise just enough to wreak havoc?
>
> -3P
>
>

Thanks,
Mark