From: "3Phase"
To: "Mark Ibell"
Sender: owner-freebsd-questions@FreeBSD.ORG
Subject: Re: SCSI parity error
Date: Mon, 5 Feb 2001 22:26:49 -0800

----- Original Message -----
From: "Mark Ibell"
Sent: Monday, February 05, 2001 07:55 PM
Subject: SCSI parity error

> Hi,
>
> We've just experienced a nasty server crash on a system running 4.1-RELEASE.
> The drive configuration is 2 x Quantum Atlas 10k2 drives running off an
> Adaptec 2940U2W controller. The relevant log entries are listed below. Any
> ideas what could have caused this - both disks appear to check out ok
> according to the SCSI BIOS 'Verify Media' option.
>
> Cheers,
> Mark
>
>
> (da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166)
> SCSIRATE(0x93)
> ahc0:A:6: unknown scsi bus phase 0. Attempting to continue
> ahc0: WARNING no command for scb 0 (cmdcmplt)
> QOUTPOS = 195
> ahc0: WARNING no command for scb 96 (cmdcmplt)
> QOUTPOS = 196
> ...
> ahc0: WARNING no command for scb 6 (cmdcmplt)
> QOUTPOS = 219
> (da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb
> (da1:ahc0:0:6:0): Queuing a BDR SCB
> (da1:ahc0:0:6:0): Bus Device Reset Message Sent
> (da1:ahc0:0:6:0): no longer in timeout, status = 34c
> ahc0: Bus Device Reset on A:6. 1 SCBs aborted
> (da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa
> (da0:ahc0:0:5:0): Queuing a BDR SCB
> (da0:ahc0:0:5:0): Bus Device Reset Message Sent
> (da0:ahc0:0:5:0): no longer in timeout, status = 34b
> ahc0: Bus Device Reset on A:5. 7 SCBs aborted
> ...

A parity error usually means a hardware problem. Are they 10k RPM drives?
Are they separate, or are you using them as a virtual volume? What was the
machine doing when it crashed: loafing or under heavy use?

Cheap test: Get a radio, find a frequency and listen to the machine. Give
the drives a repetitive task and you should be able to 'hear' each
sub-system operate when it reads/writes data. Walk away with the radio. If
you can hear it down the hall, it has RF problems. If it sounds 'different'
sometimes, you have a problem but error correction is masking it.

Assuming it's been running okay for a while, check the usual suspects:
loose connections, sockets, terminators, cables, heat, and the quality of
the power. No one tripped over the cord or used it as a shin-detector?

-3P
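
To go with the checks above, it can help to tally the kernel messages per device before pulling cables. The following is only a rough sketch, not a complete parser for ahc driver output: the function name tally_events and the pattern are assumptions built from nothing but the log lines quoted in this thread. As a loose rule of thumb, events piling up on one target may point at that drive or its connector, while events on both targets may point at something shared, such as the cable, terminator, or power.

```python
import re
from collections import Counter

# Matches the "(da1:ahc0:0:6:0): message" prefix seen in the quoted log.
# Assumption: only lines of this shape are of interest; others are skipped.
DEVICE_LINE = re.compile(r"\((da\d+):ahc\d+:\d+:\d+:\d+\):\s*(.*)")


def tally_events(lines):
    """Count parity errors, timeouts and bus device resets per device."""
    counts = Counter()
    for line in lines:
        match = DEVICE_LINE.search(line)
        if not match:
            continue
        device, message = match.groups()
        lowered = message.lower()
        if "parity error" in lowered:
            counts[(device, "parity")] += 1
        elif "timed out" in lowered:
            counts[(device, "timeout")] += 1
        elif "bus device reset message sent" in lowered:
            counts[(device, "reset")] += 1
    return counts


# Sample lines taken from the log quoted earlier in this message
# (the wrapped parity line is joined back into one line).
sample = [
    "(da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166) SCSIRATE(0x93)",
    "(da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb",
    "(da1:ahc0:0:6:0): Bus Device Reset Message Sent",
    "(da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa",
    "(da0:ahc0:0:5:0): Bus Device Reset Message Sent",
]

for (device, kind), n in sorted(tally_events(sample).items()):
    print(device, kind, n)
```

To run it against a real log, pass an open file instead of the sample list, for example tally_events(open("/var/log/messages")), assuming that is where the kernel messages end up on your system.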