From owner-freebsd-questions  Mon Feb  5 22:28:20 2001
Message-ID: <04d601c09006$05377d20$4fa0480c@sisyphus2>
Reply-To: "3Phase" <Phase3@worldnet.att.net>
From: "3Phase" <Phase3@worldnet.att.net>
To: "Mark Ibell" <marki@paradise.net.nz>
Cc: <freebsd-questions@FreeBSD.ORG>
References: <004301c08ff0$96e0c5d0$0101a8c0@evileye>
Subject: Re: SCSI parity error
Date: Mon, 5 Feb 2001 22:26:49 -0800


----- Original Message -----
From: "Mark Ibell" <marki@paradise.net.nz>
To: <freebsd-questions@freebsd.org>
Sent: Monday, February 05, 2001 07:55 PM
Subject: SCSI parity error


> Hi,
>
> We've just experienced a nasty server crash on a system running
> 4.1-RELEASE.
> The drive configuration is 2 x Quantum Atlas 10k2 drives running off an
> Adaptec 2940U2W controller. The relevant log entries are listed below. Any
> ideas what could have caused this - both disks appear to check out ok
> according to the SCSI BIOS 'Verify Media' option.
>
> Cheers,
> Mark
>
>
> (da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166) SCSIRATE(0x93)
> ahc0:A:6: unknown scsi bus phase 0.  Attempting to continue
> ahc0: WARNING no command for scb 0 (cmdcmplt)
> QOUTPOS = 195
> ahc0: WARNING no command for scb 96 (cmdcmplt)
> QOUTPOS = 196
> ...
> ahc0: WARNING no command for scb 6 (cmdcmplt)
> QOUTPOS = 219
> (da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb
> (da1:ahc0:0:6:0): Queuing a BDR SCB
> (da1:ahc0:0:6:0): Bus Device Reset Message Sent
> (da1:ahc0:0:6:0): no longer in timeout, status = 34c
> ahc0: Bus Device Reset on A:6. 1 SCBs aborted
> (da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa
> (da0:ahc0:0:5:0): Queuing a BDR SCB
> (da0:ahc0:0:5:0): Bus Device Reset Message Sent
> (da0:ahc0:0:5:0): no longer in timeout, status = 34b
> ahc0: Bus Device Reset on A:5. 7 SCBs aborted
> ...

A parity error usually points to hardware. Are they 10k RPM drives?
Are they used separately, or combined into one virtual volume?
What was the machine doing when it crashed: loafing or under heavy use?

Cheap test:
Get a portable radio, tune around until you find a frequency where
the machine's noise comes through, and listen to the machine.

Give the drives a repetitive task and you should be able to
'hear' each sub-system operate when it reads/writes data.

Walk away with the radio.

If you can hear it down the hall, it has RF problems.
If it sometimes sounds 'different', you have a problem that
error correction is masking.
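
For the repetitive task, any steady read loop will do. Below is a
minimal sketch in Python, not a tested diagnostic: the function name
repeated_read, its parameters, and the pause between passes are all
made up for illustration. It reads one file start to finish several
times so each pass should come through the radio as a separate burst.

```python
import sys
import time


def repeated_read(path, passes=5, chunk_size=64 * 1024, pause=0.5):
    """Read the file at `path` from start to end `passes` times.

    Returns the total number of bytes read across all passes.
    All names and defaults here are illustrative assumptions.
    """
    total = 0
    for _ in range(passes):
        # buffering=0 turns off Python-level buffering only. The OS cache
        # may still serve repeat reads, so pick a file larger than RAM
        # if the goal is to keep the disk itself busy.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        # Pause so each pass is heard as a distinct burst of activity.
        time.sleep(pause)
    return total


if __name__ == "__main__":
    # Usage: python repeated_read.py /path/to/large/file [passes]
    target = sys.argv[1]
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    print(repeated_read(target, count), "bytes read")
```

Point it at a large file on the drive you want to listen to. A write
loop would work the same way, but reads are safer on a disk you
suspect.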

Assuming it's been running okay for a while, check the usual
suspects: loose connections, sockets, terminators, cables,
heat, and power quality.  No one tripped over the cord or used it
as a shin-detector?

-3P

