FreeBSD Mail Archives

Date:      Tue, 6 Feb 2001 21:54:48 +1300
From:      "Mark Ibell" <marki@paradise.net.nz>
To:        "3Phase" <Phase3@worldnet.att.net>
Cc:        <freebsd-questions@FreeBSD.ORG>
Subject:   Re: SCSI parity error
Message-ID:  <003b01c0901a$82fcaa00$0101a8c0@evileye>
References:  <004301c08ff0$96e0c5d0$0101a8c0@evileye> <04d601c09006$05377d20$4fa0480c@sisyphus2>


----- Original Message -----
From: "3Phase" <Phase3@worldnet.att.net>
To: "Mark Ibell" <marki@paradise.net.nz>
Cc: <freebsd-questions@FreeBSD.ORG>
Sent: Tuesday, February 06, 2001 7:26 PM
Subject: Re: SCSI parity error


>
> ----- Original Message -----
> From: "Mark Ibell" <marki@paradise.net.nz>
> To: <freebsd-questions@freebsd.org>
> Sent: Monday, February 05, 2001 07:55 PM
> Subject: SCSI parity error
>
>
> > Hi,
> >
> > We've just experienced a nasty server crash on a system running 4.1-RELEASE.
> > The drive configuration is 2 x Quantum Atlas 10k2 drives running off an
> > Adaptec 2940U2W controller. The relevant log entries are listed below. Any
> > ideas what could have caused this? Both disks appear to check out OK
> > according to the SCSI BIOS 'Verify Media' option.
> >
> > Cheers,
> > Mark
> >
> >
> > (da1:ahc0:0:6:0): parity error detected in Data-in phase. SEQADDR(0x166) SCSIRATE(0x93)
> > ahc0:A:6: unknown scsi bus phase 0.  Attempting to continue
> > ahc0: WARNING no command for scb 0 (cmdcmplt)
> > QOUTPOS = 195
> > ahc0: WARNING no command for scb 96 (cmdcmplt)
> >  QOUTPOS = 196
> > ...
> > ahc0: WARNING no command for scb 6 (cmdcmplt)
> > QOUTPOS = 219
> > (da1:ahc0:0:6:0): SCB 0x13 - timed out while idle, SEQADDR == 0xb
> > (da1:ahc0:0:6:0): Queuing a BDR SCB
> > (da1:ahc0:0:6:0): Bus Device Reset Message Sent
> > (da1:ahc0:0:6:0): no longer in timeout, status = 34c
> > ahc0: Bus Device Reset on A:6. 1 SCBs aborted
> > (da0:ahc0:0:5:0): SCB 0x8c - timed out while idle, SEQADDR == 0xa
> > (da0:ahc0:0:5:0): Queuing a BDR SCB
> > (da0:ahc0:0:5:0): Bus Device Reset Message Sent
> > (da0:ahc0:0:5:0): no longer in timeout, status = 34b
> > ahc0: Bus Device Reset on A:5. 7 SCBs aborted
> > ...
>
> Parity usually means hardware. Are they 10k RPM drives?
> Are they separate or are you using them as a virtual volume?
> What was it doing when it crashed, loafing or heavy use?

Yeah, they are 10k RPM drives.
They are used as a vinum stripe with softupdates enabled.
Crashed during a full backup, just after the daily cron jobs (~2:10am).

>
> Cheap test:
> Get a radio, find a frequency and listen to the machine.
>
> Give the drives a repetitive task and you should be able to
> 'hear' each subsystem operate when it reads/writes data.
>
> Walk away with the radio.
>
> If you can hear it down the hall, it has RF problems.
> If it sounds 'different' sometimes you have a problem but
> error correction is masking it.
>
> Assuming it's been running okay for a while, check the usual
> suspects like loose connections, sockets, terminators, cables,
> heat, and good power.  No one tripped over the cord or used it
> as a shin-detector?

It's been running fine (without a single crash) for months, although we have
just added an IDE disk to boot off, whereas before we were booting off one of
the SCSI drives.

Could this possibly have caused the heat inside the case to rise just enough
to wreak havoc?

>
> -3P
>
>
>

Thanks,
Mark



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?003b01c0901a$82fcaa00$0101a8c0>
