From owner-freebsd-stable  Thu Jun 18 09:52:09 1998
Date: Thu, 18 Jun 1998 09:51:32 -0700 (PDT)
From: David Wolfskill <dhw@whistle.com>
Message-Id: <199806181651.JAA09995@pau-amma.whistle.com>
To: stable@FreeBSD.ORG
Subject: Re: whee, ahc0 kernel panic

>Date: Wed, 17 Jun 1998 17:03:17 -0500
>From: "Matthew D. Fuller" <fullermd@futuresouth.com>

>On Wed, Jun 17, 1998 at 02:42:21PM -0700, Chris Timmons scribbled:

>> Well, I am immediately suspicious of the Micropolis firmware and/or
>> hardware.  You might also look back in the -stable archives for comments
>> about "Rev E" versions of Adaptec boards not working so well - is your
>> card brand new? 

>> > ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on
>                                                ^^^^^
>No, it's one of the older ones.  Could be the Microp, though; it's given
>some trouble in the past.  I'm curious as to why the errors chose to pop
>up at me at a time when the drive wasn't doing anything, as opposed to
>the times it was involved in, say, a buildworld.  Weird...

Not sure this will be immediately useful, but it may be (at least)
"interesting" as a data point.

I have a server running 2.2.6-R with a couple of SCSI controllers, and
we've been seeing SCSI parity errors on one of them (ahc0).  Here are
the relevant excerpts from /var/log/messages, starting with the
boot-time probe:

CPU: Pentium Pro (199.31-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x617  Stepping=7
  Features=0xf9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV>
real memory  = 33554432 (32768K bytes)
avail memory = 30621696 (29904K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82440FX (Natoma) PCI and memory controller> rev 2 on pci0:0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:1:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:1:1
pci0:1:2: Intel Corporation, device=0x7020, class=serial, subclass=0x03 int d irq 12 [no driver assigned]
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 12 on pci0:9:0
ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
ahc0:A:0: refuses WIDE negotiation.  Using 8bit transfers
(ahc0:0:0): "QUANTUM FIREBALL1080S 1Q09" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 1042MB (2134305 512 byte sectors)
(ahc0:1:0): "Quantum XP31070W L912" type 0 fixed SCSI 2
sd1(ahc0:1:0): Direct-Access 1075MB (2203480 512 byte sectors)
(ahc0:2:0): "MICROP 4691WS T171" type 0 fixed SCSI 2
sd2(ahc0:2:0): Direct-Access 8681MB (17780058 512 byte sectors)
(ahc0:6:0): "PLEXTOR CD-ROM PX-6XCS 2.06" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM can't get the size
...
ahc1 <Adaptec 2940 SCSI host adapter> rev 3 int a irq 11 on pci0:11:0
ahc1: aic7870 Single Channel, SCSI Id=7, 16 SCBs
(ahc1:0:0): "HP C3725S 6039" type 0 fixed SCSI 2
sd3(ahc1:0:0): Direct-Access 2047MB (4194058 512 byte sectors)
(ahc1:1:0): "HP C3725S 6039" type 0 fixed SCSI 2
sd4(ahc1:1:0): Direct-Access 2047MB (4194058 512 byte sectors)
(ahc1:4:0): "HP C1533A 9406" type 1 removable SCSI 2
st0(ahc1:4:0): Sequential-Access density code 0x24, variable blocks, write-enabled
(ahc1:5:0): "HP C1533A 9503" type 1 removable SCSI 2
st1(ahc1:5:0): Sequential-Access density code 0x24,  drive empty
...
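
For anyone who wants to match the bus:target:lun addresses in the error
messages against the drives probed above, here is a minimal Python
sketch.  It assumes the probe lines look exactly like the ones in this
excerpt; the name parse_probe_lines and the regular expression are my
own invention, not part of any FreeBSD tool.

```python
import re

# Matches probe lines such as:
#   (ahc0:1:0): "Quantum XP31070W L912" type 0 fixed SCSI 2
# search() is used so that a syslog prefix in front of the line is tolerated.
PROBE_RE = re.compile(r'\((ahc\d+:\d+:\d+)\): "([^"]+)"')


def parse_probe_lines(lines):
    """Return a dict mapping 'ahcN:target:lun' to the inquiry string."""
    devices = {}
    for line in lines:
        m = PROBE_RE.search(line)
        if m:
            address, ident = m.groups()
            devices[address] = ident
    return devices


# Sample input copied from the probe output above.
sample_probe = [
    '(ahc0:0:0): "QUANTUM FIREBALL1080S 1Q09" type 0 fixed SCSI 2',
    'sd0(ahc0:0:0): Direct-Access 1042MB (2134305 512 byte sectors)',
    '(ahc0:1:0): "Quantum XP31070W L912" type 0 fixed SCSI 2',
    'sd1(ahc0:1:0): Direct-Access 1075MB (2203480 512 byte sectors)',
    '(ahc0:2:0): "MICROP 4691WS T171" type 0 fixed SCSI 2',
    '(ahc1:0:0): "HP C3725S 6039" type 0 fixed SCSI 2',
]

devices = parse_probe_lines(sample_probe)
for address, ident in sorted(devices.items()):
    print(address, ident)
```

The "sdN(...)" lines are skipped on purpose: they carry no quoted
inquiry string, so only the lines that name the drive model are kept.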

Later, we got this (also from /var/log/messages):

sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0):  Initiator detected error message received
, retries:4
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0):  Initiator detected error message received
, retries:4
...
sd1(ahc0:1:0): ABORTED COMMAND info:0x1700c0 asc:47,0 SCSI parity error
, retries:4
sd1(ahc0:1:0): parity error during Message-In phase.
Unexpected busfree.  LASTPHASE == 0xa0
SEQADDR == 0xa0
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0):  Initiator detected error message received
, retries:4
sd1(ahc0:1:0): ABORTED COMMAND info:0x1a0049 asc:47,0 SCSI parity error
, retries:4
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0):  Initiator detected error message received
, retries:4
spec_getpages: I/O read error
vm_fault: pager input (probably hardware) error, PID 320 failure
pid 320 (bash), uid 1052: exited on signal 11 (core dumped)
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0):  Initiator detected error message received
, retries:4


Now, the above -- which did *not* give me a "Warm and fuzzy" feeling --
occurred after I had made various attempts at re-cabling the devices in
question.  Since things had (obviously) *not* gone at all well, and since
the Micropolis drive was the "new" hardware (to the machine), I powered
the box off, disconnected the Micropolis drive (the Quantum XP31070W was
providing the termination for that leg), hacked /etc/fstab a little, and
brought the system back up.

Just this morning, I saw:

sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0):  Initiator detected error message received
, retries:4

And ahc0:1:0 is the Quantum XP31070W.  The Micropolis drive is
completely disconnected from the machine.

I would *hope* that this would minimize its ability to affect the
behavior of the machine....  :-(

If that's true, something else is going wrong....  :-(

I'm reasonably open to suggestions, though experimentation is tricky for
this case.  I'm thinking that maybe the ahc0 is a problem (as
configured)...?
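
In case it helps anyone comparing notes, here is a rough Python sketch
for tallying the "parity error during ... phase" messages per device,
to see whether they all land on one drive or are spread across the bus.
It assumes the message format shown in the excerpts above; the function
name tally_parity_errors and the regular expression are hypothetical,
not anything shipped with FreeBSD.

```python
import re
from collections import Counter

# Matches lines such as:
#   sd1(ahc0:1:0): parity error during Data-In phase.
PARITY_RE = re.compile(
    r'(\w+)\((ahc\d+:\d+:\d+)\): parity error during (\S+) phase'
)


def tally_parity_errors(lines):
    """Count parity-error messages keyed by (device, address, phase)."""
    counts = Counter()
    for line in lines:
        m = PARITY_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts


# Sample input copied from the log excerpts above.
sample_log = [
    'sd1(ahc0:1:0): parity error during Data-In phase.',
    'sd1(ahc0:1:0): ABORTED COMMAND asc:48,0',
    'sd1(ahc0:1:0):  Initiator detected error message received',
    ', retries:4',
    'sd1(ahc0:1:0): ABORTED COMMAND info:0x1700c0 asc:47,0 SCSI parity error',
    ', retries:4',
    'sd1(ahc0:1:0): parity error during Message-In phase.',
    'sd1(ahc0:1:0): parity error during Data-In phase.',
]

counts = tally_parity_errors(sample_log)
for (device, address, phase), n in sorted(counts.items()):
    print(device, address, phase, n)
```

Only the "parity error during" lines are counted; the ABORTED COMMAND
sense lines that follow each one are ignored so a single event is not
counted twice.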

david
-- 
David Wolfskill		UNIX System Administrator
dhw@whistle.com		voice: (650) 577-7158	pager: (650) 371-4621
