From owner-freebsd-stable Thu Jun 18 09:52:09 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id JAA05249 for freebsd-stable-outgoing; Thu, 18 Jun 1998 09:52:09 -0700 (PDT) (envelope-from owner-freebsd-stable@FreeBSD.ORG)
Received: from pau-amma.whistle.com (s205m64.whistle.com [207.76.205.64]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id JAA05121 for ; Thu, 18 Jun 1998 09:52:03 -0700 (PDT) (envelope-from dhw@whistle.com)
Received: (from dhw@localhost) by pau-amma.whistle.com (8.8.7/8.8.7) id JAA09995 for stable@freebsd.org; Thu, 18 Jun 1998 09:51:32 -0700 (PDT) (envelope-from dhw)
Date: Thu, 18 Jun 1998 09:51:32 -0700 (PDT)
From: David Wolfskill
Message-Id: <199806181651.JAA09995@pau-amma.whistle.com>
To: stable@FreeBSD.ORG
Subject: Re: whee, ahc0 kernel panic
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>Date: Wed, 17 Jun 1998 17:03:17 -0500
>From: "Matthew D. Fuller"

>On Wed, Jun 17, 1998 at 02:42:21PM -0700, Chris Timmons scribbled:
>> Well, I am immediately suspicious of the Micropolis firmware and or
>> hardware. You might also look back in the -stable archives for comments
>> about "Rev E" versions of Adaptec boards not working so well - is your
>> card brand new?
>>
> ahc0 rev 0 int a irq 11 on
>      ^^^^^
>No, it's one of the older ones. Could be the Microp, though; it's given
>some trouble in the past. I'm curious as to why the errors chose to pop
>up at me at a time when the drive wasn't doing anything, as opposed to
>the times it was involved in, say, a buildworld. Weird...

Not sure this will be immediately useful, but it may be (at least)
"interesting" as a data point.

I have a server running 2.2.6-RELEASE; it has a couple of SCSI
controllers, and we've been having some difficulties with them.
Here are relevant excerpts from /var/log/messages:

CPU: Pentium Pro (199.31-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x617 Stepping=7
  Features=0xf9ff
real memory = 33554432 (32768K bytes)
avail memory = 30621696 (29904K bytes)
Probing for devices on PCI bus 0:
chip0 rev 2 on pci0:0:0
chip1 rev 1 on pci0:1:0
chip2 rev 0 on pci0:1:1
pci0:1:2: Intel Corporation, device=0x7020, class=serial, subclass=0x03 int d irq 12 [no driver assigned]
ahc0 rev 0 int a irq 12 on pci0:9:0
ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
ahc0:A:0: refuses WIDE negotiation. Using 8bit transfers
(ahc0:0:0): "QUANTUM FIREBALL1080S 1Q09" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 1042MB (2134305 512 byte sectors)
(ahc0:1:0): "Quantum XP31070W L912" type 0 fixed SCSI 2
sd1(ahc0:1:0): Direct-Access 1075MB (2203480 512 byte sectors)
(ahc0:2:0): "MICROP 4691WS T171" type 0 fixed SCSI 2
sd2(ahc0:2:0): Direct-Access 8681MB (17780058 512 byte sectors)
(ahc0:6:0): "PLEXTOR CD-ROM PX-6XCS 2.06" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM can't get the size
...
ahc1 rev 3 int a irq 11 on pci0:11:0
ahc1: aic7870 Single Channel, SCSI Id=7, 16 SCBs
(ahc1:0:0): "HP C3725S 6039" type 0 fixed SCSI 2
sd3(ahc1:0:0): Direct-Access 2047MB (4194058 512 byte sectors)
(ahc1:1:0): "HP C3725S 6039" type 0 fixed SCSI 2
sd4(ahc1:1:0): Direct-Access 2047MB (4194058 512 byte sectors)
(ahc1:4:0): "HP C1533A 9406" type 1 removable SCSI 2
st0(ahc1:4:0): Sequential-Access density code 0x24, variable blocks, write-enabled
(ahc1:5:0): "HP C1533A 9503" type 1 removable SCSI 2
st1(ahc1:5:0): Sequential-Access density code 0x24, drive empty
...

Later, we got this (also from /var/log/messages):

sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0): Initiator detected error message received , retries:4
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0): Initiator detected error message received , retries:4
...
sd1(ahc0:1:0): ABORTED COMMAND info:0x1700c0 asc:47,0 SCSI parity error , retries:4
sd1(ahc0:1:0): parity error during Message-In phase.
Unexpected busfree. LASTPHASE == 0xa0 SEQADDR == 0xa0
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0): Initiator detected error message received , retries:4
sd1(ahc0:1:0): ABORTED COMMAND info:0x1a0049 asc:47,0 SCSI parity error , retries:4
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0): Initiator detected error message received , retries:4
spec_getpages: I/O read error
vm_fault: pager input (probably hardware) error, PID 320 failure
pid 320 (bash), uid 1052: exited on signal 11 (core dumped)
sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0): Initiator detected error message received , retries:4

Now, the above -- which did *not* give me a "warm and fuzzy" feeling --
occurred after I had made various attempts at re-cabling the devices in
question. Since things had (obviously) *not* gone at all well, and since
the Micropolis drive was the "new" hardware in the machine, I powered the
box off, disconnected the Micropolis drive (the Quantum XP31070W was
providing the termination for that leg), hacked /etc/fstab a little, and
brought the system back up.

Just this morning, I saw:

sd1(ahc0:1:0): parity error during Data-In phase.
sd1(ahc0:1:0): ABORTED COMMAND asc:48,0
sd1(ahc0:1:0): Initiator detected error message received , retries:4

And ahc0:1:0 is the Quantum XP31070W. The Micropolis drive is completely
disconnected from the machine. I would *hope* that this would minimize
its ability to affect the behavior of the machine.... :-(

If that's true, something else is going wrong....
:-(

I'm reasonably open to suggestions, though experimentation is tricky in
this case. I'm wondering whether ahc0 itself (as configured) might be the
problem...?

david
-- 
David Wolfskill	UNIX System Administrator	dhw@whistle.com
voice: (650) 577-7158	pager: (650) 371-4621

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message