From owner-freebsd-bugs Fri Mar 10 09:38:11 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id JAA19310 for bugs-outgoing; Fri, 10 Mar 1995 09:38:11 -0800
Received: from maroon.tc.umn.edu (root@maroon.tc.umn.edu [128.101.118.21]) by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id JAA19294 for ; Fri, 10 Mar 1995 09:37:59 -0800
From: pritc003@maroon.tc.umn.edu
Received: by maroon.tc.umn.edu; Fri, 10 Mar 95 11:35:58 -0500
Message-Id: <2f608dfe74be002@maroon.tc.umn.edu>
Subject: Parity error on SCSI tape causes panic w/Adaptec 2842 controller
To: bugs@FreeBSD.org
Date: Fri, 10 Mar 1995 11:35:56 -0600 (CST)
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3053
Sender: bugs-owner@FreeBSD.org
Precedence: bulk

To: FreeBSD-gnats-submit@freebsd.org
Subject:
From: pritc003@maroon.tc.umn.edu
Reply-To: pritc003@maroon.tc.umn.edu

>Submitter-Id: current-users
>Originator: Mike Pritchard
>Organization: None
>Confidential: no
>Synopsis: Parity error from SCSI tape causes panic w/Adaptec 2842
>Severity: serious
>Priority: medium
>Category: kern
>Release: FreeBSD 2.0-950210-SNAP i386
>Class: sw-bug
>Environment:
Adaptec 2842VL SCSI controller
Archive 2150S tape drive
>Description:
If a tape parity error is detected by the Adaptec 2842 driver
software (sys/i386/scsi/aic7xxx.c), the system will panic with
the following messages:

ahc1: parity error on channel A target 0, lun 0
ahc1: Unknown SCSIINT. Status = 0x17
panic: ahc1: brkaddrint, Illegal Host Access at seqaddr = 0x0

Examining the code shows that the parity error detection code
incorrectly falls into the "Unknown SCSIINT" code, which eventually
leads to the panic. A fix for this is attached, but that fix
uncovers another problem that causes repeated SCSI device timeouts
on sd0.
>How-To-Repeat:
Find a QIC-150 tape with a parity error, and try reading the tape.
The system will panic when the parity error is detected.
>Fix:
Here is a partial fix to help someone get started. It stops the
panic, but it exposes some other underlying problem: with the fix
installed, the parity error is detected and the machine does not
panic, but it then starts complaining about SCSI device timeouts on
sd0 and keeps doing so forever, so the machine hangs anyway.
I've seen the SCSI device timeout problem a few other times before,
so it probably does need to be addressed, although in this case it
may be happening only because the parity error code is broken
in some other fashion.

I'm also willing to help test any fixes.

*** old/aic7xxx.c	Fri Mar 10 10:53:44 1995
--- ./aic7xxx.c	Fri Mar 10 10:56:42 1995
***************
*** 1141,1146 ****
--- 1141,1155 ----
  		}
  		xs = scb->xs;
+ 		if ((status & (SELTO | SCSIPERR | BUSFREE)) == 0) {
+ 			printf("ahc%d: Unknown SCSIINT. Status = 0x%x\n",
+ 				unit, status);
+ 			outb(CLRSINT1 + iobase, status);
+ 			UNPAUSE_SEQUENCER(ahc);
+ 			outb(CLRINT + iobase, CLRINTSTAT);
+ 			scb = NULL;
+ 			goto cmdcomplete;
+ 		}
  		if (status & SELTO) {
  			u_char active;
  			u_char flags;
***************
*** 1196,1209 ****
  #endif
  		}
- 		else {
- 			printf("ahc%d: Unknown SCSIINT. Status = 0x%x\n",
- 				unit, status);
- 			outb(CLRSINT1 + iobase, status);
- 			UNPAUSE_SEQUENCER(ahc);
- 			outb(CLRINT + iobase, CLRINTSTAT);
- 			scb = NULL;
- 		}
  		if(scb != NULL) {
  			/* We want to process the command */
  			untimeout(ahc_timeout, (caddr_t)scb);
--- 1205,1210 ----
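
For anyone trying to follow the control-flow change without the driver
source at hand, here is a small standalone sketch of what the patch
appears to do. It is not driver code. The function names and the
HANDLED_* flags are made up for illustration, the bit values are
assumptions rather than values copied from the driver headers, and the
"old" layout (independent if statements with the else attached only to
the last one) is inferred from the panic messages, since the patch does
not show the code between the two hunks.

```c
/* Illustrative bit values only; assumed, not copied from the driver. */
#define SELTO    0x80
#define BUSFREE  0x08
#define SCSIPERR 0x04

/* Hypothetical flags recording which branches ran. */
enum {
	HANDLED_SELTO   = 1,
	HANDLED_PARITY  = 2,
	HANDLED_BUSFREE = 4,
	HANDLED_UNKNOWN = 8
};

/*
 * Assumed pre-patch shape: the trailing else pairs only with the last
 * if, so a status with the parity bit set but BUSFREE clear is handled
 * as a parity error and then also treated as unknown.
 */
int old_flow(unsigned status)
{
	int seen = 0;

	if (status & SELTO)
		seen |= HANDLED_SELTO;
	if (status & SCSIPERR)
		seen |= HANDLED_PARITY;
	if (status & BUSFREE)
		seen |= HANDLED_BUSFREE;
	else
		seen |= HANDLED_UNKNOWN;
	return seen;
}

/*
 * Post-patch shape: test for "none of the known bits" first and leave
 * early, then run the individual handlers with no trailing else.
 */
int new_flow(unsigned status)
{
	int seen = 0;

	if ((status & (SELTO | SCSIPERR | BUSFREE)) == 0)
		return HANDLED_UNKNOWN;
	if (status & SELTO)
		seen |= HANDLED_SELTO;
	if (status & SCSIPERR)
		seen |= HANDLED_PARITY;
	if (status & BUSFREE)
		seen |= HANDLED_BUSFREE;
	return seen;
}
```

Under these assumed bit values, the reported status 0x17 takes both the
parity and the unknown branch in old_flow(), and only the parity branch
in new_flow(). This says nothing about the sd0 timeouts that show up
after the patch; those still need a separate look.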