From owner-freebsd-hackers Sun May 21 05:28:30 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id FAA19242 for hackers-outgoing; Sun, 21 May 1995 05:28:30 -0700
Received: from hda.com (hda.com [199.232.40.182]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id FAA19236 ; Sun, 21 May 1995 05:28:25 -0700
Received: (dufault@localhost) by hda.com (8.6.9/8.3) id IAA17587; Sun, 21 May 1995 08:28:50 -0400
From: Peter Dufault
Message-Id: <199505211228.IAA17587@hda.com>
Subject: Re: kern/430: bug in tape drivers
To: bugs@ns1.win.net (Mark Hittinger)
Date: Sun, 21 May 1995 08:28:49 -0400 (EDT)
Cc: hackers@FreeBSD.org, julian@FreeBSD.org
In-Reply-To: <199505200134.VAA07349@ns1.win.net> from "Mark Hittinger" at May 19, 95 09:34:07 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 2488
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Mark Hittinger writes:
>
> >Number: 430
> >Category: kern
> >Synopsis: SCSI Tape dont work
> >Originator: Charles Henrich (MSU)
> >Release: FreeBSD 2.1.0-Development i386
> >
> > ALR Dual Pentium, BT747 SCSI-2, Connor DDS-2 Dat, 3 Seagate Hawk 2gig
>                      ^^^^^
> > drives.
> >
> >Description:
> >
> > 90% of the time you access the dat drive via dump, FreeBSD goes off
> > and scrambles the other disks in the system. This sucks, and has
> > happened to me several times.
>

I think that the tape drive is tying up the SCSI bus (and maybe
therefore the host adapter?) for some reason.

> I have seen the same problem since 2.0R. I have a WangDAT3400DX. When a
> process closes the tape drive I get "bt0a: try to abort". I believe this
> is due to the lengthy rewind, although recently I noted that there was a
> problem with scsi commands that contained no data. In any event I
> still see the problem in -current. I will try a 2940 controller this
> weekend and see if the problem exists there.

As I mentioned, zero length commands aren't an issue.

> After a few "bt0a try to abort" I get a "bt0a abort timed out". It is
> at this point that horrible things happen. The driver corrupts the ccb
> chain and bit sprays your disks. If the rewind finishes before the
> "bt0a abort timed out" then no badness happens to your disks.

You get more than one "bt0: Try to abort" message? That is probably
the scsi system aborting the ongoing disk transfers that aren't
completing due to the problem with the tape drive, since you will
only get one "Try to abort" message per aborted transaction.

I'm not sure what your workaround does: you end up stretching out
the "Try to abort" time until the drive finishes and "unlocks" the
host adapter. So you've tried to abort a few transfers. Did they
abort? I don't know. Do you wind up getting a disk retry per abort
message after this?

Anyway, if the "abort timed out" happens we toss that active CCB
back onto the freelist, and the next SCSI transaction will get that
same CCB. This is probably a mistake: we should instead let the
CCBs leak off into the bit bucket, potentially hanging the system,
but tossing them back so that they wind up being reused may be what
is trashing the disk.

Peter

--
Peter Dufault               Real Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax:   508 433 5267
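
The CCB reuse hazard described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the bt driver's
code: the structures, the pool, and every function name here (ccb_get,
ccb_put, ccb_abort_timed_out, the two policies) are hypothetical. It
contrasts putting a timed-out CCB back on the freelist, where the next
transaction picks up a CCB the adapter may still own, with leaking it so
that it is never handed out again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified CCB; not the real bt driver structure. */
struct ccb {
    struct ccb *next;      /* freelist link */
    int         in_flight; /* driver's belief that the adapter owns it */
};

struct ccb_pool {
    struct ccb *free_head; /* singly linked freelist */
    int         leaked;    /* CCBs withheld after a failed abort */
};

/* Build the freelist so that ccbs[0] is handed out first. */
void ccb_pool_init(struct ccb_pool *p, struct ccb *ccbs, int n)
{
    p->free_head = NULL;
    p->leaked = 0;
    for (int i = n - 1; i >= 0; i--) {
        ccbs[i].in_flight = 0;
        ccbs[i].next = p->free_head;
        p->free_head = &ccbs[i];
    }
}

/* Take a CCB for a new transaction; NULL if none are left. */
struct ccb *ccb_get(struct ccb_pool *p)
{
    struct ccb *c = p->free_head;
    if (c != NULL) {
        p->free_head = c->next;
        c->next = NULL;
        c->in_flight = 1;
    }
    return c;
}

/* Return a CCB whose transaction really has completed. */
void ccb_put(struct ccb_pool *p, struct ccb *c)
{
    c->in_flight = 0;
    c->next = p->free_head;
    p->free_head = c;
}

enum abort_policy { REUSE_ON_TIMEOUT, LEAK_ON_TIMEOUT };

/* Called when the abort itself timed out: the adapter never confirmed
 * that it has let go of this CCB. */
void ccb_abort_timed_out(struct ccb_pool *p, struct ccb *c,
                         enum abort_policy policy)
{
    assert(c != NULL && c->in_flight);
    if (policy == REUSE_ON_TIMEOUT) {
        /* Hazard: the driver now treats the CCB as free while the
         * adapter may still be working from it. */
        ccb_put(p, c);
    } else {
        /* Leak: leave it marked in flight and never reissue it. The
         * pool shrinks and may run dry, but nothing is reused. */
        p->leaked++;
    }
}
```

With REUSE_ON_TIMEOUT, the very next ccb_get() returns the CCB that just
failed to abort. With LEAK_ON_TIMEOUT, that CCB is never seen again, at
the cost of eventually running out of CCBs.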