From owner-freebsd-hackers Sun May 21 21:12:16 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
        id VAA08684 for hackers-outgoing; Sun, 21 May 1995 21:12:16 -0700
Received: from specgw.spec.co.jp (specgw.spec.co.jp [202.32.13.1])
        by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id VAA08678 ;
        Sun, 21 May 1995 21:12:07 -0700
Received: from tama3 (tama3 [202.32.13.252])
        by specgw.spec.co.jp (8.6.5/3.3Wb-SPEC) with SMTP id NAA19293;
        Mon, 22 May 1995 13:06:45 +0900
Date: Mon, 22 May 1995 13:06:45 +0900
Message-Id: <199505220406.NAA19293@specgw.spec.co.jp>
To: dufault@hda.com
Cc: bugs@ns1.win.net, hackers@FreeBSD.org, julian@FreeBSD.org
Subject: Re: kern/430: bug in tape drivers
In-Reply-To: <199505211228.IAA17587@hda.com>
From: =?ISO-2022-JP?B?GyRCQjwwZhsoSg==?= =?ISO-2022-JP?B?GyRCPV8bKEo=?= Atsushi Murai
X-Mailer: AL-Mail for Windows(0.36B)
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Peter Dufault wrote:
>Mark Hittinger writes:
>> After a few "bt0a try to abort" I get a "bt0a abort timed out". It is
>> at this point that horrible things happen. The driver corrupts the ccb
>> chain and bit sprays your disks. If the rewind finishes before the
>> "bt0a abort timed out" then no badness happens to your disks.
>
>You get more than one "bt0: Try to abort" messages? That
>is probably the scsi system aborting the ongoing disk transfers that aren't
>completing due to the problem with the tape drive, since you will
>only get one "Try to abort" message per aborted transaction.

More than one "bt0: Try to abort" message means the driver is trying to
abort a previous abort command. So I guess the SCSI bus itself is
completely stuck. Check whether the LED on the board stays lit.

>I'm not sure what your work around does: you end up stretching out
>the "Try to abort" time until the drive finishes and "unlocks"
>the host adapter. So you've tried to abort a few transfers. Did they
>abort? I don't know. Do you wind up getting a disk retry per
>abort message after this?
>
>Anyway, if the "abort timed out" happens we toss that active CCB's back
>onto the freelist and the next SCSI transaction will get that same
>CCB. This is probably a mistake: we should instead let the CCBs leak
>off into the bit bucket, potentially hanging the system,
>but tossing them back so that they wind up being reused may be what
>is trashing the disk.

If I remember correctly, such a case (leaking the CCBs and so on) never
happens.

>Peter
>--
>Peter Dufault               Real Time Machine Control and Simulation
>HD Associates, Inc.         Voice: 508 433 6936
>dufault@hda.com             Fax:   508 433 5267

Atsushi.
--
Atsushi Murai                        E-Mail: amurai@spec.co.jp
SPEC                                 Voice : +81-3-3833-5341
System Planning and Engineering Corp.
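
For readers following the thread, here is a minimal sketch in C of the
two policies Peter contrasts in the quoted paragraph about "abort timed
out": putting the timed-out CCB back on the freelist versus letting it
leak. This is a toy model, not code from the bt driver. Every name in it
(struct ccb_pool, ccb_get, abort_timeout_reuse, abort_timeout_leak, the
owned_by_adapter flag) is hypothetical and chosen only for illustration.
The flag stands in for "the host adapter never acknowledged the abort and
may still write into this CCB".

```c
#include <assert.h>
#include <stddef.h>

#define NCCB 4  /* hypothetical pool size */

enum ccb_state { CCB_FREE, CCB_ACTIVE, CCB_LOST };

struct ccb {
    struct ccb *next;        /* freelist link */
    enum ccb_state state;
    int owned_by_adapter;    /* nonzero while the adapter may still touch it */
};

struct ccb_pool {
    struct ccb slots[NCCB];
    struct ccb *freelist;
};

/* Put every slot on the freelist; slot 0 ends up at the head. */
static void pool_init(struct ccb_pool *p)
{
    p->freelist = NULL;
    for (int i = NCCB - 1; i >= 0; i--) {
        p->slots[i].state = CCB_FREE;
        p->slots[i].owned_by_adapter = 0;
        p->slots[i].next = p->freelist;
        p->freelist = &p->slots[i];
    }
}

/* Take the CCB at the head of the freelist, or NULL if none is left. */
static struct ccb *ccb_get(struct ccb_pool *p)
{
    struct ccb *c = p->freelist;
    if (c == NULL)
        return NULL;
    p->freelist = c->next;
    c->next = NULL;
    c->state = CCB_ACTIVE;
    return c;
}

/* Hand an active CCB to the (simulated) adapter. */
static void ccb_start(struct ccb *c)
{
    assert(c->state == CCB_ACTIVE);
    c->owned_by_adapter = 1;
}

/*
 * Policy 1: the abort timed out, but the CCB goes back on the freelist.
 * The adapter never acknowledged the abort, so owned_by_adapter stays
 * set and the next ccb_get() returns a CCB the adapter may still write.
 */
static void abort_timeout_reuse(struct ccb_pool *p, struct ccb *c)
{
    c->state = CCB_FREE;
    c->next = p->freelist;
    p->freelist = c;
}

/*
 * Policy 2: the abort timed out, so the CCB is deliberately leaked.
 * It never returns to the freelist; the pool shrinks by one, which can
 * eventually starve the system, but no new transaction shares it.
 */
static void abort_timeout_leak(struct ccb *c)
{
    c->state = CCB_LOST;
}
```

Under the first policy the next transaction receives the very CCB the
adapter still holds, which is the aliasing Peter suspects of trashing
the disk. Under the second policy the pool loses one slot per timed-out
abort, so the cost is a possible hang once ccb_get() starts returning
NULL.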