From owner-freebsd-hackers  Sun May 21 21:12:16 1995
Return-Path: hackers-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id VAA08684
          for hackers-outgoing; Sun, 21 May 1995 21:12:16 -0700
Received: from specgw.spec.co.jp (specgw.spec.co.jp [202.32.13.1])
          by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id VAA08678
          ; Sun, 21 May 1995 21:12:07 -0700
Received: from tama3 (tama3 [202.32.13.252]) by specgw.spec.co.jp (8.6.5/3.3Wb-SPEC) with SMTP id NAA19293; Mon, 22 May 1995 13:06:45 +0900
Date: Mon, 22 May 1995 13:06:45 +0900
Message-Id: <199505220406.NAA19293@specgw.spec.co.jp>
To: dufault@hda.com
Cc: bugs@ns1.win.net, hackers@FreeBSD.org, julian@FreeBSD.org
Subject: Re: kern/430: bug in tape drivers
In-Reply-To: <199505211228.IAA17587@hda.com>
From: =?ISO-2022-JP?B?GyRCQjwwZhsoSg==?= 
	=?ISO-2022-JP?B?GyRCPV8bKEo=?= Atsushi Murai  <amurai@spec.co.jp>
X-Mailer: AL-Mail for Windows(0.36B)
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Peter Dufault <dufault@hda.com> wrote:
>Mark Hittinger writes:
>> After a few "bt0a try to abort" I get a "bt0a abort timed out".  It is
>> at this point that horrible things happen.  The driver corrupts the ccb
>> chain and bit sprays your disks.  If the rewind finishes before the
>> "bt0a abort timed out" then no badness happens to your disks.
>
>You get more than one "bt0: Try to abort" messages?   That
>is probably the scsi system aborting the ongoing disk transfers that aren't
>completing due to the problem with the tape drive, since you will
>only get one "Try to abort" message per aborted transaction.

More than one "bt0 Try to abort" message means the driver is trying to
abort a previous abort command. So I guess the SCSI bus itself is
completely stuck. Check whether the LED on the board stays lit.

>I'm not sure what your work around does:  you end up stretching out
>the "Try to abort" time until the drive finishes and "unlocks"
>the host adapter.  So you've tried to abort a few transfers.  Did they
>abort?  I don't know.  Do you wind up getting a disk retry per
>abort message after this?
>
>Anyway, if the "abort timed out" happens we toss the active CCBs back
>onto the freelist and the next SCSI transaction will get that same
>CCB.  This is probably a mistake: we should instead let the CCBs leak
>off into the bit bucket, potentially hanging the system,
>but tossing them back so that they wind up being reused may be what
>is trashing the disk.

If I remember correctly, such a case (leaking the CCBs and so on) never
happens.
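
For reference, the reuse hazard Peter describes can be sketched in C
roughly as below. This is only an illustration under assumptions: the
struct layout, the hw_owned flag, the quarantine list and all function
names are hypothetical and are not taken from the bt driver source.
It compares putting a timed-out CCB back on the freelist with keeping
it off the freelist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CCB; not the real bt driver structure. */
struct ccb {
    struct ccb *next;      /* list link (freelist or quarantine) */
    bool        hw_owned;  /* adapter may still be using this CCB */
    int         target;    /* SCSI target the command addresses */
};

struct ccb_pool {
    struct ccb *free_head;   /* CCBs available for new commands */
    struct ccb *quarantine;  /* CCBs whose abort timed out */
};

/* Put all n CCBs on the freelist and clear their state. */
static void ccb_pool_init(struct ccb_pool *p, struct ccb *ccbs, size_t n)
{
    assert(ccbs != NULL || n == 0);
    p->free_head = NULL;
    p->quarantine = NULL;
    for (size_t i = 0; i < n; i++) {
        ccbs[i].hw_owned = false;
        ccbs[i].target = -1;
        ccbs[i].next = p->free_head;
        p->free_head = &ccbs[i];
    }
}

/* Take the first CCB off the freelist, or NULL if it is empty. */
static struct ccb *ccb_alloc(struct ccb_pool *p)
{
    struct ccb *c = p->free_head;
    if (c != NULL) {
        p->free_head = c->next;
        c->next = NULL;
    }
    return c;
}

/* Allocate a CCB and hand it to the (imaginary) adapter. */
static struct ccb *ccb_start(struct ccb_pool *p, int target)
{
    struct ccb *c = ccb_alloc(p);
    if (c != NULL) {
        c->target = target;
        c->hw_owned = true;
    }
    return c;
}

/* Abort timed out, CCB goes back on the freelist although the
 * adapter never released it. The next ccb_alloc() returns it. */
static void ccb_abort_timeout_reuse(struct ccb_pool *p, struct ccb *c)
{
    c->next = p->free_head;
    p->free_head = c;
}

/* Abort timed out, CCB is kept off the freelist ("leaked") so it
 * cannot be reused while the adapter may still touch it. */
static void ccb_abort_timeout_leak(struct ccb_pool *p, struct ccb *c)
{
    c->next = p->quarantine;
    p->quarantine = c;
}
```

With the reuse variant, the next command gets a CCB that the adapter
may still own. With the leak variant the pool can run dry, which is the
"potentially hanging the system" case in Peter's mail.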

>Peter
>-- 
>Peter Dufault               Real Time Machine Control and Simulation
>HD Associates, Inc.         Voice: 508 433 6936
>dufault@hda.com             Fax:   508 433 5267

Atsushi.

-- 
Atsushi Murai                                         E-Mail: amurai@spec.co.jp
SPEC                                                  Voice : +81-3-3833-5341
System Planning and Engineering Corp.