From owner-freebsd-hackers  Mon May 22 05:01:25 1995
Return-Path: hackers-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id FAA22821
          for hackers-outgoing; Mon, 22 May 1995 05:01:25 -0700
Received: from hda.com (hda.com [199.232.40.182])
          by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id FAA22814
          for <hackers@FreeBSD.org>; Mon, 22 May 1995 05:01:22 -0700
Received: (dufault@localhost) by hda.com (8.6.9/8.3) id IAA20247; Mon, 22 May 1995 08:01:43 -0400
From: Peter Dufault <dufault@hda.com>
Message-Id: <199505221201.IAA20247@hda.com>
Subject: Re: kern/430: bug in tape drivers (fwd)
To: bugs@ns1.win.net (Mark Hittinger)
Date: Mon, 22 May 1995 08:01:43 -0400 (EDT)
Cc: hackers@FreeBSD.org
In-Reply-To: <199505220441.AAA03138@ns1.win.net> from "Mark Hittinger" at May 22, 95 00:41:17 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 2789      
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Mark Hittinger writes:
> 
> > >You get more than one "bt0: Try to abort" messages?   That
> 
> Yes, and they are repeated attempts to abort the rewind command I
> believe.

I don't think so; see the code fragment below.  As far as I can
see you should get only one "Try to abort" message per aborted
SCSI transaction.

> > 
> > More than one "bt0 Try to abort" means  try to abort a previous abort
> > command. So I guess SCSI bus itself stuck up entirely. Check keep light
> > on board LED or not.
> 
> All disk i/o waits until the rewind finishes.  Disk activity light is lit
> during entire rewind.  Once rewind finishes then everything continues
> with no trouble.

Check the code:

>    if (ccb->flags == CCB_ABORTED) {
>        /*
>         * abort timed out
>         */
>        printf("bt%d: Abort Operation has timed out\n", unit);
>        ccb->xfer->retries = 0;     /* I MEAN IT ! */
>        ccb->host_stat = BT_ABORTED;
>        bt_done(unit, ccb);
>    } else {
>        /* abort the operation that has timed out */
>        printf("bt%d: Try to abort\n", unit);
>        bt_send_mbo(unit, ~SCSI_NOMASK,
>            BT_MBO_ABORT, ccb);
>        /* 2 secs for the abort */
>        ccb->flags = CCB_ABORTED;
>        timeout(bt_timeout, (caddr_t)ccb, 2 * hz);
>    }   

It sets the CCB_ABORTED flag the first time through, so you should
get only one "Try to abort" message per aborted transaction.  More
than one message means more than one aborted transaction.  We ought
to print the ccb pointer with %p so each message can be matched to
its transaction.
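
To illustrate the point about %p, here is a rough user-space sketch
of the same two-stage logic with the ccb address added to both
messages.  It is not the driver code: struct sim_ccb, sim_bt_timeout,
CCB_ACTIVE, the flag values and the two counters are stand-ins made
up for the example, and the mailbox abort and the 2 second re-arm are
left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real bt driver definitions. */
enum { CCB_ACTIVE = 1, CCB_ABORTED = 2 };

struct sim_ccb {
    int flags;  /* CCB_ACTIVE until the first timeout fires */
    int done;   /* set once the transaction has been given up on */
};

/* Counters so a caller can check how many of each message were printed. */
static int try_abort_msgs;
static int abort_timed_out_msgs;

/*
 * Same shape as the quoted fragment: the first timeout on a ccb tries
 * to abort it and marks it CCB_ABORTED; a second timeout on the same
 * ccb gives up.  Printing the pointer shows which ccb each message
 * belongs to.
 */
static void sim_bt_timeout(int unit, struct sim_ccb *ccb)
{
    assert(ccb != NULL);

    if (ccb->flags == CCB_ABORTED) {
        /* the abort itself timed out */
        printf("bt%d: Abort Operation has timed out (ccb %p)\n",
            unit, (void *)ccb);
        abort_timed_out_msgs++;
        ccb->done = 1;
    } else {
        /* first timeout for this ccb: try to abort it */
        printf("bt%d: Try to abort (ccb %p)\n", unit, (void *)ccb);
        try_abort_msgs++;
        ccb->flags = CCB_ABORTED;
    }
}
```

Calling sim_bt_timeout twice on one ccb and once on another prints
two "Try to abort" lines with different addresses and one "timed out"
line carrying the first address, which is the distinction the current
messages can't show.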

> 
> > 
> > >I'm not sure what your work around does:  you end up stretching out
> > >the "Try to abort" time until the drive finishes and "unlocks"
> > >the host adapter.  So you've tried to abort a few transfers.  Did they
> > >abort?  I don't know.  Do you wind up getting a disk retry per
> > >abort message after this?
> 
> The workaround (a gross and sleazy kludge I admit :-) ) lets me do backups
> without bitspraying my disks!

Are you sure?  Which other transfers were aborted?  Were they
retried?  Are you sure you don't still have subtle corruption
problems, and have merely stopped getting the gross disk corruption
and system panics?

> When I have time I want to open the hood on the timeout period calculations,
> it seems like timeouts happen faster than they ought to given the comments
> in the source.  Maybe there are some pentium-90 incompatible assumptions.

I don't think it is the timeout calculations.

I think the tape drive isn't disconnecting and the SCSI bus is
locked up.  As long as you ARE getting "disk retry" messages you are
OK, though.  If you aren't, I think you may have problems.

Peter

-- 
Peter Dufault               Real Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax:   508 433 5267