From owner-freebsd-hackers Mon May 22 05:01:25 1995
From: Peter Dufault
Message-Id: <199505221201.IAA20247@hda.com>
Subject: Re: kern/430: bug in tape drivers (fwd)
To: bugs@ns1.win.net (Mark Hittinger)
Date: Mon, 22 May 1995 08:01:43 -0400 (EDT)
Cc: hackers@FreeBSD.org
In-Reply-To: <199505220441.AAA03138@ns1.win.net> from "Mark Hittinger" at May 22, 95 00:41:17 am

Mark Hittinger writes:
>
> > >You get more than one "bt0: Try to abort" messages?  That
>
> Yes, and they are repeated attempts to abort the rewind command I
> believe.

I don't think so; see the code fragment below.  As far as I can see
you should get only one "Try to abort" message per aborted SCSI
transaction.

> >
> > More than one "bt0 Try to abort" means try to abort a previous abort
> > command. So I guess SCSI bus itself stuck up entirely. Check keep light
> > on borad LED or not.
>
> All disk i/o waits until the rewind finishes.  Disk activity light is lit
> during entire rewind.  Once rewind finishes then everything continues
> with no trouble.

Check the code:

>         if (ccb->flags == CCB_ABORTED) {
>                 /*
>                  * abort timed out
>                  */
>                 printf("bt%d: Abort Operation has timed out\n", unit);
>                 ccb->xfer->retries = 0; /* I MEAN IT ! */
>                 ccb->host_stat = BT_ABORTED;
>                 bt_done(unit, ccb);
>         } else {
>                 /* abort the operation that has timed out */
>                 printf("bt%d: Try to abort\n", unit);
>                 bt_send_mbo(unit, ~SCSI_NOMASK,
>                     BT_MBO_ABORT, ccb);
>                 /* 2 secs for the abort */
>                 ccb->flags = CCB_ABORTED;
>                 timeout(bt_timeout, (caddr_t)ccb, 2 * hz);
>         }

It sets the CCB_ABORTED flag the first time, so you should only get
one "Try to abort" message per aborted transaction.  More than one
message means more than one aborted transaction.  We ought to print
out the ccb with %p.

>
> >
> > >I'm not sure what your work around does: you end up stretching out
> > >the "Try to abort" time until the drive finishes and "unlocks"
> > >the host adapter.  So you've tried to abort a few transfers.  Did they
> > >abort?  I don't know.  Do you wind up getting a disk retry per
> > >abort message after this?
>
> The workaround (a gross and sleazy kludge I admit :-) ) lets me do backups
> without bitspraying my disks!

Are you sure?  Which other transfers were aborted?  Were they retried?
Are you sure you don't have subtle corruption problems, and you just
aren't getting gross disk corruption and a system panic?

> When I have time I want to open the hood on the timeout period calculations,
> it seems like timeouts happen faster than they ought to given the comments
> in the source.  Maybe there are some pentium-90 incompatible assumptions.

I don't think it is timeout calculations.  I think the tape drive
isn't disconnecting and the SCSI bus is locked up.  As long as you ARE
getting "disk retry" messages you are OK, though.  If you aren't I
think you may have problems.

Peter

--
Peter Dufault               Real Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax:   508 433 5267
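
P.S. To make the "one message per aborted transaction" argument concrete, here is a rough user-space sketch of the flag logic in the quoted fragment, with the ccb address added to each message via %p. It is not the driver code: struct sim_ccb, sim_bt_timeout() and try_abort_count are hypothetical stand-ins invented for illustration, and the mailbox abort and the 2 second re-arm are only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's definitions (not the real bt.c ones). */
#define SIM_CCB_ACTIVE  1
#define SIM_CCB_ABORTED 2

struct sim_ccb {
    int flags;    /* SIM_CCB_ACTIVE or SIM_CCB_ABORTED */
    int retries;  /* retries left for the transfer */
    int done;     /* set once the transaction is completed as aborted */
};

/* Counts how many "Try to abort" messages have been printed. */
int try_abort_count = 0;

/*
 * Simulated timeout handler following the control flow of the quoted
 * fragment: the first timeout on a ccb prints "Try to abort" and marks
 * the ccb aborted; a second timeout on the same ccb takes the other
 * branch and completes the transaction.
 */
void sim_bt_timeout(int unit, struct sim_ccb *ccb)
{
    assert(ccb != NULL);

    if (ccb->flags == SIM_CCB_ABORTED) {
        /* the abort itself timed out */
        printf("bt%d: Abort Operation has timed out (ccb %p)\n",
            unit, (void *)ccb);
        ccb->retries = 0;
        ccb->done = 1;
    } else {
        /* abort the operation that has timed out */
        printf("bt%d: Try to abort (ccb %p)\n", unit, (void *)ccb);
        try_abort_count++;
        ccb->flags = SIM_CCB_ABORTED;
        /* the real driver sends the abort and re-arms a 2 second timeout here */
    }
}
```

Calling sim_bt_timeout() twice on the same ccb prints "Try to abort" once and then "Abort Operation has timed out". A second "Try to abort" only appears when a different ccb times out, and the %p value in the message shows which one it was.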