From owner-freebsd-hackers  Wed Feb 15 09:51:13 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id JAA28474 for hackers-outgoing; Wed, 15 Feb 1995 09:51:13 -0800
Received: from warlock.win.net (warlock.win.net [198.30.130.3]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id JAA28466 for <freebsd-hackers@FreeBSD.ORG>; Wed, 15 Feb 1995 09:51:10 -0800
Received: (from bugs@localhost) by warlock.win.net (8.6.9/8.6.9) id MAA10044 for freebsd-hackers@FreeBSD.ORG; Wed, 15 Feb 1995 12:51:48 -0500
From: Mark Hittinger <bugs@warlock.win.net>
Message-Id: <199502151751.MAA10044@warlock.win.net>
Subject: long DAT tape rewind bit sprays disk
To: freebsd-hackers@FreeBSD.org
Date: Wed, 15 Feb 1995 12:51:46 -0500 (EST)
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 2669      
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Hi -  

	I have been having a problem with the close/rewind function of
my SCSI DAT tape drive.  It appears that the rewind is not being given
enough time to complete before the driver decides an abort is in
order.  This is one of those DDS-2 16-gig jobber tapes, so rewinds
take a while.  This happens with all SNAPs up through the latest
2/10 SNAP.

	I have the BT946C controller.  If I get two "abort timeouts"
in a row while attempting to rewind/unload the tape, my active disks
get bit-sprayed (their data is corrupted).  :-)

	While chasing this down, I noticed a couple of things in
the bt742a.c driver that I wanted to ask about.

In bt_poll we call bt_timeout() and then call untimeout(), but
bt_timeout() itself already calls untimeout() for the same
(bt_timeout, ccb) pair.  This double call to untimeout() looks
suspicious to me.

---------------------------------------------------------------------
bt_poll
....
	if (count == 0) {
		/*
		 * We timed out, so call the timeout handler manually,
		 * accounting for the fact that the clock is not running yet
		 * by taking out the clock queue entry it makes.
		 */
		bt_timeout(ccb);

		/*
		 * because we are polling, take out the timeout entry
		 * bt_timeout made
		 */
		untimeout(bt_timeout, (caddr_t)ccb);
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this is actually the second call
----------------------------------------------------------------------
bt_timeout()
....
	/*
         * A timeout routine in kernel DONOT unlink
	 * Entry chains when time outed....So infinity Loop..
         *                              94/04/20 amurai@spec.co.jp
         */
	untimeout(bt_timeout, (caddr_t)ccb);
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  this is actually the first call
---------------------------------------------------------------------

Finally, further down in bt_timeout, this code looks interesting:
....
		/* abort the operation that has timed out */
		printf("bt%d: Try to abort\n", unit);
		bt_send_mbo(unit, ~SCSI_NOMASK,
		    BT_MBO_ABORT, ccb);
		/* 2 secs for the abort */
		ccb->flags = CCB_ABORTED;
		timeout(bt_timeout, (caddr_t)ccb, 2 * hz);
						  ^^^^^^ (2 * hz = 200 ticks, i.e. 2 secs; should it be 2000?)
	}


What I am doing now is NFS-mounting every disk in the place on my
FreeBSD box and tar'ing everything to the SCSI DDS-2 drive using the
no-rewind device /dev/nrst0, so no rewind/unload is attempted when the
job completes.  I can then shut down to single-user mode, sync, and halt.

If I attempt to rewind or unload the tape, the system disk gets
bit-sprayed about 50% of the time, so I no longer try :-).

I am playing with longer timeouts etc., but it does appear that two
abort timeouts in a row corrupt the CCBs.

Having fun!

Mark Hittinger
bugs@win.net