From owner-freebsd-hackers Wed Feb 15 09:51:13 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id JAA28474 for hackers-outgoing; Wed, 15 Feb 1995 09:51:13 -0800
Received: from warlock.win.net (warlock.win.net [198.30.130.3]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id JAA28466 for ; Wed, 15 Feb 1995 09:51:10 -0800
Received: (from bugs@localhost) by warlock.win.net (8.6.9/8.6.9) id MAA10044 for freebsd-hackers@FreeBSD.ORG; Wed, 15 Feb 1995 12:51:48 -0500
From: Mark Hittinger
Message-Id: <199502151751.MAA10044@warlock.win.net>
Subject: long DAT tape rewind bit sprays disk
To: freebsd-hackers@FreeBSD.org
Date: Wed, 15 Feb 1995 12:51:46 -0500 (EST)
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 2669
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

Hi -

I have been having a problem with the close/rewind function of my SCSI DAT
tape drive. It appears that the rewind is not being given enough time to
complete before the driver decides an abort condition is in order. This is
one of those DDS-2 16 gig jobber tapes, so the rewinds will take a while.

This is happening with all SNAPs up through the latest 2/10 snap. I have the
BT946C controller. If I get two "abort timeouts" in a row while attempting to
rewind/unload the tape, my active disks get bit sprayed. :-)

While chasing this down I've noticed a couple of things in the bt742a.c
driver that I wanted to ask about.

In routine bt_poll we call bt_timeout and then call untimeout. I note that
inside bt_timeout we have already called untimeout. It looks suspicious to me
to have this dual call to untimeout.

---------------------------------------------------------------------
bt_poll
    ....
    if (count == 0) {
        /*
         * We timed out, so call the timeout handler manually,
         * accounting for the fact that the clock is not running yet
         * by taking out the clock queue entry it makes.
         */
        bt_timeout(ccb);
        /*
         * because we are polling, take out the timeout entry
         * bt_timeout made
         */
        untimeout(bt_timeout, (caddr_t)ccb);
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ actually call #2
----------------------------------------------------------------------
bt_timeout()
    ....
    /*
     * A timeout routine in kernel DONOT unlink
     * Entry chains when time outed....So infinity Loop..
     * 94/04/20 amurai@spec.co.jp
     */
    untimeout(bt_timeout, (caddr_t)ccb);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ actually call #1
---------------------------------------------------------------------

Finally, further down in bt_timeout this code looks interesting:

    ....
        /* abort the operation that has timed out */
        printf("bt%d: Try to abort\n", unit);
        bt_send_mbo(unit, ~SCSI_NOMASK, BT_MBO_ABORT, ccb);
        /* 2 secs for the abort */
        ccb->flags = CCB_ABORTED;
        timeout(bt_timeout, (caddr_t)ccb, 2 * hz);
                                          ^^^^^^ (200 not 2000?)
    }

What I am doing now is mass NFS mounting every disk in the place on my
FreeBSD box, then tar'ing everything to SCSI DDS-2 using device /dev/nrst0.
No rewind/unload attempt is made when things are complete. I can then shut
down into single user mode, sync, and halt. If I attempt to rewind or unload
the tape, about 50% of the time the system disk gets bit sprayed, so I no
longer try :-).

I am playing with longer timeouts etc., but it does appear that two abort
timeouts in a row do some corruption of the ccbs.

Having fun!

Mark Hittinger
bugs@win.net
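
On the "2 * hz" question above, a minimal sketch of the arithmetic, assuming
that timeout() counts in clock ticks and that hz is the number of ticks per
second. The value 100 is only an assumption for this sketch (check the kernel
config), and the helper names ms_to_ticks, ticks_to_ms and tick_selfcheck are
hypothetical, not taken from bt742a.c. Under those assumptions 2 * hz is 200
ticks, which is 2 seconds and so agrees with the "2 secs for the abort"
comment.

```c
#include <assert.h>

/* Assumption: hz = clock ticks per second; 100 is assumed for this sketch. */
enum { ASSUMED_HZ = 100 };

/* Hypothetical helper: convert milliseconds to clock ticks, rounding down. */
static long ms_to_ticks(long ms, long hz)
{
    return (ms * hz) / 1000;
}

/* Hypothetical helper: convert clock ticks back to milliseconds. */
static long ticks_to_ms(long ticks, long hz)
{
    return (ticks * 1000) / hz;
}

/* Checks the arithmetic behind the "2 * hz" argument to timeout(). */
static void tick_selfcheck(void)
{
    /* 2 * hz is a tick count, not a millisecond count. */
    assert(2 * ASSUMED_HZ == 200);
    /* 200 ticks at 100 ticks per second is 2000 ms, i.e. 2 seconds. */
    assert(ticks_to_ms(2 * ASSUMED_HZ, ASSUMED_HZ) == 2000);
    /* Going the other way, 2000 ms maps back to 2 * hz ticks. */
    assert(ms_to_ticks(2000, ASSUMED_HZ) == 2 * ASSUMED_HZ);
}
```

If that reading is right, giving the abort more time would mean passing a
larger multiple of hz rather than changing units.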