Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 22 Sep 1996 00:34:07 -0400 (EDT)
From:      Gene Stark <gene@starkhome.cs.sunysb.edu>
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/1661: ft driver hangs uninterruptably at "bavail"
Message-ID:  <199609220434.AAA00362@starkhome.cs.sunysb.edu>
Resent-Message-ID: <199609220440.VAA25842@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         1661
>Category:       kern
>Synopsis:       ft driver hangs uninterruptably at "bavail"
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 21 21:40:01 PDT 1996
>Last-Modified:
>Originator:     Gene Stark
>Organization:
>Release:        FreeBSD 2.1-STABLE i386
>Environment:

	FreeBSD 2.1.5-STABLE.  Pentium 133MHz, ASUS M/B,
	Colorado 250 floppy tape drive running off controller
	on M/B.

	Problem was also experienced on a noname 486DX/33MHz
	system.

>Description:

	The command:

		dump 0uf - /A | gzip -c | ft

	causes the tape to spin for a few seconds, and then the
	driver hangs in uninterruptable sleep with ps showing "bavail".

	Without the gzip in the pipeline, the hang does not occur.

>How-To-Repeat:

		dump 0uf - /A | gzip -c | ft

>Fix:
	
	I am not real familiar with the ft driver, but here is my analysis
	after turning on debugging printout and thinking about it for
	awhile.  The messages from the test are at the end of this report.

	When the driver is opened, it does an initial read, followed
	by readahead that continues until the completion queue is
	filled up and the free queue is empty.  Then the tape is stopped.
	Shortly after this, the first blocks of data are ready to be written,
	but there are no buffers available as they have all been used
	up by readahead.  The driver then hangs in the sleep at line
	2279:

		/* Sleep until a buffer becomes available. */
		while (sp == NULL) {
			ftsleep(wc_buff_avail, 0);
			sp = segio_alloc(ft);
		}

	In my test, there is an "unexpected interrupt" message
	that occurs just before the hang.  It appears to be a completion
	interrupt from the seek operation that was started because
	the current position was unknown.  As near as I can tell,
	this is a separate problem that does not contribute to the hanging
	problem, as even if the interrupt were to be processed, I couldn't
	find any way that it would help to free the buffers in the completion
	queue to unwedge the write.

	I believe the hanging does not occur when the gzip is omitted,
	because the data arrives faster, and there is not time for the
	readahead to use up all the buffers before the write is initiated,
	canceling the readahead and freeing the buffers.

	When I modified the driver by adding code to free any buffers
	in the completion queue before a write was started, the hang
	did not occur.  Somebody that understands the driver better than
	me should check this out and see if my analysis and fix is
	reasonable.

	Here is a context diff:

*** ft.c.orig	Tue May 30 04:01:41 1995
--- ft.c	Sun Sep 22 00:04:43 1996
***************
*** 2260,2265 ****
--- 2260,2269 ----
    } else {
  	if (ft->segh != NULL && ft->segh->reqtype != FTIO_WRITING)
  		tape_inactive(ftu);
+ 
+ 	/* Clear readahead blocks from completion queue */
+ 	while (ft->doneh != NULL)
+ 	    segio_free(ft, ft->doneh);
  
  	/* Allocate a buffer and start tape if we're running low. */
  	sp = segio_alloc(ft);


(test messages follow)
-----------------
Sep 21 20:45:22 starkhome /kernel.test: tape_start start
Sep 21 20:45:22 starkhome /kernel.test: tape_recal start
Sep 21 20:45:23 starkhome /kernel.test: tape_recal end
Sep 21 20:45:23 starkhome /kernel.test: tape_start end
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 46
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:24 starkhome last message repeated 8 times
Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:24 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:24 starkhome last message repeated 8 times
Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:24 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 27
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 4
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:24 starkhome last message repeated 8 times
Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 30
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:25 starkhome last message repeated 8 times
Sep 21 20:45:25 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:25 starkhome last message repeated 8 times
Sep 21 20:45:25 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 8
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:26 starkhome last message repeated 8 times
Sep 21 20:45:26 starkhome /kernel.test: qic_status returned $d0
Sep 21 20:45:26 starkhome /kernel.test: ftgetgeom report config got $00d0
Sep 21 20:45:26 starkhome /kernel.test: Tape format is QIC-80, length is 307.5/550
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 18
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:26 starkhome last message repeated 8 times
Sep 21 20:45:26 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:26 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 11
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:27 starkhome last message repeated 8 times
Sep 21 20:45:27 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:27 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 13
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:27 starkhome last message repeated 8 times
Sep 21 20:45:27 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:28 starkhome last message repeated 8 times
Sep 21 20:45:28 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:28 starkhome last message repeated 8 times
Sep 21 20:45:28 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:29 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:29 starkhome last message repeated 8 times
Sep 21 20:45:29 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:29 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:29 starkhome /kernel.test: segio_alloc: nfree=7 ndone=0 nreq=0
Sep 21 20:45:29 starkhome /kernel.test: segio_queue: nfree=7 ndone=0 nreq=1
Sep 21 20:45:29 starkhome /kernel.test: Starting read I/O chain
Sep 21 20:45:29 starkhome /kernel.test: async_read ******STARTING TAPE
Sep 21 20:45:29 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:29 starkhome /kernel.test: async status got $0065 ($02cb)
Sep 21 20:45:29 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:29 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1
Sep 21 20:45:30 starkhome /kernel.test: Processing readahead reqblk = 32
Sep 21 20:45:30 starkhome /kernel.test: segio_free: nfree=7 ndone=0 nreq=1
Sep 21 20:45:30 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:30 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1
Sep 21 20:45:30 starkhome /kernel.test: Processing readahead reqblk = 64
Sep 21 20:45:30 starkhome /kernel.test: segio_free: nfree=7 ndone=0 nreq=1
Sep 21 20:45:31 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:0 h:0 s:77 pos:76 want:76
Sep 21 20:45:31 starkhome /kernel.test: ft0: CRC error on block 76
Sep 21 20:45:31 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:31 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1
Sep 21 20:45:31 starkhome /kernel.test: Processing readahead reqblk = 96
Sep 21 20:45:31 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:31 starkhome /kernel.test: segio_done: (r) nfree=6 ndone=2 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_alloc: nfree=5 ndone=2 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_queue: nfree=5 ndone=2 nreq=1
Sep 21 20:45:32 starkhome /kernel.test: Processing readahead reqblk = 128
Sep 21 20:45:32 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:1 h:0 s:26 pos:153 want:153
Sep 21 20:45:32 starkhome /kernel.test: ft0: CRC error on block 153
Sep 21 20:45:32 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:32 starkhome /kernel.test: segio_done: (r) nfree=5 ndone=3 nreq=0
Sep 21 20:45:32 starkhome /kernel.test: segio_alloc: nfree=4 ndone=3 nreq=0
Sep 21 20:45:32 starkhome /kernel.test: segio_queue: nfree=4 ndone=3 nreq=1
Sep 21 20:45:32 starkhome /kernel.test: Processing readahead reqblk = 160
Sep 21 20:45:32 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:1 h:0 s:43 pos:170 want:170
Sep 21 20:45:32 starkhome /kernel.test: ft0: CRC error on block 170
Sep 21 20:45:33 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:33 starkhome /kernel.test: segio_done: (r) nfree=4 ndone=4 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_alloc: nfree=3 ndone=4 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_queue: nfree=3 ndone=4 nreq=1
Sep 21 20:45:33 starkhome /kernel.test: Processing readahead reqblk = 192
Sep 21 20:45:33 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:33 starkhome /kernel.test: segio_done: (r) nfree=3 ndone=5 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_alloc: nfree=2 ndone=5 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_queue: nfree=2 ndone=5 nreq=1
Sep 21 20:45:33 starkhome /kernel.test: Processing readahead reqblk = 224
Sep 21 20:45:33 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:34 starkhome /kernel.test: segio_done: (r) nfree=2 ndone=6 nreq=0
Sep 21 20:45:34 starkhome /kernel.test: segio_alloc: nfree=1 ndone=6 nreq=0
Sep 21 20:45:34 starkhome /kernel.test: segio_queue: nfree=1 ndone=6 nreq=1
Sep 21 20:45:34 starkhome /kernel.test: Processing readahead reqblk = 256
Sep 21 20:45:34 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:2 h:0 s:21 pos:276 want:276
Sep 21 20:45:34 starkhome /kernel.test: ft0: CRC error on block 276
Sep 21 20:45:34 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:2 h:0 s:24 pos:279 want:279
Sep 21 20:45:34 starkhome /kernel.test: ft0: CRC error on block 279
Sep 21 20:45:34 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:34 starkhome /kernel.test: segio_done: (r) nfree=1 ndone=7 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: segio_alloc: nfree=0 ndone=7 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: segio_queue: nfree=0 ndone=7 nreq=1
Sep 21 20:45:35 starkhome /kernel.test: Processing readahead reqblk = 288
Sep 21 20:45:35 starkhome /kernel.test: Read done..  Cancel = 0
Sep 21 20:45:35 starkhome /kernel.test: segio_done: (r) nfree=0 ndone=8 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: segio_alloc: nfree=0 ndone=8 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: No more I/O.. Stopping.
Sep 21 20:45:35 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:35 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:35 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:35 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:36 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:36 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:36 starkhome /kernel.test: async status got $0025 ($024b)
Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:51 starkhome last message repeated 8 times
Sep 21 20:45:51 starkhome /kernel.test: qic_status returned $25
Sep 21 20:45:51 starkhome /kernel.test: tape_status got $0025
Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:52 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:52 starkhome last message repeated 8 times
Sep 21 20:45:52 starkhome /kernel.test: qic_status returned $25
Sep 21 20:45:52 starkhome /kernel.test: tape_status got $0025
Sep 21 20:45:52 starkhome /kernel.test: segio_alloc: nfree=0 ndone=8 nreq=0
Sep 21 20:45:52 starkhome /kernel.test: Starting write I/O chain
Sep 21 20:45:52 starkhome /kernel.test: ft0: position unknown: lastpos:-2 ft->xblk:1952805733
Sep 21 20:45:52 starkhome /kernel.test: ft0: unexpected interrupt; st0 = $20 pcn = 20



>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199609220434.AAA00362>