Date: Sun, 22 Sep 1996 00:34:07 -0400 (EDT) From: Gene Stark <gene@starkhome.cs.sunysb.edu> To: FreeBSD-gnats-submit@freebsd.org Subject: kern/1661: ft driver hangs uninterruptably at "bavail" Message-ID: <199609220434.AAA00362@starkhome.cs.sunysb.edu> Resent-Message-ID: <199609220440.VAA25842@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 1661
>Category: kern
>Synopsis: ft driver hangs uninterruptably at "bavail"
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Sep 21 21:40:01 PDT 1996
>Last-Modified:
>Originator: Gene Stark
>Organization:
>Release: FreeBSD 2.1-STABLE i386
>Environment:
FreeBSD 2.1.5-STABLE. Pentium 133MHz, ASUS M/B,
Colorado 250 floppy tape drive running off controller
on M/B.
Problem was also experienced on a noname 486DX/33MHz
system.
>Description:
The command:
dump 0uf - /A | gzip -c | ft
causes the tape to spin for a few seconds, and then the
driver hangs in uninterruptable sleep with ps showing "bavail".
Without the gzip in the pipeline, the hang does not occur.
>How-To-Repeat:
dump 0uf - /A | gzip -c | ft
>Fix:
I am not real familiar with the ft driver, but here is my analysis
after turning on debugging printout and thinking about it for
awhile. The messages from the test are at the end of this report.
When the driver is opened, it does an initial read, followed
by readahead that continues until the completion queue is
filled up and the free queue is empty. Then the tape is stopped.
Shortly after this, the first blocks of data are ready to be written,
but there are no buffers available as they have all been used
up by readahead. The driver then hangs in the sleep at line
2279:
/* Sleep until a buffer becomes available. */
while (sp == NULL) {
ftsleep(wc_buff_avail, 0);
sp = segio_alloc(ft);
}
In my test, there is an "unexpected interrupt" message
that occurs just before the hang. It appears to be a completion
interrupt from the seek operation that was started because
the current position was unknown. As near as I can tell,
this is a separate problem that does not contribute to the hanging
problem, as even if the interrupt were to be processed, I couldn't
find any way that it would help to free the buffers in the completion
queue to unwedge the write.
I believe the hanging does not occur when the gzip is omitted,
because the data arrives faster, and there is not time for the
readahead to use up all the buffers before the write is initiated,
canceling the readahead and freeing the buffers.
When I modified the driver by adding code to free any buffers
in the completion queue before a write was started, the hang
did not occur. Somebody that understands the driver better than
me should check this out and see if my analysis and fix is
reasonable.
Here is a context diff:
*** ft.c.orig Tue May 30 04:01:41 1995
--- ft.c Sun Sep 22 00:04:43 1996
***************
*** 2260,2265 ****
--- 2260,2269 ----
} else {
if (ft->segh != NULL && ft->segh->reqtype != FTIO_WRITING)
tape_inactive(ftu);
+
+ /* Clear readahead blocks from completion queue */
+ while (ft->doneh != NULL)
+ segio_free(ft, ft->doneh);
/* Allocate a buffer and start tape if we're running low. */
sp = segio_alloc(ft);
(test messages follow)
-----------------
Sep 21 20:45:22 starkhome /kernel.test: tape_start start
Sep 21 20:45:22 starkhome /kernel.test: tape_recal start
Sep 21 20:45:23 starkhome /kernel.test: tape_recal end
Sep 21 20:45:23 starkhome /kernel.test: tape_start end
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 46
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:24 starkhome last message repeated 8 times
Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:24 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:24 starkhome last message repeated 8 times
Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:24 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 27
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 4
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:24 starkhome last message repeated 8 times
Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 30
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:25 starkhome last message repeated 8 times
Sep 21 20:45:25 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:25 starkhome last message repeated 8 times
Sep 21 20:45:25 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 8
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:26 starkhome last message repeated 8 times
Sep 21 20:45:26 starkhome /kernel.test: qic_status returned $d0
Sep 21 20:45:26 starkhome /kernel.test: ftgetgeom report config got $00d0
Sep 21 20:45:26 starkhome /kernel.test: Tape format is QIC-80, length is 307.5/550
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 18
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:26 starkhome last message repeated 8 times
Sep 21 20:45:26 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:26 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 11
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:27 starkhome last message repeated 8 times
Sep 21 20:45:27 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:27 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 13
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:27 starkhome last message repeated 8 times
Sep 21 20:45:27 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:28 starkhome last message repeated 8 times
Sep 21 20:45:28 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:28 starkhome last message repeated 8 times
Sep 21 20:45:28 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:29 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:29 starkhome last message repeated 8 times
Sep 21 20:45:29 starkhome /kernel.test: qic_status returned $65
Sep 21 20:45:29 starkhome /kernel.test: tape_status got $0065
Sep 21 20:45:29 starkhome /kernel.test: segio_alloc: nfree=7 ndone=0 nreq=0
Sep 21 20:45:29 starkhome /kernel.test: segio_queue: nfree=7 ndone=0 nreq=1
Sep 21 20:45:29 starkhome /kernel.test: Starting read I/O chain
Sep 21 20:45:29 starkhome /kernel.test: async_read ******STARTING TAPE
Sep 21 20:45:29 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:29 starkhome /kernel.test: async status got $0065 ($02cb)
Sep 21 20:45:29 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:29 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1
Sep 21 20:45:30 starkhome /kernel.test: Processing readahead reqblk = 32
Sep 21 20:45:30 starkhome /kernel.test: segio_free: nfree=7 ndone=0 nreq=1
Sep 21 20:45:30 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:30 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0
Sep 21 20:45:30 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1
Sep 21 20:45:30 starkhome /kernel.test: Processing readahead reqblk = 64
Sep 21 20:45:30 starkhome /kernel.test: segio_free: nfree=7 ndone=0 nreq=1
Sep 21 20:45:31 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:0 h:0 s:77 pos:76 want:76
Sep 21 20:45:31 starkhome /kernel.test: ft0: CRC error on block 76
Sep 21 20:45:31 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:31 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1
Sep 21 20:45:31 starkhome /kernel.test: Processing readahead reqblk = 96
Sep 21 20:45:31 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:31 starkhome /kernel.test: segio_done: (r) nfree=6 ndone=2 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_alloc: nfree=5 ndone=2 nreq=0
Sep 21 20:45:31 starkhome /kernel.test: segio_queue: nfree=5 ndone=2 nreq=1
Sep 21 20:45:32 starkhome /kernel.test: Processing readahead reqblk = 128
Sep 21 20:45:32 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:1 h:0 s:26 pos:153 want:153
Sep 21 20:45:32 starkhome /kernel.test: ft0: CRC error on block 153
Sep 21 20:45:32 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:32 starkhome /kernel.test: segio_done: (r) nfree=5 ndone=3 nreq=0
Sep 21 20:45:32 starkhome /kernel.test: segio_alloc: nfree=4 ndone=3 nreq=0
Sep 21 20:45:32 starkhome /kernel.test: segio_queue: nfree=4 ndone=3 nreq=1
Sep 21 20:45:32 starkhome /kernel.test: Processing readahead reqblk = 160
Sep 21 20:45:32 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:1 h:0 s:43 pos:170 want:170
Sep 21 20:45:32 starkhome /kernel.test: ft0: CRC error on block 170
Sep 21 20:45:33 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:33 starkhome /kernel.test: segio_done: (r) nfree=4 ndone=4 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_alloc: nfree=3 ndone=4 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_queue: nfree=3 ndone=4 nreq=1
Sep 21 20:45:33 starkhome /kernel.test: Processing readahead reqblk = 192
Sep 21 20:45:33 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:33 starkhome /kernel.test: segio_done: (r) nfree=3 ndone=5 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_alloc: nfree=2 ndone=5 nreq=0
Sep 21 20:45:33 starkhome /kernel.test: segio_queue: nfree=2 ndone=5 nreq=1
Sep 21 20:45:33 starkhome /kernel.test: Processing readahead reqblk = 224
Sep 21 20:45:33 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:34 starkhome /kernel.test: segio_done: (r) nfree=2 ndone=6 nreq=0
Sep 21 20:45:34 starkhome /kernel.test: segio_alloc: nfree=1 ndone=6 nreq=0
Sep 21 20:45:34 starkhome /kernel.test: segio_queue: nfree=1 ndone=6 nreq=1
Sep 21 20:45:34 starkhome /kernel.test: Processing readahead reqblk = 256
Sep 21 20:45:34 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:2 h:0 s:21 pos:276 want:276
Sep 21 20:45:34 starkhome /kernel.test: ft0: CRC error on block 276
Sep 21 20:45:34 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:2 h:0 s:24 pos:279 want:279
Sep 21 20:45:34 starkhome /kernel.test: ft0: CRC error on block 279
Sep 21 20:45:34 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:34 starkhome /kernel.test: segio_done: (r) nfree=1 ndone=7 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: segio_alloc: nfree=0 ndone=7 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: segio_queue: nfree=0 ndone=7 nreq=1
Sep 21 20:45:35 starkhome /kernel.test: Processing readahead reqblk = 288
Sep 21 20:45:35 starkhome /kernel.test: Read done.. Cancel = 0
Sep 21 20:45:35 starkhome /kernel.test: segio_done: (r) nfree=0 ndone=8 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: segio_alloc: nfree=0 ndone=8 nreq=0
Sep 21 20:45:35 starkhome /kernel.test: No more I/O.. Stopping.
Sep 21 20:45:35 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:35 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:35 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:35 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:36 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:36 starkhome /kernel.test: async status got $0024 ($0249)
Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8
Sep 21 20:45:36 starkhome /kernel.test: async status got $0025 ($024b)
Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:51 starkhome last message repeated 8 times
Sep 21 20:45:51 starkhome /kernel.test: qic_status returned $25
Sep 21 20:45:51 starkhome /kernel.test: tape_status got $0025
Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 6
Sep 21 20:45:52 starkhome /kernel.test: ===> tape_cmd: 2
Sep 21 20:45:52 starkhome last message repeated 8 times
Sep 21 20:45:52 starkhome /kernel.test: qic_status returned $25
Sep 21 20:45:52 starkhome /kernel.test: tape_status got $0025
Sep 21 20:45:52 starkhome /kernel.test: segio_alloc: nfree=0 ndone=8 nreq=0
Sep 21 20:45:52 starkhome /kernel.test: Starting write I/O chain
Sep 21 20:45:52 starkhome /kernel.test: ft0: position unknown: lastpos:-2 ft->xblk:1952805733
Sep 21 20:45:52 starkhome /kernel.test: ft0: unexpected interrupt; st0 = $20 pcn = 20
>Audit-Trail:
>Unformatted:
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199609220434.AAA00362>
