From owner-freebsd-bugs Sat Sep 21 21:40:06 1996 Return-Path: owner-bugs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA25890 for bugs-outgoing; Sat, 21 Sep 1996 21:40:06 -0700 (PDT) Received: (from gnats@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA25842; Sat, 21 Sep 1996 21:40:02 -0700 (PDT) Resent-Date: Sat, 21 Sep 1996 21:40:02 -0700 (PDT) Resent-Message-Id: <199609220440.VAA25842@freefall.freebsd.org> Resent-From: gnats (GNATS Management) Resent-To: freebsd-bugs Resent-Reply-To: FreeBSD-gnats@freefall.FreeBSD.org, gene@starkhome.cs.sunysb.edu Received: from bsd7.cs.sunysb.edu (bsd7.cs.sunysb.edu [130.245.1.197]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id VAA22411 for ; Sat, 21 Sep 1996 21:35:13 -0700 (PDT) Received: (from uucp@localhost) by bsd7.cs.sunysb.edu (8.6.12/8.6.9) with UUCP id AAA00125 for FreeBSD-gnats-submit@freebsd.org; Sun, 22 Sep 1996 00:35:02 -0400 Received: (from gene@localhost) by starkhome.cs.sunysb.edu (8.7.5/8.6.9) id AAA00362; Sun, 22 Sep 1996 00:34:07 -0400 (EDT) Message-Id: <199609220434.AAA00362@starkhome.cs.sunysb.edu> Date: Sun, 22 Sep 1996 00:34:07 -0400 (EDT) From: Gene Stark Reply-To: gene@starkhome.cs.sunysb.edu To: FreeBSD-gnats-submit@freebsd.org X-Send-Pr-Version: 3.2 Subject: kern/1661: ft driver hangs uninterruptably at "bavail" Sender: owner-bugs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >Number: 1661 >Category: kern >Synopsis: ft driver hangs uninterruptably at "bavail" >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-bugs >State: open >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Sep 21 21:40:01 PDT 1996 >Last-Modified: >Originator: Gene Stark >Organization: >Release: FreeBSD 2.1-STABLE i386 >Environment: FreeBSD 2.1.5-STABLE. Pentium 133MHz, ASUS M/B, Colorado 250 floppy tape drive running off controller on M/B. Problem was also experienced on a noname 486DX/33MHz system. >Description: The command: dump 0uf - /A | gzip -c | ft causes the tape to spin for a few seconds, and then the driver hangs in uninterruptable sleep with ps showing "bavail". Without the gzip in the pipeline, the hang does not occur. >How-To-Repeat: dump 0uf - /A | gzip -c | ft >Fix: I am not real familiar with the ft driver, but here is my analysis after turning on debugging printout and thinking about it for awhile. The messages from the test are at the end of this report. When the driver is opened, it does an initial read, followed by readahead that continues until the completion queue is filled up and the free queue is empty. Then the tape is stopped. Shortly after this, the first blocks of data are ready to be written, but there are no buffers available as they have all been used up by readahead. The driver then hangs in the sleep at line 2279: /* Sleep until a buffer becomes available. */ while (sp == NULL) { ftsleep(wc_buff_avail, 0); sp = segio_alloc(ft); } In my test, there is an "unexpected interrupt" message that occurs just before the hang. It appears to be a completion interrupt from the seek operation that was started because the current position was unknown. As near as I can tell, this is a separate problem that does not contribute to the hanging problem, as even if the interrupt were to be processed, I couldn't find any way that it would help to free the buffers in the completion queue to unwedge the write. I believe the hanging does not occur when the gzip is omitted, because the data arrives faster, and there is not time for the readahead to use up all the buffers before the write is initiated, canceling the readahead and freeing the buffers. When I modified the driver by adding code to free any buffers in the completion queue before a write was started, the hang did not occur. Somebody that understands the driver better than me should check this out and see if my analysis and fix is reasonable. Here is a context diff: *** ft.c.orig Tue May 30 04:01:41 1995 --- ft.c Sun Sep 22 00:04:43 1996 *************** *** 2260,2265 **** --- 2260,2269 ---- } else { if (ft->segh != NULL && ft->segh->reqtype != FTIO_WRITING) tape_inactive(ftu); + + /* Clear readahead blocks from completion queue */ + while (ft->doneh != NULL) + segio_free(ft, ft->doneh); /* Allocate a buffer and start tape if we're running low. */ sp = segio_alloc(ft); (test messages follow) ----------------- Sep 21 20:45:22 starkhome /kernel.test: tape_start start Sep 21 20:45:22 starkhome /kernel.test: tape_recal start Sep 21 20:45:23 starkhome /kernel.test: tape_recal end Sep 21 20:45:23 starkhome /kernel.test: tape_start end Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 46 Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:23 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:24 starkhome last message repeated 8 times Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:24 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:24 starkhome last message repeated 8 times Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:24 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 27 Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 4 Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:24 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:24 starkhome last message repeated 8 times Sep 21 20:45:24 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 30 Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:25 starkhome last message repeated 8 times Sep 21 20:45:25 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:25 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:25 starkhome last message repeated 8 times Sep 21 20:45:25 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:25 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 8 Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:26 starkhome last message repeated 8 times Sep 21 20:45:26 starkhome /kernel.test: qic_status returned $d0 Sep 21 20:45:26 starkhome /kernel.test: ftgetgeom report config got $00d0 Sep 21 20:45:26 starkhome /kernel.test: Tape format is QIC-80, length is 307.5/550 Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 18 Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:26 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:26 starkhome last message repeated 8 times Sep 21 20:45:26 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:26 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 11 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:27 starkhome last message repeated 8 times Sep 21 20:45:27 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:27 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 13 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:27 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:27 starkhome last message repeated 8 times Sep 21 20:45:27 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:28 starkhome last message repeated 8 times Sep 21 20:45:28 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:28 starkhome last message repeated 8 times Sep 21 20:45:28 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:28 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:28 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:29 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:29 starkhome last message repeated 8 times Sep 21 20:45:29 starkhome /kernel.test: qic_status returned $65 Sep 21 20:45:29 starkhome /kernel.test: tape_status got $0065 Sep 21 20:45:29 starkhome /kernel.test: segio_alloc: nfree=7 ndone=0 nreq=0 Sep 21 20:45:29 starkhome /kernel.test: segio_queue: nfree=7 ndone=0 nreq=1 Sep 21 20:45:29 starkhome /kernel.test: Starting read I/O chain Sep 21 20:45:29 starkhome /kernel.test: async_read ******STARTING TAPE Sep 21 20:45:29 starkhome /kernel.test: async_status got cmd = 6 nbits = 8 Sep 21 20:45:29 starkhome /kernel.test: async status got $0065 ($02cb) Sep 21 20:45:29 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:29 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0 Sep 21 20:45:30 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0 Sep 21 20:45:30 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1 Sep 21 20:45:30 starkhome /kernel.test: Processing readahead reqblk = 32 Sep 21 20:45:30 starkhome /kernel.test: segio_free: nfree=7 ndone=0 nreq=1 Sep 21 20:45:30 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:30 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0 Sep 21 20:45:30 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0 Sep 21 20:45:30 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1 Sep 21 20:45:30 starkhome /kernel.test: Processing readahead reqblk = 64 Sep 21 20:45:30 starkhome /kernel.test: segio_free: nfree=7 ndone=0 nreq=1 Sep 21 20:45:31 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:0 h:0 s:77 pos:76 want:76 Sep 21 20:45:31 starkhome /kernel.test: ft0: CRC error on block 76 Sep 21 20:45:31 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:31 starkhome /kernel.test: segio_done: (r) nfree=7 ndone=1 nreq=0 Sep 21 20:45:31 starkhome /kernel.test: segio_alloc: nfree=6 ndone=1 nreq=0 Sep 21 20:45:31 starkhome /kernel.test: segio_queue: nfree=6 ndone=1 nreq=1 Sep 21 20:45:31 starkhome /kernel.test: Processing readahead reqblk = 96 Sep 21 20:45:31 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:31 starkhome /kernel.test: segio_done: (r) nfree=6 ndone=2 nreq=0 Sep 21 20:45:31 starkhome /kernel.test: segio_alloc: nfree=5 ndone=2 nreq=0 Sep 21 20:45:31 starkhome /kernel.test: segio_queue: nfree=5 ndone=2 nreq=1 Sep 21 20:45:32 starkhome /kernel.test: Processing readahead reqblk = 128 Sep 21 20:45:32 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:1 h:0 s:26 pos:153 want:153 Sep 21 20:45:32 starkhome /kernel.test: ft0: CRC error on block 153 Sep 21 20:45:32 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:32 starkhome /kernel.test: segio_done: (r) nfree=5 ndone=3 nreq=0 Sep 21 20:45:32 starkhome /kernel.test: segio_alloc: nfree=4 ndone=3 nreq=0 Sep 21 20:45:32 starkhome /kernel.test: segio_queue: nfree=4 ndone=3 nreq=1 Sep 21 20:45:32 starkhome /kernel.test: Processing readahead reqblk = 160 Sep 21 20:45:32 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:1 h:0 s:43 pos:170 want:170 Sep 21 20:45:32 starkhome /kernel.test: ft0: CRC error on block 170 Sep 21 20:45:33 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:33 starkhome /kernel.test: segio_done: (r) nfree=4 ndone=4 nreq=0 Sep 21 20:45:33 starkhome /kernel.test: segio_alloc: nfree=3 ndone=4 nreq=0 Sep 21 20:45:33 starkhome /kernel.test: segio_queue: nfree=3 ndone=4 nreq=1 Sep 21 20:45:33 starkhome /kernel.test: Processing readahead reqblk = 192 Sep 21 20:45:33 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:33 starkhome /kernel.test: segio_done: (r) nfree=3 ndone=5 nreq=0 Sep 21 20:45:33 starkhome /kernel.test: segio_alloc: nfree=2 ndone=5 nreq=0 Sep 21 20:45:33 starkhome /kernel.test: segio_queue: nfree=2 ndone=5 nreq=1 Sep 21 20:45:33 starkhome /kernel.test: Processing readahead reqblk = 224 Sep 21 20:45:33 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:34 starkhome /kernel.test: segio_done: (r) nfree=2 ndone=6 nreq=0 Sep 21 20:45:34 starkhome /kernel.test: segio_alloc: nfree=1 ndone=6 nreq=0 Sep 21 20:45:34 starkhome /kernel.test: segio_queue: nfree=1 ndone=6 nreq=1 Sep 21 20:45:34 starkhome /kernel.test: Processing readahead reqblk = 256 Sep 21 20:45:34 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:2 h:0 s:21 pos:276 want:276 Sep 21 20:45:34 starkhome /kernel.test: ft0: CRC error on block 276 Sep 21 20:45:34 starkhome /kernel.test: xd: st0:40 st1:20 st2:20 c:2 h:0 s:24 pos:279 want:279 Sep 21 20:45:34 starkhome /kernel.test: ft0: CRC error on block 279 Sep 21 20:45:34 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:34 starkhome /kernel.test: segio_done: (r) nfree=1 ndone=7 nreq=0 Sep 21 20:45:35 starkhome /kernel.test: segio_alloc: nfree=0 ndone=7 nreq=0 Sep 21 20:45:35 starkhome /kernel.test: segio_queue: nfree=0 ndone=7 nreq=1 Sep 21 20:45:35 starkhome /kernel.test: Processing readahead reqblk = 288 Sep 21 20:45:35 starkhome /kernel.test: Read done.. Cancel = 0 Sep 21 20:45:35 starkhome /kernel.test: segio_done: (r) nfree=0 ndone=8 nreq=0 Sep 21 20:45:35 starkhome /kernel.test: segio_alloc: nfree=0 ndone=8 nreq=0 Sep 21 20:45:35 starkhome /kernel.test: No more I/O.. Stopping. Sep 21 20:45:35 starkhome /kernel.test: async_status got cmd = 6 nbits = 8 Sep 21 20:45:35 starkhome /kernel.test: async status got $0024 ($0249) Sep 21 20:45:35 starkhome /kernel.test: async_status got cmd = 6 nbits = 8 Sep 21 20:45:35 starkhome /kernel.test: async status got $0024 ($0249) Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8 Sep 21 20:45:36 starkhome /kernel.test: async status got $0024 ($0249) Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8 Sep 21 20:45:36 starkhome /kernel.test: async status got $0024 ($0249) Sep 21 20:45:36 starkhome /kernel.test: async_status got cmd = 6 nbits = 8 Sep 21 20:45:36 starkhome /kernel.test: async status got $0025 ($024b) Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:51 starkhome last message repeated 8 times Sep 21 20:45:51 starkhome /kernel.test: qic_status returned $25 Sep 21 20:45:51 starkhome /kernel.test: tape_status got $0025 Sep 21 20:45:51 starkhome /kernel.test: ===> tape_cmd: 6 Sep 21 20:45:52 starkhome /kernel.test: ===> tape_cmd: 2 Sep 21 20:45:52 starkhome last message repeated 8 times Sep 21 20:45:52 starkhome /kernel.test: qic_status returned $25 Sep 21 20:45:52 starkhome /kernel.test: tape_status got $0025 Sep 21 20:45:52 starkhome /kernel.test: segio_alloc: nfree=0 ndone=8 nreq=0 Sep 21 20:45:52 starkhome /kernel.test: Starting write I/O chain Sep 21 20:45:52 starkhome /kernel.test: ft0: position unknown: lastpos:-2 ft->xblk:1952805733 Sep 21 20:45:52 starkhome /kernel.test: ft0: unexpected interrupt; st0 = $20 pcn = 20 >Audit-Trail: >Unformatted: