From owner-freebsd-bugs Thu May 13 13:10:17 1999
Delivered-To: freebsd-bugs@freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21]) by hub.freebsd.org (Postfix) with ESMTP id E05E31504C for ; Thu, 13 May 1999 13:10:03 -0700 (PDT) (envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.9.3/8.9.2) id NAA55071; Thu, 13 May 1999 13:10:03 -0700 (PDT) (envelope-from gnats@FreeBSD.org)
Received: from midten.fast.no (midten.fast.no [195.139.251.11]) by hub.freebsd.org (Postfix) with ESMTP id 03C3914FC8 for ; Thu, 13 May 1999 13:06:42 -0700 (PDT) (envelope-from tegge@not.fast.no)
Received: from not.fast.no (IDENT:tegge@not.fast.no [195.139.251.12]) by midten.fast.no (8.9.1/8.9.1) with ESMTP id WAA76064 for ; Thu, 13 May 1999 22:06:41 +0200 (CEST)
Received: (from tegge@localhost) by not.fast.no (8.9.3/8.8.8) id WAA59935; Thu, 13 May 1999 22:06:41 +0200 (CEST) (envelope-from tegge@not.fast.no)
Message-Id: <199905132006.WAA59935@not.fast.no>
Date: Thu, 13 May 1999 22:06:41 +0200 (CEST)
From: Tor Egge
Reply-To: tegge@not.fast.no
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.2
Subject: kern/11697: Disk failure hangs system
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number:         11697
>Category:       kern
>Synopsis:       Disk failure hangs system
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu May 13 13:10:02 PDT 1999
>Closed-Date:
>Last-Modified:
>Originator:     Tor Egge
>Release:        FreeBSD 3.1-STABLE i386
>Organization:
Fast Search & Transfer ASA
>Environment:

FreeBSD 3.1-STABLE #0: Sat May 1 19:00:19 CEST 1999 root@response.fast.no:/usr/src/sys/compile/INDEX_SMP_SERIAL_DDB i386

ahc1: rev 0x00 int a irq 17 on pci0.14.0
ahc1: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
da13 at ahc1 bus 0 target 9 lun 0
da13: Fixed Direct Access SCSI-2 device
da13: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da13: 17366MB (35566499 512 byte sectors: 255H 63S/T 2213C)

>Description:

----------------------
Unexpected busfree. LASTPHASE == 0x80
SEQADDR == 0x15b
(da13:ahc1:0:9:0): Invalidating pack
(da13:ahc1:0:9:0): Invalidating pack
(da13:ahc1:0:9:0): Invalidating pack
vm_fault: pager read error, pid 63486 (mkserv)
(da13:ahc1:0:9:0): Invalidating pack
Stopped at      siointr1+0x6d:  jmp     siointr1+0x159
db> trace
siointr1(e3c8d800,e02890b0,0,f2e0da2c,e0206144) at siointr1+0x6d
siointr(0,f2e00010,0,1,e0289014) at siointr+0x1d
Xfastintr4(ebd13528,e3e12800,ebd13528,c8000040,e0e7e8c8) at Xfastintr4+0x24
biodone(ebd13528,ebd13528,ebd13528,c8000040,e3e08000) at biodone+0x2d0
dastrategy(ebd13528,200202b4,f2e0daa8,e018167d,f2e0dacc) at dastrategy+0xab
spec_strategy(f2e0dacc,f2e0dab4,e01e73a9,f2e0dacc,f2e0dad8) at spec_strategy+0x3e
spec_vnoperate(f2e0dacc,f2e0dad8,e016d46f,f2e0dacc,2000) at spec_vnoperate+0x15
ufs_vnoperatespec(f2e0dacc) at ufs_vnoperatespec+0x15
bwrite(ebd13528,f2e0daf0,e0171879,f2e0db34,f2e0dafc) at bwrite+0xaf
vop_stdbwrite(f2e0db34,f2e0dafc,e018167d,f2e0db34,f2e0db08) at vop_stdbwrite+0xe
vop_defaultop(f2e0db34,f2e0db08,e01e73a9,f2e0db34,f2e0db3c) at vop_defaultop+0x15
spec_vnoperate(f2e0db34,f2e0db3c,e016de03,f2e0db34,200) at spec_vnoperate+0x15
ufs_vnoperatespec(f2e0db34,200,ebd13528,1,0) at ufs_vnoperatespec+0x15
vfs_bio_awrite(ebd13528,200,a200a000,1,f2e00010) at vfs_bio_awrite+0x103
getnewbuf(f1cea900,d10050,0,0,2000) at getnewbuf+0x2ec
getblk(f1cea900,d10050,2000,0,0) at getblk+0x244
bread(f1cea900,d10050,2000,0,f2e0dc48) at bread+0x21
ffs_vget(e3e8c200,54ee7,f2e0dccc,f283ee40,f2e0df14) at ffs_vget+0x1bc
ufs_lookup(f2e0dd24,f2e0dd38,e017055c,f2e0dd24,f3009c47) at ufs_lookup+0x936
ufs_vnoperate(f2e0dd24,f3009c47,f283ee40,f2e0df14,0) at ufs_vnoperate+0x15
vfs_cache_lookup(f2e0dd80,f2e0dd90,e01729fd,f2e0dd80,f1c6ce00) at vfs_cache_lookup+0x248
ufs_vnoperate(f2e0dd80,f1c6ce00,f2e0df14,f2e0def0,0) at ufs_vnoperate+0x15
lookup(f2e0def0,0,f2e0df84,f2e0def0,7273752f) at lookup+0x2c1
namei(f2e0def0,0,f2e0df84,f2d5c840,286) at namei+0x133
vn_open(f2e0def0,3,584,f2d5c840,e0254064) at vn_open+0x1f6
open(f2d5c840,f2e0df84,dfbfd594,dfbfc7e0,dfbfbfe4) at open+0xad
syscall(27,27,dfbfbfe4,dfbfc7e0,dfbfc7b4) at syscall+0x187
Xint0x80_syscall() at Xint0x80_syscall+0x4c
db> panic
panic: from debugger
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
syncing disks...
-------------

The SCSI bus is freed at the wrong moment, probably because the device is
resetting. The command is then retried, but is aborted AGAIN due to a
selection timeout, indicating that the device had not finished resetting.
This might be caused by bad firmware on the disk or by an inadequate power
supply; I assume it is bad firmware.

The VFS code is conservative about fatal write errors: instead of throwing
away the buffer contents (which might corrupt the file system if the error
is transient), it clears B_ERROR and marks the buffer dirty again. Combined
with the failing disk, this sometimes leads to the buffer queues being
filled with dirty buffers associated with the invalidated disk pack.

Combined with what appears to be a bug in the routine waitfreebuffers(),
this can lead to an infinite busy loop in the kernel inside an
splbio()-protected region of code. The test after flushdirtybuffers() is
inverted: the loop breaks out exactly when flushing did not free enough
buffers, so it never sleeps waiting for buffers to become free.

>How-To-Repeat:

Use Quantum disks.
>Fix:

Index: vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.193.2.5
diff -u -r1.193.2.5 vfs_bio.c
--- vfs_bio.c	1999/04/20 19:54:20	1.193.2.5
+++ vfs_bio.c	1999/05/12 19:57:13
@@ -577,7 +577,8 @@
 	if (bp->b_flags & B_LOCKED)
 		bp->b_flags &= ~B_ERROR;
 
-	if ((bp->b_flags & (B_READ | B_ERROR)) == B_ERROR) {
+	if ((bp->b_flags & (B_READ | B_ERROR)) == B_ERROR &&
+	    bp->b_error != ENXIO) {
 		bp->b_flags &= ~B_ERROR;
 		bdirty(bp);
 	} else if ((bp->b_flags & (B_NOCACHE | B_INVAL | B_ERROR | B_FREEBUF)) ||
@@ -1219,7 +1220,7 @@
 waitfreebuffers(int slpflag, int slptimeo) {
 	while (numfreebuffers < hifreebuffers) {
 		flushdirtybuffers(slpflag, slptimeo);
-		if (numfreebuffers < hifreebuffers)
+		if (numfreebuffers >= hifreebuffers)
 			break;
 		needsbuffer |= VFS_BIO_NEED_FREE;
 		if (tsleep(&needsbuffer, (PRIBIO + 4)|slpflag, "biofre", slptimeo))

>Release-Note:
>Audit-Trail:
>Unformatted: