From owner-freebsd-current@FreeBSD.ORG Tue May  6 22:27:47 2003
Date: Tue, 6 May 2003 22:27:40 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: current@FreeBSD.org
Message-Id: <200305070527.h475ReM7031452@gw.catspoiler.org>
In-Reply-To: <200305070458.h474waM7031393@gw.catspoiler.org>
Subject: Re: bwrite() wdrain hang in -current

On 6 May, I wrote:
> The vnode in question this time around is 0xc9ba27fc, which corresponds
> to /tmp/obj/usr/src/gnu/usr.bin/cc/cc1/cc1.  The stack for the bufdaemon
> thread holding the lock is:
>
> Proc 0xc6153000 [SLPQ wdrain c05ce978][SLP] bufdaemon
> mi_switch(c60d8260,44,c050c1cf,ca,1) at mi_switch+0x210
> msleep(c05ce978,c05ce980,44,c0511486,0) at msleep+0x432
> bwrite(d28ce2e0,0,c0511368,697,137e400) at bwrite+0x442
> vfs_bio_awrite(d28ce2e0,0,c0511368,87b,0) at vfs_bio_awrite+0x221
> flushbufqueues(0,0,c0511368,10e,64) at flushbufqueues+0x17d
> buf_daemon(0,e0a92d48,c0509769,310,0) at buf_daemon+0xdc
> fork_exit(c0363240,0,e0a92d48) at fork_exit+0xc0
> fork_trampoline() at fork_trampoline+0x1a
>
> What is puzzling is why this process is sleeping here.  It appears that
> maybe a wakeup didn't happen.  This machine has 1 GB of RAM, so I don't
> think memory pressure should be a cause.  Here's the source at bwrite+0x442

Sigh ... it looks like the problem is that enough write work gets queued
up on the NFS client side that the server side can never drain the total
below lorunningspace.  That deadlocks the NFS server side, which in turn
prevents the NFS client side from draining.

static __inline void
runningbufwakeup(struct buf *bp)
{

	if (bp->b_runningbufspace) {
		atomic_subtract_int(&runningbufspace, bp->b_runningbufspace);
		bp->b_runningbufspace = 0;
		mtx_lock(&rbreqlock);
		if (runningbufreq && runningbufspace <= lorunningspace) {
			runningbufreq = 0;
			wakeup(&runningbufreq);
		}
		mtx_unlock(&rbreqlock);
	}
}

Probably the best cure would be to always allow each device or mount point
at least some minimum amount of running buffer space.
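
For context, the sleeping side of this handshake is the waitrunningbufspace()
call that bwrite() makes for async writes, which is where the "wdrain" wait
channel in the trace above comes from.  The following is only a sketch
reconstructed from the wakeup code above and the trace, not a verbatim copy
of the vfs_bio.c of that date, so the exact flag handling and locking may
differ:

static __inline void
waitrunningbufspace(void)
{
	/*
	 * Each async write is charged against the global runningbufspace
	 * total.  Once the total climbs above hirunningspace the writer
	 * parks here on the "wdrain" channel and is only woken by
	 * runningbufwakeup() above, which fires when completed I/O drags
	 * the total back down to lorunningspace.  If other writers (here,
	 * the NFS client side) keep the total above lorunningspace, the
	 * wakeup never comes and bufdaemon stays asleep.
	 */
	mtx_lock(&rbreqlock);
	while (runningbufspace > hirunningspace) {
		runningbufreq = 1;
		msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
	}
	mtx_unlock(&rbreqlock);
}

Because there is a single global threshold, every device and mount point
shares one wakeup condition, which is what lets the client-side backlog
starve the server side.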
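
One way to read the "minimum amount per device or mount point" idea is to
track running buffer space per mount as well as globally, and to exempt a
mount from the wdrain sleep while it is still under its own reserve.  The
names mnt_runningbufspace and minrunningspace_permnt below are hypothetical,
and this is only a sketch of the proposed accounting, not a patch:

static __inline void
waitrunningbufspace_permnt(struct mount *mp)
{
	mtx_lock(&rbreqlock);
	/*
	 * Sleep only while the global total is above hirunningspace AND
	 * this mount has already used up its guaranteed minimum.  A mount
	 * that is still under its reserve (e.g. the local disk the NFS
	 * server writes to) may keep issuing writes even when another
	 * mount (the NFS client) has saturated the global pool, so the
	 * backlog can always be drained.
	 */
	while (runningbufspace > hirunningspace &&
	    mp->mnt_runningbufspace > minrunningspace_permnt) {
		runningbufreq = 1;
		msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
	}
	mtx_unlock(&rbreqlock);
}

runningbufwakeup() would then also subtract from the per-mount counter and
issue the wakeup whenever the global total drops to lorunningspace or a
waiting mount falls back under its reserve.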