Date: Wed, 6 Dec 2000 12:50:59 -0800 (PST)
From: Matt Dillon
Message-Id: <200012062050.eB6Koxc98287@earth.backplane.com>
To: stable@freebsd.org
Cc: Barry Lustig
Subject: Re: VMware hanging

Ok (this goes to the -stable mailing list). Barry and I have figured out
why VMware was locking up on him. We tracked it down to excessive dirty
filesystem buffers, created through this codepath:

    Debugger(c02ca083) at Debugger+0x35
    panic(c02cd684,ca77cbe8,c018054a,c4f9cdb8,c4f9cdb8) at panic+0x70
    bdirty(c4f9cdb8,c4f9cdb8,cb719000,0,c1218800) at bdirty+0x5d
    bdwrite(c4f9cdb8,cb719000,0,0,2000) at bdwrite+0x52
    cluster_write(c4f9cdb8,676a000,0,0) at cluster_write+0x313
    ffs_write(ca77ccf8) at ffs_write+0x48b
    vnode_pager_generic_putpages(cb719000,ca77ce08,10000,0,ca77cd9c) at vnode_pager_generic_putpages+0x181
    ffs_putpages(ca77cd60) at ffs_putpages+0x1f
    vnode_pager_putpages(cb72dae0,ca77ce08,10,0,ca77cd9c) at vnode_pager_putpages+0x6a
    vm_pageout_flush(ca77ce08,10,0,0,c0ad7a7c) at vm_pageout_flush+0xb1
    vm_object_page_clean(cb72dae0,0,0,4,0) at vm_object_page_clean+0x36a
    vfs_msync(c11e5200,2,ca768600,0,ca76e780) at vfs_msync+0xc5
    sync_fsync(ca77cf7c) at sync_fsync+0x4b
    sched_sync(0) at sched_sync+0xf3
    fork_trampoline() at fork_trampoline+0x8

The problem occurs when msync() tries to flush non-contiguous dirty pages.
There is a hole in the cluster code: it does not check for excessive dirty
buffers before issuing a bdwrite(), so the number of dirty buffers can grow
excessively. Since the cluster code is rather sensitive, I put the fix at a
higher level, in the UFS code.

The patch below appears to solve the problem for Barry. I will commit it to
-current now, and to -stable in 2 days. I've also committed a patch to
writev() to -current and will backport it to -stable in 2 days as well.

                                        -Matt

Index: ufs/ufs/ufs_readwrite.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_readwrite.c,v
retrieving revision 1.65.2.3
diff -u -r1.65.2.3 ufs_readwrite.c
--- ufs/ufs/ufs_readwrite.c	2000/11/26 02:55:13	1.65.2.3
+++ ufs/ufs/ufs_readwrite.c	2000/12/06 20:03:59
@@ -495,6 +495,9 @@
 
 		if (ioflag & IO_SYNC) {
 			(void)bwrite(bp);
+		} else if (vm_page_count_severe() || buf_dirty_count_severe()) {
+			bp->b_flags |= B_CLUSTEROK;
+			bawrite(bp);
 		} else if (xfersize + blkoffset == fs->fs_bsize) {
 			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0) {
 				bp->b_flags |= B_CLUSTEROK;
@@ -502,9 +505,6 @@
 			} else {
 				bawrite(bp);
 			}
-		} else if (vm_page_count_severe() || buf_dirty_count_severe()) {
-			bp->b_flags |= B_CLUSTEROK;
-			bawrite(bp);
 		} else {
 			bp->b_flags |= B_CLUSTEROK;
 			bdwrite(bp);