From owner-freebsd-arch Mon Feb 10 14:54:15 2003
Date: Mon, 10 Feb 2003 14:54:07 -0800 (PST)
From: Matthew Dillon
Message-Id: <200302102254.h1AMs7ra023931@apollo.backplane.com>
To: Alfred Perlstein
Cc: Morten Rodal, phk@phk.freebsd.dk, David Schultz, arch@FreeBSD.ORG
Subject: Re: Our lemming-syncer caught in the act.
References: <20030210091317.GD5165@HAL9000.homeunix.com>
    <37473.1044868995@critter.freebsd.dk>
    <20030210204523.GF12240@slurp.rodal.no>
    <20030210205443.GA88781@elvis.mu.org>
    <200302102225.h1AMPkTE023700@apollo.backplane.com>
    <20030210223904.GG88781@elvis.mu.org>

:There are several good points in your mail, especially about performing
:partial fsyncs, however this doesn't really explain why 4.x is "ok" versus
:5.x having issues.  The only explanation I can think of is that the syncer
:runs with giant for a long time but under 4.x only needs splbio() for short
:periods.

    The syncer tends to block quite a bit under normal operation, so it
    seems unlikely that Giant would be an issue unless the syncer is
    looping without getting any work accomplished.  *THIS* is possible,
    since we've already had to hack the syncer considerably to prevent it
    from looping on unflushable softupdates buffers.  There could be new
    cases that have not been addressed.

    If I recall I have made one change in the last few months which might
    be worth looking at.

    dillon      2002/12/28 13:03:42 PST

      Modified files:
        sys/vm               vm_object.c vm_pager.h vnode_pager.c
      Log:
      Allow the VM object flushing code to cluster.  When the filesystem
      syncer comes along and flushes a file which has been mmap()'d
      SHARED/RW, with dirty pages, it was flushing the underlying VM object
      asynchronously, resulting in thousands of 8K writes.  With this change
      the VM Object flushing code will cluster dirty pages in 64K blocks.

      Note that until the low memory deadlock issue is reviewed, it is not
      safe to allow the pageout daemon to use this feature.  Forced pageouts
      still use fs block size'd ops for the moment.

      MFC after:  3 days

      Revision  Changes    Path
      1.250     +10 -3     src/sys/vm/vm_object.c
      1.37      +4 -2      src/sys/vm/vm_pager.h
      1.165     +8 -2      src/sys/vm/vnode_pager.c

    But, as you can see, it is in -stable as well.  I doubt the above
    could be the cause.  Profiling the problem in action might give us some
    more clues.

                                        -Matt
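
    A minimal sketch of the clustering arithmetic described in the commit
    log above.  This is not the actual vm_object.c change.  It assumes 4K
    pages, so an 8K write covers two pages and a 64K cluster covers
    sixteen.  count_flush_writes() and the constants are hypothetical
    names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES   4096u             /* assumed page size */
#define MAX_CLUSTER_BYTES (64u * 1024u)     /* 64K cluster size from the commit log */
#define MAX_CLUSTER_PAGES (MAX_CLUSTER_BYTES / PAGE_SIZE_BYTES)

/*
 * Hypothetical helper, not kernel code: count how many write operations
 * it takes to flush an object's dirty pages when contiguous dirty pages
 * are grouped into runs of at most max_run pages.
 * dirty[i] != 0 means page i is dirty.
 */
size_t
count_flush_writes(const unsigned char *dirty, size_t npages, size_t max_run)
{
    size_t writes = 0;
    size_t i = 0;

    assert(dirty != NULL || npages == 0);
    if (max_run == 0)
        return 0;               /* no valid run length, nothing can be issued */

    while (i < npages) {
        if (!dirty[i]) {
            i++;                /* clean page: skip it, it ends any run */
            continue;
        }
        /* Extend the run over contiguous dirty pages, up to the limit. */
        size_t run = 0;
        while (i < npages && dirty[i] && run < max_run) {
            i++;
            run++;
        }
        writes++;               /* one I/O for the whole run */
    }
    return writes;
}
```

    Under these assumptions, 64 contiguous dirty pages take 32 writes
    with a two-page (8K) limit and 4 writes with MAX_CLUSTER_PAGES, which
    is the kind of reduction the commit log describes.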