From: Venkatesh Srinivas
To: freebsd-hackers@freebsd.org
Date: Mon, 26 Dec 2011 15:24:14 -0500
Message-ID: <20111226202414.GA18713@centaur.acm.jhu.edu>
Subject: Per-mount syncer threads and fanout for pagedaemon cleaning

Hi!

I've been playing with two things in DragonFly that might be of interest here.

Thing #1 := per-mountpoint syncer threads.

Currently there is a single thread, 'syncer', which periodically calls fsync() on dirty vnodes from every mount, along with calling vfs_sync() on each filesystem itself (via syncer vnodes). My patch modifies this to create syncer threads for mounts that request it.
For these mounts, vnodes are synced from their mount-specific thread rather than the global syncer. The idea is that periodic fsync/sync operations on one filesystem should not stall or delay synchronization on the others. The patch was fairly simple:

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640

There's certainly more that could be done in this direction -- the current patch preserves a global syncer ('syncer0') for unflagged filesystems and for running the rushjobs logic from speedup_syncer, and it preserves the notion of syncer vnodes, which are entirely overkill when there are per-mount sync threads. But it's a start, and something very similar could apply to FreeBSD.

Thing #2 := fanout for pagedaemon cleaning.

Currently, when pagedaemon decides to launder a dirty page, it initiates I/O for the launder from within its own thread context. While the I/O is generally asynchronous, the call path to get there from pagedaemon is deep and fraught with stall points (for vnode_pager; possible stalls annotated):

	pagedaemon scans ->
	...
	vm_pageout_clean ->		[block on vm_object locks, page busy]
	vm_pageout_flush ->
	vnode_pager_putpages ->
	vnode_generic_putpages ->
	_write ->			[block on FS locks]
	b(,a,d)write ->			[wait on runningbufspace]
	_strategy ->

Oh my... While any part of this path is stalled, pagedaemon is not continuing to do its job; this can be a problem -- so long as it is not laundering pages, we are not resolving any page shortage.

Given Thing #1, we have per-mountpoint service threads; I think it'd be worth pushing the deeper parts of this callpath out into those threads. The idea is that pagedaemon would select and cluster pages as it does now, but would use the syncer threads to walk through the pager and FS layers. An added benefit of using the syncer threads is that contention between fsync/vfs_sync on an FS and pageout on that same FS would be excluded.
The pagedaemon would not wait for the I/O to initiate before continuing to scan more candidates.

I've not found an ideal place to break up this callchain, but either between vm_pageout_clean and vm_pageout_flush, or at the entry to the vnode_pager, would be a good spot. In experiments, I've sent the vm_pageout_flush calls off to a convenient taskqueue, and it seems to work okay. But sending them to per-mount threads would be better.

Any thoughts on either of these things?

Hope this was interesting,
--vs;