From: Venkatesh Srinivas
To: freebsd-hackers@freebsd.org
Date: Mon, 26 Dec 2011 15:24:14 -0500
Message-ID: <20111226202414.GA18713@centaur.acm.jhu.edu>
Subject: Per-mount syncer threads and fanout for pagedaemon cleaning

Hi!

I've been playing with two things in DragonFly that might be of interest here.

Thing #1 := per-mountpoint syncer threads.

Currently there is a single thread, 'syncer', which periodically calls fsync() on dirty vnodes from every mount, along with calling vfs_sync() on each filesystem itself (via syncer vnodes). My patch modifies this to create syncer threads for mounts that request it.
For these mounts, vnodes are synced from their mount-specific thread rather than the global syncer. The idea is that periodic fsync/sync operations on one filesystem should not stall or delay synchronization on the others. The patch was fairly simple:

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640

There's certainly more that could be done in this direction -- the current patch preserves a global syncer ('syncer0') for unflagged filesystems and for running the rushjobs logic from speedup_syncer, and it preserves the notion of syncer vnodes, which are entirely overkill when there are per-mount sync threads. But it's a start, and something very similar could apply to FreeBSD.

Thing #2 := fanout for pagedaemon cleaning.

Currently, when pagedaemon decides to launder a dirty page, it initiates I/O for the launder from within its own thread context. While the I/O is generally asynchronous, the call path to get there from pagedaemon is deep and fraught with stall points (for vnode_pager; possible stalls annotated):

	pagedaemon scans ->
	...
	vm_pageout_clean ->		[block on vm_object locks, page busy]
	vm_pageout_flush ->
	vnode_pager_putpages ->
	vnode_generic_putpages ->
	_write ->			[block on FS locks]
	b(,a,d)write ->			[wait on runningbufspace]
	_strategy ->

Oh my... While any part of this path is stalled, pagedaemon is not continuing to do its job; this can be a problem -- so long as it is not laundering pages, we are not resolving any page shortage.

Given Thing #1, we have per-mountpoint service threads; I think it'd be worth pushing the deeper parts of this callpath out into those threads. The idea is that pagedaemon would select and cluster pages as it does now, but would use the syncer threads to walk through the pager and FS layers. An added benefit of using the syncer threads is that contention between fsync/vfs_sync on an FS and pageout on that same FS would be excluded.
The pagedaemon would not wait for the I/O to initiate before continuing to scan more candidates.

I've not found an ideal place to break up this callchain, but either between vm_pageout_clean and vm_pageout_flush, or at the entry to the vnode_pager, would be a good spot. In experiments, I've sent the vm_pageout_flush calls off to a convenient taskqueue, and it seems to work okay. But sending them to per-mount threads would be better.

Any thoughts on either of these things?

Hope this was interesting,
--vs;