From: mdf@FreeBSD.org
Date: Tue, 27 Dec 2011 09:22:06 -0800
To: Attilio Rao
Cc: freebsd-hackers@freebsd.org, Giovanni Trematerra, Venkatesh Srinivas
Subject: Re: Per-mount syncer threads and fanout for pagedaemon cleaning
References: <20111226202414.GA18713@centaur.acm.jhu.edu>
List-Id: Technical Discussions relating to FreeBSD

On Tue, Dec 27, 2011 at
9:05 AM, Attilio Rao wrote:
> 2011/12/27 :
>> On Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao wrote:
>>> 2011/12/27 Giovanni Trematerra :
>>>> On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
>>>> wrote:
>>>>> Hi!
>>>>>
>>>>> I've been playing with two things in DragonFly that might be of interest
>>>>> here.
>>>>>
>>>>> Thing #1 :=
>>>>>
>>>>> First, per-mountpoint syncer threads. Currently there is a single thread,
>>>>> 'syncer', which periodically calls fsync() on dirty vnodes from every mount,
>>>>> along with calling vfs_sync() on each filesystem itself (via syncer vnodes).
>>>>>
>>>>> My patch modifies this to create syncer threads for mounts that request it.
>>>>> For these mounts, vnodes are synced from their mount-specific thread rather
>>>>> than the global syncer.
>>>>>
>>>>> The idea is that periodic fsync/sync operations from one filesystem should
>>>>> not
>>>>> stall or delay synchronization for other ones.
>>>>> The patch was fairly simple:
>>>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640
>>>>>
>>>>
>>>> There's something WIP by attilio@ on that area.
>>>> you might want to take a look at
>>>> http://people.freebsd.org/~attilio/syncer_alpha_15.diff
>>>>
>>>> I don't know what hammerfs needs but UFS/FFS and buffer cache make a good
>>>> job performance-wise and so the authors are skeptical about the boost that such
>>>> a change can give. We believe that brain cycles need to be spent on
>>>> other pieces of the system such as ARC and ZFS.
>>>
>>> More specifically, it is likely that focusing on UFS and buffer cache
>>> for performance is not really useful, we should drive our efforts over
>>> ARC and ZFS.
>>> Also, the real bottlenecks in our I/O paths are in GEOM
>>> single-threaded design, lack of unmapped I/O functionality, possibly
>>> lack of prioritized I/O, etc.
>>
>> Indeed, Isilon (and probably other vendors as well) entirely skip
>> VFS_SYNC when the WAIT argument is MNT_LAZY.  Since we're a
>> distributed journalled filesystem, syncing via a system thread is not
>> a relevant operation; i.e. all writes that have exited a VOP_WRITE or
>> similar operation are already in reasonably stable storage in a
>> journal on the relevant nodes.
>>
>> However, we do then have our own threads running on each node to flush
>> the journal regularly (in addition to when it fills up), and I don't
>> know enough about this to know if it could be fit into the syncer
>> thread idea or if it's too tied in somehow to our architecture.
>
> I'm not really sure how journaling is implemented on OneFS, but
> when I made this patch SU+J wasn't yet there.
> Also, this patch just adds the infrastructure for a multithreaded and
> configurable syncer, which means it still requires the UFS bits for
> skipping the "double-syncing" (alias the MNT_LAZY skippage you
> mentioned).

Right, I don't object to any changes relating to multiple sync
threads, etc.; I'm just trying to offer a vendor viewpoint.  That said,
having one thread per mount would allow a different sync interval for
each filesystem, which can be an advantage.

Right after I did Isilon's last FreeBSD merge (it seems like a long
time ago now), I wanted to look into what it would take to eliminate
our specialized journal flush thread (i.e. tie it into VFS_SYNC), but
one objection was that the flush interval would then not be
configurable separately from the one for our UFS partition.

Cheers,
matthew
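
For illustration only, the two ideas in this thread (a filesystem that
ignores lazy syncs because its journal already makes writes stable, and
a separate sync interval per mount) can be modelled in a few lines of
user-space Python. This is a rough sketch under stated assumptions, not
kernel code: Mount, run_syncers, LAZY and WAIT are made-up names
standing in for the real mount structure, the syncer, and
MNT_LAZY/MNT_WAIT, and a single loop stands in for the per-mount
threads.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for MNT_LAZY / MNT_WAIT; not the kernel constants.
LAZY, WAIT = "lazy", "wait"


@dataclass
class Mount:
    name: str
    interval: int             # seconds between periodic syncs for this mount
    journalled: bool = False  # True if writes are already stable in a journal
    next_due: int = 0         # next time (in seconds) a periodic sync is due
    flushes: int = 0          # how many syncs actually did work

    def vfs_sync(self, mode):
        """Model of a sync entry point; returns True if data was flushed."""
        if self.journalled and mode == LAZY:
            # Nothing to do: the journal already holds the writes.
            return False
        self.flushes += 1
        return True


def run_syncers(mounts, seconds):
    """Step a clock one second at a time; each mount keeps its own schedule."""
    for now in range(seconds):
        for m in mounts:
            if now >= m.next_due:
                m.vfs_sync(LAZY)
                m.next_due = now + m.interval
    return {m.name: m.flushes for m in mounts}


ufs = Mount("ufs", interval=30)
jfs = Mount("journalled", interval=5, journalled=True)
counts = run_syncers([ufs, jfs], 60)
print(counts)
```

In this model the UFS-like mount is flushed on its own 30-second
schedule, while the journalled mount is visited every 5 seconds but
never does any work for a lazy sync; an explicit vfs_sync(WAIT) on it
still flushes.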