From: Attilio Rao
To: mdf@freebsd.org
Cc: freebsd-hackers@freebsd.org, Giovanni Trematerra, Venkatesh Srinivas
Date: Tue, 27 Dec 2011 18:05:44 +0100
Subject: Re: Per-mount syncer threads and fanout for pagedaemon cleaning
References: <20111226202414.GA18713@centaur.acm.jhu.edu>
List-Id: Technical Discussions relating to FreeBSD

2011/12/27 :
> On
Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao wrote:
>> 2011/12/27 Giovanni Trematerra:
>>> On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
>>> wrote:
>>>> Hi!
>>>>
>>>> I've been playing with two things in DragonFly that might be of interest
>>>> here.
>>>>
>>>> Thing #1 :=
>>>>
>>>> First, per-mountpoint syncer threads. Currently there is a single thread,
>>>> 'syncer', which periodically calls fsync() on dirty vnodes from every mount,
>>>> along with calling vfs_sync() on each filesystem itself (via syncer vnodes).
>>>>
>>>> My patch modifies this to create syncer threads for mounts that request it.
>>>> For these mounts, vnodes are synced from their mount-specific thread rather
>>>> than the global syncer.
>>>>
>>>> The idea is that periodic fsync/sync operations from one filesystem should
>>>> not stall or delay synchronization for other ones.
>>>> The patch was fairly simple:
>>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640
>>>>
>>>
>>> There's something WIP by attilio@ in that area.
>>> You might want to take a look at
>>> http://people.freebsd.org/~attilio/syncer_alpha_15.diff
>>>
>>> I don't know what hammerfs needs, but UFS/FFS and the buffer cache do a good
>>> job performance-wise, so the authors are skeptical about the boost that such
>>> a change can give. We believe that brain cycles need to be spent on
>>> other pieces of the system such as ARC and ZFS.
>>
>> More specifically, it is likely that focusing on UFS and buffer cache
>> for performance is not really useful; we should direct our efforts at
>> ARC and ZFS.
>> Also, the real bottlenecks in our I/O paths are in GEOM's
>> single-threaded design, lack of unmapped I/O functionality, possibly
>> lack of prioritized I/O, etc.
>
> Indeed, Isilon (and probably other vendors as well) entirely skip
> VFS_SYNC when the WAIT argument is MNT_LAZY.
> Since we're a
> distributed journalled filesystem, syncing via a system thread is not
> a relevant operation; i.e. all writes that have exited a VOP_WRITE or
> similar operation are already in reasonably stable storage in a
> journal on the relevant nodes.
>
> However, we do then have our own threads running on each node to flush
> the journal regularly (in addition to when it fills up), and I don't
> know enough about this to know if it could be fit into the syncer
> thread idea or if it's too tied in somehow to our architecture.

I'm not really sure how journaling is implemented on OneFS, but when
I made this patch SU+J wasn't there yet. Also, this patch just adds the
infrastructure for a multithreaded and configurable syncer, which means
it still requires the UFS bits for skipping the "double-syncing" (that is,
the MNT_LAZY skip you mentioned).

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
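
For readers following the thread, the MNT_LAZY skip discussed above can be
sketched roughly as follows. This is a userland illustration under stated
assumptions, not code from OneFS, UFS, or the syncer patch: struct sk_mount,
sk_vfs_sync(), and the SK_MNT_* constants are hypothetical stand-ins for the
kernel's mount structure, VFS_SYNC, and MNT_WAIT / MNT_NOWAIT / MNT_LAZY.

```c
/* Userland sketch only: none of these types or names are kernel API. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MNT_WAIT / MNT_NOWAIT / MNT_LAZY. */
enum sk_waitfor { SK_MNT_WAIT = 1, SK_MNT_NOWAIT, SK_MNT_LAZY };

/* Hypothetical per-mount state for the illustration. */
struct sk_mount {
	int journalled;   /* nonzero: completed writes already sit in a journal */
	int dirty_bufs;   /* buffers a full sync would have to write out */
	int syncs_done;   /* how many times a full sync actually ran */
};

/*
 * Hypothetical sync entry point. A journalled filesystem treats the
 * periodic (lazy) request coming from the syncer as a no-op; explicit
 * requests still flush everything.
 */
static int
sk_vfs_sync(struct sk_mount *mp, enum sk_waitfor waitfor)
{
	assert(mp != NULL);

	if (waitfor == SK_MNT_LAZY && mp->journalled)
		return (0);       /* skipped: the journal already covers it */

	mp->dirty_bufs = 0;       /* pretend every dirty buffer was written */
	mp->syncs_done++;
	return (0);
}
```

With this shape, a lazy request leaves a journalled mount untouched, while
the same request on a non-journalled mount (or an explicit wait request on
either) still performs the full flush.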