From: Attilio Rao
To: freebsd-arch@freebsd.org
Cc: Giovanni Trematerra
Date: Thu, 15 Apr 2010 12:10:21 +0200
Subject: [PATCH] Syncer rewriting

With fundamental help from Giovanni Trematerra and Peter Holm, I rewrote the syncer following the plans and discussions of the last 2 years, which started with an effort by Jeff during BSDCan 2008 (for a more complete reference you may check: http://people.freebsd.org/~jeff/bsdcanbuf.pdf ).

To summarize, the syncer suffers from the following problems:
- Poor scalability: just one thread has to serve all of the mounted filesystems
- Poor flexibility: the current syncer is only used to sync dirty buffers to disk and nothing else, catering to buffer-cache based filesystems
- Complex design: in order to DTRT, the syncer needs the help of a syncer vnode and introduces some complex locking patterns. Additionally, as a partial mitigation, a separate queue for the !MPSAFE filesystems might be added
- Poor performance: this is actually more FS specific than anything. UFS (but I'm not sure if this is the only one), after having synced the dirty vnodes, does a VFS_SYNC() that actually re-syncs all the referenced vnodes. That means dirty vnodes will be synced 2 times in the same timeframe.

The rewrite aims to address all these problems. The main idea is to offer a simple and opaque layer that interacts directly with the VFS and that any filesystem may override in order to offer its own implementation of the syncer functionality. Right now, the layer lives within the VFS_* methods and the mount structure. More precisely, it offers 5 virtual functions (VFS_SYNCER_INIT, VFS_SYNCER_DESTROY, VFS_SYNCER_ATTACH, VFS_SYNCER_DETACH, VFS_SYNCER_SPEEDUP) and an opaque, private pointer for storing syncer-specific data. This means that, for a given filesystem, the syncer design need not be tied to the specific thread/process model used now. Also, this design may be easily extended in order to support more features, if needed.

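To make the shape of that layer concrete, here is a minimal user-space sketch of a per-mount table with the five hooks and an opaque private pointer. It is only an illustration under stated assumptions: struct syncer_ops, struct mount_model, the toy_* functions, all of the signatures and the 30 second starting delay are hypothetical and are not taken from the patch, where the real entry points are the VFS_SYNCER_* methods.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mount_model;

/* Hypothetical table mirroring the five VFS_SYNCER_* hooks (signatures assumed). */
struct syncer_ops {
    int  (*init)(struct mount_model *mp);     /* allocate per-mount state */
    void (*destroy)(struct mount_model *mp);  /* release per-mount state */
    int  (*attach)(struct mount_model *mp);   /* start syncing this mount */
    void (*detach)(struct mount_model *mp);   /* stop syncing this mount */
    void (*speedup)(struct mount_model *mp);  /* ask for faster syncing */
};

/* Hypothetical stand-in for the mount structure, not the kernel's struct mount. */
struct mount_model {
    const struct syncer_ops *ops;
    void *syncer_private;   /* opaque to the VFS, owned by the syncer */
};

/* Private state of a toy syncer; only the toy_* functions look inside it. */
struct toy_syncer {
    int running;
    int delay;              /* seconds between passes (assumed knob) */
};

static int toy_init(struct mount_model *mp)
{
    struct toy_syncer *ts = calloc(1, sizeof(*ts));

    if (ts == NULL)
        return -1;
    ts->delay = 30;
    mp->syncer_private = ts;
    return 0;
}

static void toy_destroy(struct mount_model *mp)
{
    free(mp->syncer_private);
    mp->syncer_private = NULL;
}

static int toy_attach(struct mount_model *mp)
{
    struct toy_syncer *ts = mp->syncer_private;

    assert(ts != NULL);     /* init must have run before attach */
    ts->running = 1;
    return 0;
}

static void toy_detach(struct mount_model *mp)
{
    struct toy_syncer *ts = mp->syncer_private;

    ts->running = 0;
}

static void toy_speedup(struct mount_model *mp)
{
    struct toy_syncer *ts = mp->syncer_private;

    if (ts->delay > 1)
        ts->delay /= 2;     /* halve the interval, never below 1 */
}

static const struct syncer_ops toy_syncer_ops = {
    toy_init, toy_destroy, toy_attach, toy_detach, toy_speedup
};

/* Walk one mount through its lifecycle; returns the delay after one speedup. */
int demo_mount_cycle(void)
{
    struct mount_model mp = { &toy_syncer_ops, NULL };
    int delay;

    if (mp.ops->init(&mp) != 0)
        return -1;
    if (mp.ops->attach(&mp) != 0) {
        mp.ops->destroy(&mp);
        return -1;
    }
    mp.ops->speedup(&mp);
    delay = ((struct toy_syncer *)mp.syncer_private)->delay;
    mp.ops->detach(&mp);
    mp.ops->destroy(&mp);
    return delay;
}
```

The caller only ever goes through mp.ops and never looks inside syncer_private, which is the property that would let a filesystem swap in a syncer with a different thread/process model.
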
The syncer as we have it now becomes the 'standard one' but switches to a different model. It becomes per-mount and thus gets rid of the syncer vnode. This also greatly simplifies the locking within the syncer, because now each thread is responsible only for its own dog-food. Filesystems specify their own syncer in the vfsops or they receive, by default, the buffer cache "standard" syncer. Current filesystems not using the buffer cache, however, may use the VFS_EOPNOTSUPP specification in order to avoid defining a filesystem syncer at all.

The patch has been tested intensively by trema and pho on a lot of different workloads and filesystems:
http://www.freebsd.org/~attilio/syncer_beta_0.diff

Sparse notes:
- The performance problem, even though the patch doesn't currently address it, may now be easily addressed by skipping the sync in ffs_fsync() for the MNT_LAZY case and having ffs_sync() take care of it.
- The standard syncer may be further improved by getting rid of the bufobj. It should actually handle a list of vnodes rather than a list of bufobjs. However, such optimizations may be done after the patch is ready to enter the tree.
- The mount interlock now protects bo_flag & BO_ONWORKLST and the synclist iterator, thus there is no need to hold the bufobj lock when accessing them. However, the specifics for checking whether a bufobj is dirty or not are still protected by the bufobj lock, thus the insertion path still needs it too.

The things I would most like to receive comments on are mostly linked to the default syncer:
- I didn't use any form of thread consolidation for the threads automatically assigned by the default syncer. We may have different opinions and good arguments on it.
- Something we might want to think about is the !SMP case. Maybe we don't want the multi-thread approach for that case? Should we revert to the current approach for !SMP?
- Right now VFS_SYNCER_INIT() and VFS_SYNCER_DESTROY() are used by the default syncer not only for flexibility but also out of necessity. Some consumers may want to fill in the workitem queues before the syncer starts (VFS_SYNCER_ATTACH()) and you may not want to lose the vnodes queued that way. This approach is good and also makes it possible to support mount state updates simply, without losing information, but it has the disadvantage of allocating structures for filesystems that may forever be RO.

More testing, reviews and comments are undoubtedly welcome at this point.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein