Date:      Tue, 14 Feb 2012 21:50:04 -0700
From:      Scott Long <scottl@samsco.org>
To:        Peter Jeremy <peterjeremy@acm.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: disk devices speed is ugly
Message-ID:  <CA28336C-8462-4358-9E68-B01EEB4237CE@samsco.org>
In-Reply-To: <20120214200258.GA29641@server.vk2pj.dyndns.org>
References:  <4F215A99.8020003@os2.kiev.ua> <4F27C04F.7020400@omnilan.de> <4F27C7C7.3060807@os2.kiev.ua> <CAJ-VmomezUWrEgxxmUEOhWnmLDohMAWRpSXmTR=n2y_LuizKJg@mail.gmail.com> <4F37F81E.7070100@os2.kiev.ua> <CAJ-Vmok9Ph1sgFCy6kNT4XR14grTLvG9M3JvT9eVBRjgqD%2BY9g@mail.gmail.com> <4F38AF69.6010506@os2.kiev.ua> <20120213132821.GA78733@in-addr.com> <20120214200258.GA29641@server.vk2pj.dyndns.org>


On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer <gpalmer@freebsd.org> wrote:
>> The filesystem is the *BEST* place to do caching.  It knows what metadata
>> is most effective to cache and what other data (e.g. file contents) doesn't
>> need to be cached.
>
> Agreed.
>
>> Any attempt to do this in layers between the FS and
>> the disk won't achieve the same gains as a properly written filesystem.
>
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
>
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks and ability to flush
> dirty blocks in a requested order with the equivalent of barriers
> (request Y will not occur until preceding request X has been
> committed to stable media).  This would allow filesystems to regain
> the benefits of block devices with minimal effort and then improve
> performance & cache efficiency with additional work.
>

Any filesystem that uses bread/bwrite/cluster_read is already using the
"generic caching subsystem" that you propose.  This includes UDF, CD9660,
MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage
filesystem in the tree except for ZFS.  Not all of them implement
VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode
pager, not requirements for using buffer-cache services on block devices.
As Kostik pointed out in a parallel email, the only thing that was removed
from FreeBSD was the userland interface to cached devices via /dev nodes.
This has nothing to do with filesystems, though I suppose that could maybe
sorta kinda be an issue for FUSE.
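To make that concrete, here is a minimal sketch of what "using the buffer
cache" looks like from a filesystem's read path.  bread()/brelse() are the
stock interfaces from sys/buf.h; the examplefs_ name, the vnode and the
block geometry are made up purely for illustration:

    /*
     * Minimal sketch (illustrative, not from any real filesystem): a
     * read path built on the shared buffer cache.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bio.h>
    #include <sys/buf.h>
    #include <sys/ucred.h>
    #include <sys/vnode.h>

    static int
    examplefs_read_block(struct vnode *devvp, daddr_t lbn, int bsize,
        struct buf **bpp)
    {
            struct buf *bp;
            int error;

            /* Returns a cached buffer; does device I/O only on a miss. */
            error = bread(devvp, lbn, bsize, NOCRED, &bp);
            if (error != 0) {
                    brelse(bp);
                    *bpp = NULL;
                    return (error);
            }
            /* Caller uses bp->b_data, then brelse()s or bdwrite()s it. */
            *bpp = bp;
            return (0);
    }

cluster_read() is the read-ahead-aware variant of bread(), and
bwrite()/bdwrite()/bawrite() give synchronous, delayed and asynchronous
write-back, which is roughly the hook set asked for above.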

ZFS isn't in this list because it implements its own private buffer/cache
(the ARC) that understands the special requirements of ZFS.  There are good
and bad aspects to this, noted below.

> One downside of the "each FS does its own caching" is that the caches
> are all separate and need careful integration into the VM subsystem to
> prevent starvation (e.g. past problems with UFS starving ZFS L2ARC).
>

I'm not sure what you mean here.  The ARC is limited by available wired
memory; attempts to allocate such memory will evict pages from the buffer
cache as necessary, until all available RAM is consumed.  If anything, ZFS
starves the rest of the system, not the other way around, and that's simply
because the ARC isn't integrated with the normal VM.  Such integration is
extremely hard and has nothing to do with having a generic caching
subsystem.
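
For completeness: a private cache can at least react to system memory
pressure by hooking the vm_lowmem event.  The sketch below uses the real
EVENTHANDLER_REGISTER() mechanism, but the example_cache_* names and the
shrink policy are made up; it is illustrative, not ZFS's actual code:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/eventhandler.h>
    #include <sys/kernel.h>

    static eventhandler_tag example_lowmem_tag;

    /* Hypothetical helper: give some wired cache memory back to the VM. */
    static void
    example_cache_shrink(void)
    {
            /* evict entries, free wired pages ... */
    }

    /* Called by the VM when the system is short on pages. */
    static void
    example_cache_lowmem(void *arg __unused, int flags __unused)
    {
            example_cache_shrink();
    }

    /* Run once at cache setup time, e.g. from module initialization. */
    static void
    example_cache_init(void)
    {
            example_lowmem_tag = EVENTHANDLER_REGISTER(vm_lowmem,
                example_cache_lowmem, NULL, EVENTHANDLER_PRI_FIRST);
    }

Reacting to vm_lowmem is still a long way from actually sharing pages with
the buffer cache and the page queues, which is the integration I'm calling
hard.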

Scott



