Date: Tue, 14 Feb 2012 21:50:04 -0700
From: Scott Long <scottl@samsco.org>
To: Peter Jeremy <peterjeremy@acm.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: disk devices speed is ugly
Message-ID: <CA28336C-8462-4358-9E68-B01EEB4237CE@samsco.org>
In-Reply-To: <20120214200258.GA29641@server.vk2pj.dyndns.org>
References: <4F215A99.8020003@os2.kiev.ua> <4F27C04F.7020400@omnilan.de>
 <4F27C7C7.3060807@os2.kiev.ua>
 <CAJ-VmomezUWrEgxxmUEOhWnmLDohMAWRpSXmTR=n2y_LuizKJg@mail.gmail.com>
 <4F37F81E.7070100@os2.kiev.ua>
 <CAJ-Vmok9Ph1sgFCy6kNT4XR14grTLvG9M3JvT9eVBRjgqD+Y9g@mail.gmail.com>
 <4F38AF69.6010506@os2.kiev.ua> <20120213132821.GA78733@in-addr.com>
 <20120214200258.GA29641@server.vk2pj.dyndns.org>
On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer <gpalmer@freebsd.org> wrote:
>> The filesystem is the *BEST* place to do caching. It knows what metadata
>> is most effective to cache and what other data (e.g. file contents) doesn't
>> need to be cached.
>
> Agreed.
>
>> Any attempt to do this in layers between the FS and
>> the disk won't achieve the same gains as a properly written filesystem.
>
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices. Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
>
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks, and flush dirty blocks
> in a requested order with the equivalent of barriers (request Y will
> not occur until preceding request X has been committed to stable
> media). This would allow filesystems to regain the benefits of block
> devices with minimal effort and then improve performance & cache
> efficiency with additional work.

Any filesystem that uses bread/bwrite/cluster_read is already using the
"generic caching subsystem" that you propose. This includes UDF, CD9660,
MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage
filesystem in the tree except for ZFS. Not all of them implement
VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode
pager, not requirements for using buffer-cache services on block devices.
As Kostik pointed out in a parallel email, the only thing that was removed
from FreeBSD was the userland interface to cached devices via /dev nodes.
This has nothing to do with filesystems, though I suppose that could maybe
sorta kinda be an issue for FUSE.
ZFS isn't in this list because it implements its own private buffer/cache
(the ARC) that understands the special requirements of ZFS. There are good
and bad aspects to this, noted below.

> One downside of the "each FS does its own caching" approach is that the
> caches are all separate and need careful integration into the VM subsystem
> to prevent starvation (e.g. past problems with UFS starving ZFS L2ARC).

I'm not sure what you mean here. The ARC is limited by available wired
memory; attempts to allocate such memory will evict pages from the buffer
cache as necessary, until all available RAM is consumed. If anything, ZFS
starves the rest of the system, not the other way around, and that's simply
because the ARC isn't integrated with the normal VM. Such integration is
extremely hard and has nothing to do with having a generic caching
subsystem.

Scott