Date: Thu, 30 Jan 2003 16:33:46 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Tim Kientzle <kientzle@acm.org>
Cc: "Brian T. Schellenberger" <bschellenberger@nc.rr.com>, Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject: Re: Random disk cache expiry
Message-ID: <200301310033.h0V0XkRH091013@apollo.backplane.com>
References: <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301270019.44066.bschellenberger@nc.rr.com> <3E34C734.8010801@acm.org> <200301270904.43899.bschellenberger@nc.rr.com> <200301302222.h0UMMfFI090349@apollo.backplane.com> <3E39BE22.8050207@acm.org>
:Not necessarily. I suspect that there is
:a strong tendency to access particular files
:in particular ways. E.g., in your example of
:a download server, those files are always
:read sequentially. You can make similar assertions
:about a lot of files: manpages, gzip files,
:C source code files, etc, are "always" read
:sequentially.
:
:If a file's access history were stored as a "hint"
:associated with the file, then it would
:be possible to make better up-front decisions about
:how to allocate cache space. The ideal would be to
This has been tried. It works up to a point, but
not to the extent that you want it to. The basic
problem is that past history does not necessarily
predict future behavior. With the web server example,
different client loads will result in different
access behaviors. They might still all be sequential,
but the combinations of multiple users will change
the behavior enough that you would not be able to use
the history as a reliable metric to control the cache.
There is also an issue of how to store the 'history'.
It isn't a simple matter of storing when a block was
last accessed. Analysis of the access history is just
as important, and much of the analysis we humans do is
intuitive and just cannot be replicated by a computer.
Basically it all comes down to this: if you know exactly
how something is going to be accessed, or you need caching
to work a certain way in order to guarantee a certain
behavior, the foreknowledge you have of the access
methodologies will allow you to cache the information
manually far better than the system could cache it
heuristically.
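
As a rough illustration of using foreknowledge rather than heuristics, an application that knows it will read a file exactly once, front to back, can say so explicitly. The sketch below is not from the original message. It assumes a platform where Python's os module exposes posix_fadvise (it does not exist everywhere, hence the hasattr check), and the names stream_once and CHUNK are hypothetical. How much of the advice the kernel actually honors is system dependent.

```python
import os

CHUNK = 64 * 1024  # hypothetical read size, chosen only for illustration


def stream_once(path, chunk=CHUNK):
    """Read a file front to back, telling the kernel we will not reread it.

    Returns the number of bytes read.
    """
    have_fadvise = hasattr(os, "posix_fadvise")  # missing on some platforms
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if have_fadvise:
            # State the access pattern up front instead of leaving the
            # kernel to infer it (a length of 0 means "to end of file").
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)
            if have_fadvise:
                # We know this range will not be needed again, so say so.
                os.posix_fadvise(fd, offset, len(data),
                                 os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

The decision to drop pages immediately is made here by the program, which knows it is a one-pass reader. The same policy applied automatically by the system to every sequential reader is a different matter.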
:store such hints on disk (maybe as an extended
:attribute?), but it might also be useful to cache
:them in memory somewhere. That would allow the
:cache-management code to make much earlier decisions
:about how to handle a file. For example, if a process
:started to read a 10GB file that has historically been
:accessed sequentially, you could immediately decide
:to enable read-ahead for performance, but also mark
:those pages to be released as soon as they were read by the
:process.
:
:FWIW, a web search for "randomized caching" yields
:some interesting reading. Apparently, there are
:a few randomized cache-management algorithms for
:which the mathematics work out reasonably well,
:despite Terry's protestations to the contrary. ;-)
:I haven't yet found any papers describing experiences
:with real implementations, though.
:
:If only I had the time to spend poring over FreeBSD's
:cache-management code to see how these ideas might
:actually be implemented... <sigh>
:
:Tim Kientzle
It should be noted that we already implement most of the
heuristics you talk about. We have a heuristic that
detects sequential access patterns, for example, and
enables clustered read-ahead. The problem isn't detection;
the problem is scale. These heuristics work wonderfully
at a small scale (i.e. let's read 64K ahead versus trying to
cache 64MB worth of the file). Just knowing something is
sequential does not allow you to choose how much memory you
should set aside to cache that object, for example. Automatically
depressing the priority of pages read sequentially after they've been
used can have as terrible a performance impact as it can a positive
one, depending on the size of the object, the number of distinct
objects being accessed in that manner, perceived latency by end users,
number of end users, the speed of their connections (some objects may
be accessed more slowly than others depending on the client's network
bandwidth), and so forth.
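
To make the detection-versus-scale distinction concrete, here is a toy sequential-access detector. It is only a sketch under stated assumptions, not FreeBSD's actual code: the class SeqDetector, the constants BLOCK and MAX_READAHEAD_BLOCKS, and the ramp-up rule are all hypothetical and chosen for illustration.

```python
BLOCK = 16 * 1024          # hypothetical block size
MAX_READAHEAD_BLOCKS = 4   # hypothetical cap: 4 * 16K = 64K of read-ahead


class SeqDetector:
    """Toy per-file heuristic: ramp up read-ahead while reads are contiguous."""

    def __init__(self):
        self.next_offset = 0   # where a sequential reader would read next
        self.seqcount = 0      # confidence that access is sequential

    def on_read(self, offset, length):
        """Record a read and return how many bytes of read-ahead to issue."""
        if offset == self.next_offset:
            # Contiguous with the previous read: grow confidence, up to a cap.
            self.seqcount = min(self.seqcount + 1, MAX_READAHEAD_BLOCKS)
        else:
            # A seek breaks the pattern: start over.
            self.seqcount = 0
        self.next_offset = offset + length
        return self.seqcount * BLOCK
```

The detector answers "is this sequential?" cheaply, and its answer is bounded by a small constant whether the file is 1MB or 10GB. Nothing in it says how much memory the object deserves in the cache, or whether pages already read should be kept or released. That decision depends on the factors listed above, which are not visible from a single file's access pattern.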
-Matt
Matthew Dillon
<dillon@backplane.com>
