FreeBSD Mail Archives

Date:      Thu, 30 Jan 2003 16:33:46 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Tim Kientzle <kientzle@acm.org>
Cc:        "Brian T. Schellenberger" <bschellenberger@nc.rr.com>, Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject:   Re: Random disk cache expiry
Message-ID:  <200301310033.h0V0XkRH091013@apollo.backplane.com>
References:  <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301270019.44066.bschellenberger@nc.rr.com> <3E34C734.8010801@acm.org> <200301270904.43899.bschellenberger@nc.rr.com> <200301302222.h0UMMfFI090349@apollo.backplane.com> <3E39BE22.8050207@acm.org>




:Not necessarily.  I suspect that there is
:a strong tendency to access particular files
:in particular ways.  E.g., in your example of
:a download server, those files are always
:read sequentially.  You can make similar assertions
:about a lot of files: manpages, gzip files,
:C source code files, etc, are "always" read
:sequentially.
:
:If a file's access history were stored as a "hint"
:associated with the file, then it would
:be possible to make better up-front decisions about
:how to allocate cache space.  The ideal would be to

    This has been tried.  It works up to a point, but
    not to the extent that you want it to.  The basic
    problem is that past history does not necessarily
    predict future behavior.  With the web server example,
    different client loads will result in different
    access behaviors.  They might still all be sequential,
    but the combinations of multiple users will change
    the behavior enough that you would not be able to use
    the history as a reliable metric to control the cache.

    There is also an issue of how to store the 'history'.
    It isn't a simple matter of storing when a block was
    last accessed.  Analysis of the access history is just
    as important, and much of the analysis we humans do is
    intuitive and cannot easily be replicated by a computer.

    Basically it all comes down to this: if
    you know exactly how something is going to be accessed,
    or you need caching to work a certain way in order to
    guarantee a certain behavior, the foreknowledge you have
    of the access methodologies will allow you to cache the
    information manually far better than the system could
    cache it heuristically.
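
    As an illustration only (this is not something described in
    the message itself), one way an application can act on that
    kind of foreknowledge is to pass explicit hints to the kernel.
    The sketch below assumes a system that provides
    posix_fadvise(2) and uses Python's os.posix_fadvise wrapper;
    stream_file and its chunk_size parameter are hypothetical
    names.  The hints are advisory, so the kernel may ignore them.

```python
import os


def _advise(fd, offset, length, advice_name):
    """Pass an advisory hint to the kernel, ignoring it where unsupported."""
    if not hasattr(os, "posix_fadvise"):
        return  # platform has no posix_fadvise; hints are optional anyway
    try:
        os.posix_fadvise(fd, offset, length, getattr(os, advice_name))
    except OSError:
        pass  # some file types or filesystems reject the hint


def stream_file(path, chunk_size=64 * 1024):
    """Read a file once, front to back, and return the number of bytes read.

    The caller knows the access pattern in advance: strictly sequential,
    with no second pass. It says so explicitly instead of leaving the
    kernel to guess.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Length 0 means "to the end of the file".
        _advise(fd, 0, 0, "POSIX_FADV_SEQUENTIAL")
        offset = 0
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            total += len(data)
            # This range will not be read again, so its pages may be dropped.
            _advise(fd, offset, len(data), "POSIX_FADV_DONTNEED")
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

    Whether dropping pages behind the reader is a good idea is an
    assumption about the workload, which is why it is the
    application, not the kernel, that makes the call here.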

:store such hints on disk (maybe as an extended
:attribute?), but it might also be useful to cache
:them in memory somewhere.  That would allow the
:cache-management code to make much earlier decisions
:about how to handle a file.  For example, if a process
:started to read a 10GB file that has historically been
:accessed sequentially, you could immediately decide
:to enable read-ahead for performance, but also mark
:those pages to be released as soon as they were read by the
:process.
:
:FWIW, a web search for "randomized caching" yields
:some interesting reading.  Apparently, there are
:a few randomized cache-management algorithms for
:which the mathematics work out reasonably well,
:despite Terry's protestations to the contrary.  ;-)
:I haven't yet found any papers describing experiences
:with real implementations, though.
:
:If only I had the time to spend poring over FreeBSD's
:cache-management code to see how these ideas might
:actually be implemented... <sigh>
:
:Tim Kientzle

    It should be noted that we already implement most of the
    heuristics you talk about.  We have a heuristic that
    detects sequential access patterns, for example, and
    enables clustered read-ahead.  The problem isn't detection;
    the problem is scale.  These heuristics work wonderfully
    at a small scale (i.e. let's read 64K ahead versus trying to
    cache 64MB worth of the file).  Just knowing something is
    sequential does not allow you to choose how much memory you
    should set aside to cache that object, for example.  Automatically
    depressing the priority of pages read sequentially after they've been
    used can have as terrible a performance impact as it can a positive
    one, depending on the size of the object, the number of distinct
    objects being accessed in that manner, perceived latency by end users,
    number of end users, the speed of their connections (some objects may
    be accessed more slowly than others depending on the client's network
    bandwidth), and so forth.
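
    A toy simulation can make the size dependence concrete.  It is
    a sketch under stated assumptions, not a model of FreeBSD's VM
    system: one LRU cache, one file read sequentially on every
    pass, and a small "hot" set touched after each pass.  The
    function hit_rate and all of its parameters are hypothetical.
    With release_after_read=True, file pages are discarded as soon
    as they are read, which stands in for depressing their
    priority.

```python
from collections import OrderedDict


def hit_rate(cache_pages, file_pages, hot_pages, passes, release_after_read):
    """Return the overall cache hit rate for a toy workload.

    Each pass reads one file sequentially, then touches a hot set.
    The cache is plain LRU; cache_pages must be at least 1.
    """
    cache = OrderedDict()  # least recently used key comes first
    hits = 0
    total = 0

    def access(key, keep):
        nonlocal hits, total
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        elif keep:
            if len(cache) >= cache_pages:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
        # else: the page is read and released at once, never cached

    for _ in range(passes):
        for page in range(file_pages):
            access(("file", page), keep=not release_after_read)
        for page in range(hot_pages):
            access(("hot", page), keep=True)
    return hits / total


# File larger than the cache: releasing its pages protects the hot set.
big_lru = hit_rate(100, 200, 50, 3, release_after_read=False)
big_release = hit_rate(100, 200, 50, 3, release_after_read=True)

# File small enough to fit next to the hot set: releasing only hurts.
small_lru = hit_rate(100, 40, 50, 3, release_after_read=False)
small_release = hit_rate(100, 40, 50, 3, release_after_read=True)
```

    In this model the same policy wins in the first case and loses
    in the second, and the only thing that changed is the size of
    the file relative to the cache.  The other factors listed above
    (number of objects, number of users, client bandwidth) are not
    modelled at all.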

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200301310033.h0V0XkRH091013>