Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 26 Jan 2003 22:49:20 -0800
From:      David Schultz <dschultz@uclink.Berkeley.EDU>
To:        Tim Kientzle <kientzle@acm.org>
Cc:        Terry Lambert <tlambert2@mindspring.com>, Matthew Dillon <dillon@apollo.backplane.com>, Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject:   Re: Random disk cache expiry
Message-ID:  <20030127064920.GB1018@HAL9000.homeunix.com>
In-Reply-To: <3E34A6BB.2090601@acm.org>
References:  <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301261931.h0QJVCp8052101@apollo.backplane.com> <3E348B51.6F4D6096@mindspring.com> <200301270142.h0R1guR3070182@apollo.backplane.com> <3E3494CC.5895492D@mindspring.com> <3E34A6BB.2090601@acm.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Thus spake Tim Kientzle <kientzle@acm.org>:
> Sean Hamilton proposes:
> 
> >Wouldn't it seem logical to have [randomized disk cache expiration] in
> >place at all times?
> 
> Terry Lambert responds:
> 
> >>:I really dislike the idea of random expiration; I don't understand
> >>:the point, unless you are trying to get better numbers on some
> 
> >>:benchmark.
> 
> Matt Dillon concedes:
> 
> >>... it's only useful when you are cycling through a [large] data set ...
> 
> Cycling through large data sets is not really that uncommon.
> I do something like the following pretty regularly:
>    find /usr/src -type f | xargs grep function_name
> 
> Even scanning through a large dataset once can really hurt
> competing applications on the same machine by flushing
> their data from the cache for no gain.  I think this
> is where randomized expiration might really win, by reducing the
> penalty for disk-cache-friendly applications who are competing
> with disk-cache-unfriendly applications.
> 
> There's an extensive literature on randomized algorithms.
> Although I'm certainly no expert, I understand that such
> algorithms work very well in exactly this sort of application,
> since they "usually" avoid worst-case behavior under a broad
> variety of inputs.  The current cache is, in essence,
> tuned specifically to work badly on a system where applications
> are scanning through large amounts of data.  No matter what
> deterministic caching algorithm you use, you're choosing
> to behave badly under some situation.

Yes, if you randomly vary the behavior of the algorithm, you can
guarantee that on average, performance will never be too bad for
any particular input, but it will never be very good, on average,
either.

You can't mathematically prove everything about the memory access
patterns of real-world programs, but LRU seems to do pretty well
in a variety of situations.  It does, however, have its worst
cases.  A random replacement algorithm is very unlikely to do *as*
badly as LRU in LRU's worst case; its performance is consistent
and relatively poor.  Keep in mind that there's a bias in
most real-world programs that favors LRU more than randomness.

So should FreeBSD make it possible to ask for random replacement?
Probably, since it would be helpful for those times when you
*know* that LRU isn't going to do the right thing.  (In the
sequential read special case the OP mentions, random replacement
is better than LRU, but still worse than a deterministic algorithm
that just caches the prefix of the file that will fit in memory.
So in this situation we could do even better in theory.)

Should randomness be part of the default replacement algorithm, as
the OP suggests?  Probably not, since that would be pessimizing
performance in the common case for the sake of improving it in an
uncommon case.

Should the system be able to detect cases where the default
replacement algorithm is failing and dynamically modify its
behavior?  I think that would be really cool, if it is possible...

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030127064920.GB1018>