Date:      Sun, 26 Jan 2003 19:25:47 -0800
From:      Tim Kientzle <kientzle@acm.org>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Matthew Dillon <dillon@apollo.backplane.com>, Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject:   Re: Random disk cache expiry
Message-ID:  <3E34A6BB.2090601@acm.org>
References:  <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301261931.h0QJVCp8052101@apollo.backplane.com> <3E348B51.6F4D6096@mindspring.com> <200301270142.h0R1guR3070182@apollo.backplane.com> <3E3494CC.5895492D@mindspring.com>

Sean Hamilton proposes:

> Wouldn't it seem logical to have [randomized disk cache expiration] in
> place at all times?

Terry Lambert responds:

> I really dislike the idea of random expiration; I don't understand
> the point, unless you are trying to get better numbers on some
> benchmark.

Matt Dillon concedes:

> ... it's only useful when you are cycling through a [large] data set ...

Cycling through large data sets is not really that uncommon.
I do something like the following pretty regularly:
    find /usr/src -type f | xargs grep function_name

Even a single scan through a large dataset can really hurt
competing applications on the same machine by flushing
their data from the cache for no gain.  This is where
randomized expiration might really win: it reduces the
penalty for disk-cache-friendly applications that are
competing with disk-cache-unfriendly ones.
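
To make that concrete, here's a toy userland model.  It is only a
sketch: the sizes, the interleaving, and the use of rand(3) are all
invented for illustration, and it models bare slots, not the real
buffer cache.  A small hot working set competes with a sequential
scan that never re-reads anything, and the program reports the hot
task's hit rate under strict LRU and under random eviction:

    /*
     * Toy model: a cache-friendly task (small hot set, re-read
     * often) competing with a sequential scan.  Compares the hot
     * task's hit rate under strict LRU and random eviction.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define CACHE  64        /* cache slots */
    #define HOT    32        /* hot task's working set, < CACHE */
    #define SCANS  2         /* scan accesses per hot access */
    #define ROUNDS 20000

    static long slot_blk[CACHE];   /* block in each slot, -1 = empty */
    static long slot_use[CACHE];   /* "time" of last use, for LRU */

    static void
    access_blk(long blk, long now, int randomized, long *hits)
    {
        int i, victim = 0;

        for (i = 0; i < CACHE; i++)
            if (slot_blk[i] == blk) {       /* hit: refresh, count */
                slot_use[i] = now;
                if (hits != NULL)
                    (*hits)++;
                return;
            }
        if (randomized)                     /* miss: pick a victim */
            victim = rand() % CACHE;
        else
            for (i = 1; i < CACHE; i++)     /* strict LRU: oldest */
                if (slot_use[i] < slot_use[victim])
                    victim = i;
        slot_blk[victim] = blk;
        slot_use[victim] = now;
    }

    static double
    run(int randomized)
    {
        long hits = 0, now = 0, scan_blk = 1000000, r;
        int i;

        for (i = 0; i < CACHE; i++) {
            slot_blk[i] = -1;
            slot_use[i] = -1;               /* empties look oldest */
        }
        for (r = 0; r < ROUNDS; r++) {
            access_blk(r % HOT, now++, randomized, &hits);
            for (i = 0; i < SCANS; i++)     /* scan never repeats */
                access_blk(scan_blk++, now++, randomized, NULL);
        }
        return ((double)hits / ROUNDS);
    }

    int
    main(void)
    {
        srand(1);
        printf("hot task hit rate, strict LRU:      %.2f\n", run(0));
        printf("hot task hit rate, random eviction: %.2f\n", run(1));
        return (0);
    }

Under strict LRU the scan pushes every hot block out of the cache
before it is re-read, so the hot task's hit rate collapses to zero;
random eviction leaves a useful fraction of the hot set resident.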

There's an extensive literature on randomized algorithms.
Although I'm certainly no expert, I understand that such
algorithms work very well in exactly this sort of application,
since they "usually" avoid worst-case behavior across a broad
variety of inputs.  The current cache is, in essence,
tuned specifically to work badly on a system where applications
are scanning through large amounts of data.  No matter what
deterministic caching algorithm you use, you are choosing
to behave badly in some situations.
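
To make the worst case concrete: with a 64-block cache and a loop
that scans 65 or more distinct blocks, strict LRU misses on every
single reference, because each block is evicted before it comes
around again.  Evicting a randomly chosen block instead leaves any
given block resident with nonzero probability, so some fraction of
those references hit.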


Personally, I think there's a lot of merit in _trying_
randomized disk cache expiry and seeing how it works in practice.
(I would also observe that 5.0 now has a fast, high-quality
source of randomness that seems ideal for exactly this sort
of application.)  I don't believe it would _prevent_ applications
from using optimizations such as those Terry suggests,
and it might provide reasonable performance across a
broader range of scenarios than the current code handles well.
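
Just to sketch how that source might drive victim selection
(a sketch only: the queue layout and the WINDOW parameter below
are invented, not the actual vfs_bio code; arc4random(3) is the
libc interface, and arc4random(9) would be the in-kernel analogue):

    /*
     * Hypothetical victim selection: rather than always recycling
     * the oldest buffer on a free queue, pick uniformly among the
     * oldest WINDOW entries.
     */
    #include <stddef.h>
    #include <stdlib.h>      /* arc4random(3) */
    #include <sys/queue.h>   /* TAILQ_* macros */

    #define WINDOW 16        /* tunable; 1 degenerates to plain LRU */

    struct buf {
        TAILQ_ENTRY(buf) b_freelist;
        /* ... buffer fields elided ... */
    };
    TAILQ_HEAD(bufqueue, buf);

    struct buf *
    random_victim(struct bufqueue *bq)
    {
        /* arc4random() % WINDOW has slight modulo bias; harmless here. */
        unsigned int skip = arc4random() % WINDOW;
        struct buf *bp = TAILQ_FIRST(bq);

        while (skip-- > 0 && bp != NULL &&
               TAILQ_NEXT(bp, b_freelist) != NULL)
            bp = TAILQ_NEXT(bp, b_freelist);
        return (bp);         /* caller dequeues and recycles it */
    }

Picking among the oldest few buffers, rather than over the whole
queue, stays close to LRU while still breaking up the deterministic
worst case.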

Sounds like a good idea to me.

Tim Kientzle

