Date: Sun, 26 Jan 2003 22:49:20 -0800
From: David Schultz <dschultz@uclink.Berkeley.EDU>
To: Tim Kientzle <kientzle@acm.org>
Cc: Terry Lambert <tlambert2@mindspring.com>, Matthew Dillon <dillon@apollo.backplane.com>, Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject: Re: Random disk cache expiry
Message-ID: <20030127064920.GB1018@HAL9000.homeunix.com>
In-Reply-To: <3E34A6BB.2090601@acm.org>
References: <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301261931.h0QJVCp8052101@apollo.backplane.com> <3E348B51.6F4D6096@mindspring.com> <200301270142.h0R1guR3070182@apollo.backplane.com> <3E3494CC.5895492D@mindspring.com> <3E34A6BB.2090601@acm.org>
Thus spake Tim Kientzle <kientzle@acm.org>:
> Sean Hamilton proposes:
>
> > Wouldn't it seem logical to have [randomized disk cache expiration]
> > in place at all times?
>
> Terry Lambert responds:
>
> >> : I really dislike the idea of random expiration; I don't understand
> >> : the point, unless you are trying to get better numbers on some
> >> : benchmark.
>
> Matt Dillon concedes:
>
> >> ... it's only useful when you are cycling through a [large] data set ...
>
> Cycling through large data sets is not really that uncommon.
> I do something like the following pretty regularly:
>
>     find /usr/src -type f | xargs grep function_name
>
> Even scanning through a large data set once can really hurt
> competing applications on the same machine by flushing their
> data from the cache for no gain.  I think this is where
> randomized expiration might really win: it reduces the penalty
> for disk-cache-friendly applications that are competing with
> disk-cache-unfriendly ones.
>
> There's an extensive literature on randomized algorithms.
> Although I'm certainly no expert, I understand that such
> algorithms work very well in exactly this sort of application,
> since they "usually" avoid worst-case behavior under a broad
> variety of inputs.  The current cache is, in essence, tuned
> specifically to work badly on a system where applications are
> scanning through large amounts of data.  No matter what
> deterministic caching algorithm you use, you're choosing to
> behave badly in some situation.

Yes, if you randomly vary the behavior of the algorithm, you can
guarantee that on average, performance will never be too bad for any
particular input, but it will never be very good, on average, either.

You can't mathematically prove everything about the memory access
patterns of real-world programs, but LRU seems to do pretty well in a
wide variety of situations.  It does, however, have its worst cases.
A random replacement algorithm is very unlikely to do as badly as LRU
does in LRU's worst case, but in exchange its performance is
consistently mediocre across all inputs (the first sketch appended
below simulates exactly this trade-off on a cyclic scan).  Keep in
mind, too, that most real-world programs exhibit temporal locality:
recently used data is likely to be used again soon, and that bias
favors LRU over randomness.

So should FreeBSD make it possible to ask for random replacement?
Probably, since it would be helpful for those times when you *know*
that LRU isn't going to do the right thing.  (In the sequential read
special case the OP mentions, random replacement is better than LRU,
but still worse than a deterministic algorithm that simply caches the
prefix of the file that will fit in memory.  So in this situation we
could do even better in theory.)

Should randomness be part of the default replacement algorithm, as
the OP suggests?  Probably not, since that would be pessimizing
performance in the common case for the sake of improving it in an
uncommon one.

Should the system be able to detect cases where the default
replacement algorithm is failing and dynamically modify its behavior?
I think that would be really cool, if it is possible...
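To make the LRU worst case concrete, here is a toy user-level
simulation (not kernel code; all names and parameters are invented
for illustration) of a cyclic scan over a working set slightly larger
than the cache, comparing LRU eviction with random eviction.  On this
access pattern LRU never hits after the first pass, while random
eviction retains a useful fraction of the blocks.

    #include <stdio.h>
    #include <stdlib.h>

    #define CACHE   100     /* cache slots */
    #define BLOCKS  120     /* working set, slightly larger than the cache */
    #define PASSES  100     /* cyclic scans over the working set */

    static int cache[CACHE];

    /* Linear search stands in for the buffer hash; fine for a toy. */
    static int
    cached(int blk)
    {
        int i;

        for (i = 0; i < CACHE; i++)
            if (cache[i] == blk)
                return (1);
        return (0);
    }

    static void
    simulate(const char *name, int use_random)
    {
        long hits = 0, refs = 0;
        int blk, i, next = 0, p;

        for (i = 0; i < CACHE; i++)
            cache[i] = -1;
        for (p = 0; p < PASSES; p++)
            for (blk = 0; blk < BLOCKS; blk++) {
                refs++;
                if (cached(blk)) {
                    hits++;
                    continue;
                }
                if (use_random)
                    cache[rand() % CACHE] = blk;    /* random victim */
                else {
                    /*
                     * On a pure cyclic scan every reference misses,
                     * so LRU degenerates to FIFO: the least recently
                     * used block is always the oldest one inserted.
                     */
                    cache[next] = blk;
                    next = (next + 1) % CACHE;
                }
            }
        printf("%s: %.1f%% hit rate\n", name, 100.0 * hits / refs);
    }

    int
    main(void)
    {
        simulate("lru", 0);
        simulate("random", 1);
        return (0);
    }

Compiled with any C compiler, the LRU arm reports a 0% hit rate while
the random arm settles well above zero with these parameters.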
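The deterministic policy mentioned in the parenthetical above (cache
just the prefix of the file that fits in memory) can be added to the
same toy simulator.  This is a sketch under the same invented
parameters, not a claim about how FreeBSD actually behaves:

    /*
     * Deterministic prefix policy for the same cyclic scan: only the
     * first CACHE blocks are ever cached, so after the initial pass
     * the prefix always hits and the tail always misses.  The hit
     * rate is exactly (PASSES - 1) * CACHE / (PASSES * BLOCKS),
     * about 82% here: better than random, which in turn beats
     * LRU's 0%.
     */
    static void
    simulate_prefix(void)
    {
        long hits = 0, refs = 0;
        int blk, p;

        for (p = 0; p < PASSES; p++)
            for (blk = 0; blk < BLOCKS; blk++) {
                refs++;
                if (p > 0 && blk < CACHE)
                    hits++;
            }
        printf("prefix: %.1f%% hit rate\n", 100.0 * hits / refs);
    }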
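As for detecting when the default policy is failing: one minimal,
hypothetical heuristic is a per-file sequential-read detector,
sketched below with invented names (this is not FreeBSD's actual
mechanism).  Applications can also declare such access patterns
themselves where interfaces like madvise(2) or POSIX's
posix_fadvise() are available; the automatic version amounts to the
kernel guessing the same hint.

    #include <sys/types.h>
    #include <stddef.h>

    /* Hypothetical per-file state for a scan detector. */
    struct scan_state {
        off_t   next_off;   /* where the next sequential read would start */
        int     seq_count;  /* consecutive sequential reads observed */
    };

    #define SCAN_THRESHOLD  16  /* arbitrary trigger point */

    /*
     * Called on every read.  Once a file has issued SCAN_THRESHOLD
     * back-to-back sequential reads, report it as a likely one-shot
     * scan; the caller could then mark that file's buffers for early
     * reuse so the scan stops displacing everyone else's cached data.
     */
    static int
    looks_like_scan(struct scan_state *ss, off_t off, size_t len)
    {
        if (off == ss->next_off)
            ss->seq_count++;
        else
            ss->seq_count = 0;
        ss->next_off = off + len;
        return (ss->seq_count >= SCAN_THRESHOLD);
    }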