From: Tim Kientzle
Reply-To: kientzle@acm.org
Date: Sun, 26 Jan 2003 19:25:47 -0800
To: Terry Lambert
Cc: Matthew Dillon, Sean Hamilton, hackers@FreeBSD.ORG
Subject: Re: Random disk cache expiry

Sean Hamilton proposes:

> Wouldn't it seem logical to have [randomized disk cache expiration] in
> place at all times?

Terry Lambert responds:

>>:I really dislike the idea of random expiration; I don't understand
>>:the point, unless you are trying to get better numbers on some
>>:benchmark.

Matt Dillon concedes:

>> ... it's only useful when you are cycling through a [large] data set ...

Cycling through large data sets is not really that uncommon.
I do something like the following pretty regularly:

    find /usr/src -type f | xargs grep function_name

Even scanning through a large dataset once can really hurt competing
applications on the same machine by flushing their data from the
cache for no gain. I think this is where randomized expiration
might really win, by reducing the penalty for disk-cache-friendly
applications that are competing with disk-cache-unfriendly
applications.

There's an extensive literature on randomized algorithms.
Although I'm certainly no expert, I understand that such
algorithms work very well in exactly this sort of application,
since they "usually" avoid worst-case behavior under a broad
variety of inputs. The current cache is, in essence,
tuned specifically to work badly on a system where applications
are scanning through large amounts of data. No matter what
deterministic caching algorithm you use, you're choosing
to behave badly under some situation.

Personally, I think there's a lot of merit to _trying_
randomized disk cache expiry and seeing how it works in practice.
(I would also observe here that 5.0 now has a fast, high-quality
source of randomness that seems ideal for exactly such
applications.) I don't believe that it would _prevent_ applications
from using optimizations such as those that Terry suggests, and
it might provide reasonable performance under a broader range of
scenarios than are currently supported.

Sounds like a good idea to me.

Tim Kientzle
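
The scanning case discussed in this message can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not FreeBSD's buffer cache: the function names (lru_hits, random_hits, cyclic_scan) and the sizes used are hypothetical, and a plain LRU policy stands in for "a deterministic caching algorithm". It repeatedly scans a data set slightly larger than the cache and counts hits under each eviction policy.

```python
import random
from collections import OrderedDict


def lru_hits(accesses, capacity):
    """Count cache hits under least-recently-used eviction."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits


def random_hits(accesses, capacity, rng):
    """Count cache hits when the evicted block is chosen uniformly at random."""
    slots = []   # cached blocks, in arbitrary order
    index = {}   # block -> position in slots
    hits = 0
    for block in accesses:
        if block in index:
            hits += 1
        elif len(slots) < capacity:
            index[block] = len(slots)
            slots.append(block)
        else:
            victim_pos = rng.randrange(capacity)  # pick a random victim slot
            del index[slots[victim_pos]]
            slots[victim_pos] = block
            index[block] = victim_pos
    return hits


def cyclic_scan(num_blocks, passes):
    """Access blocks 0..num_blocks-1 in order, repeated `passes` times."""
    return [b for _ in range(passes) for b in range(num_blocks)]


if __name__ == "__main__":
    # Hypothetical sizes: a 100-block cache and a 120-block data set.
    accesses = cyclic_scan(num_blocks=120, passes=20)
    rng = random.Random(2003)  # fixed seed so runs are repeatable
    print("LRU hits:   ", lru_hits(accesses, capacity=100))
    print("random hits:", random_hits(accesses, capacity=100, rng=rng))
```

With these sizes, LRU always evicts the block it will need soonest, so it never hits on the cyclic scan, while random eviction keeps part of the data set resident across passes. When the data set fits in the cache, the two policies behave identically in this sketch. It does not model competing applications or read-ahead, so it says nothing about how either policy would perform on a real system.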