From owner-freebsd-hackers  Thu Jan 30 16: 7:44 2003
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7E4E437B401
	for <hackers@FreeBSD.ORG>; Thu, 30 Jan 2003 16:07:42 -0800 (PST)
Received: from kientzle.com (h-66-166-149-50.SNVACAID.covad.net [66.166.149.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 94D0843F79
	for <hackers@FreeBSD.ORG>; Thu, 30 Jan 2003 16:07:40 -0800 (PST)
	(envelope-from kientzle@acm.org)
Received: from acm.org (big.x.kientzle.com [66.166.149.54])
	by kientzle.com (8.11.3/8.11.3) with ESMTP id h0V074R31167;
	Thu, 30 Jan 2003 16:07:04 -0800 (PST)
	(envelope-from kientzle@acm.org)
Message-ID: <3E39BE22.8050207@acm.org>
Date: Thu, 30 Jan 2003 16:06:58 -0800
From: Tim Kientzle <kientzle@acm.org>
Reply-To: kientzle@acm.org
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:0.9.6) Gecko/20011206
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: "Brian T. Schellenberger" <bschellenberger@nc.rr.com>,
	Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject: Re: Random disk cache expiry
References: <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301270019.44066.bschellenberger@nc.rr.com> <3E34C734.8010801@acm.org> <200301270904.43899.bschellenberger@nc.rr.com> <200301302222.h0UMMfFI090349@apollo.backplane.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

Matthew Dillon wrote:

> Your idea of 'sequential' access cache restriction only
> works if there is just one process doing the accessing.


Not necessarily.  I suspect that there is
a strong tendency to access particular files
in particular ways.  E.g., in your example of
a download server, those files are always
read sequentially.  You can make similar assertions
about a lot of files: manpages, gzip files,
C source code files, etc., are "always" read
sequentially.

If a file's access history were stored as a "hint"
associated with the file, then it would
be possible to make better up-front decisions about
how to allocate cache space.  The ideal would be to
store such hints on disk (maybe as an extended
attribute?), but it might also be useful to cache
them in memory somewhere.  That would allow the
cache-management code to make much earlier decisions
about how to handle a file.  For example, if a process
started to read a 10GB file that has historically been
accessed sequentially, you could immediately decide
to enable read-ahead for performance, but also mark
those pages to be released as soon as they were read by the
process.
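
A rough sketch of the hint idea, in Python for brevity. Everything
here is made up for illustration (AccessHint, cache_policy, the 0.9
threshold, the example path); none of it is FreeBSD code, and a real
implementation would live in the kernel and might persist the hint
on disk rather than in a dictionary.

```python
from dataclasses import dataclass


@dataclass
class AccessHint:
    """Hypothetical per-file hint: counts of sequential vs. other reads."""
    sequential: int = 0
    random: int = 0
    next_offset: int = 0  # offset at which a sequential read would start

    def record(self, offset: int, length: int) -> None:
        # A read counts as sequential if it begins where the last one ended.
        if offset == self.next_offset:
            self.sequential += 1
        else:
            self.random += 1
        self.next_offset = offset + length

    def mostly_sequential(self, threshold: float = 0.9) -> bool:
        total = self.sequential + self.random
        return total > 0 and self.sequential / total >= threshold


def cache_policy(hint: AccessHint) -> dict:
    """Pick an up-front policy from the stored hint (illustrative only)."""
    if hint.mostly_sequential():
        # Read ahead for throughput, but release each page once it is read.
        return {"read_ahead": True, "release_after_read": True}
    return {"read_ahead": False, "release_after_read": False}


# In-memory hint table keyed by path (assumed; could be an extended attribute).
hints = {}
h = hints.setdefault("/pub/big.iso", AccessHint())
for block in range(100):
    h.record(block * 65536, 65536)  # 100 back-to-back 64KB reads
print(cache_policy(h))
```

The point is only that the decision is made from history, before
the current reader has shown any pattern of its own.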

FWIW, a web search for "randomized caching" yields
some interesting reading.  Apparently, there are
a few randomized cache-management algorithms for
which the mathematics work out reasonably well,
despite Terry's protestations to the contrary.  ;-)
I haven't yet found any papers describing experiences
with real implementations, though.
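
For concreteness, the simplest thing that could be called randomized
cache management is plain random replacement: when the cache is full,
evict a uniformly random entry. The sketch below is my own toy
(RandomEvictionCache and its methods are invented names), not one of
the algorithms from the literature.

```python
import random


class RandomEvictionCache:
    """Toy fixed-capacity cache that evicts a uniformly random entry."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.data = {}
        self.rng = random.Random(seed)  # private RNG so runs can be repeated

    def get(self, key):
        # Returns None on a miss; no recency bookkeeping is needed.
        return self.data.get(key)

    def put(self, key, value):
        # Only evict when inserting a new key into a full cache.
        if key not in self.data and len(self.data) >= self.capacity:
            victim = self.rng.choice(list(self.data))
            del self.data[victim]
        self.data[key] = value


cache = RandomEvictionCache(capacity=4, seed=1)
for page in range(10):
    cache.put(page, page)
print(len(cache.data))  # never exceeds the capacity
```

Unlike LRU, a single long sequential scan cannot deterministically
flush every older page, since each resident page has the same chance
of surviving any given eviction.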

If only I had the time to spend poring over FreeBSD's
cache-management code to see how these ideas might
actually be implemented... <sigh>

Tim Kientzle

