Date:      Mon, 27 Jan 2003 00:19:41 -0500
From:      "Brian T. Schellenberger" <bschellenberger@nc.rr.com>
To:        "Sean Hamilton" <sh@bel.bc.ca>, <hackers@freebsd.org>
Subject:   Re: Random disk cache expiry
Message-ID:  <200301270019.44066.bschellenberger@nc.rr.com>
In-Reply-To: <001801c2c5c0$5666de10$16e306cf@slugabed.org>
References:  <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <3E34A6BB.2090601@acm.org> <001801c2c5c0$5666de10$16e306cf@slugabed.org>




On Sunday 26 January 2003 11:55 pm, Sean Hamilton wrote:
| ----- Original Message -----
| From: "Tim Kientzle" <kientzle@acm.org>
|
| | Cycling through large data sets is not really that uncommon.
| | I do something like the following pretty regularly:
| |     find /usr/src -type f | xargs grep function_name
| |
| | Even scanning through a large dataset once can really hurt
| | competing applications on the same machine by flushing
| | their data from the cache for no gain.  I think this
| | is where randomized expiration might really win, by reducing the
| | penalty for disk-cache-friendly applications who are competing
| | with disk-cache-unfriendly applications.
|
|
| After further thought, I propose something much simpler: when the
| kernel is hinted that access will be sequential, it should stop
| caching when there is little cache space available, instead of
| throwing away old blocks, or be much more hesitant to throw away old
| blocks. 

This to me is eminently sensible.
In fact, there seem to be two rules that have come up in this discussion:

1. For sequential access, you should be very hesitant to throw away 
*another* process's blocks, at least once you have used more than, say, 
25% of the cache or potential cache.

2. For sequential access, you should stop caching before you throw away 
your own blocks.  If it's sequential, it is, it seems to me, always a 
lose to throw away your *own* process's older blocks on the same 
file.

These algorithmic changes seem to me to be more likely to be optimal 
most of the time.
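
To make the two rules concrete, here is a minimal Python sketch of a toy 
block cache that applies them on top of plain LRU.  Everything in it is 
hypothetical and only for illustration: the HintedCache class, its 
access() method, the per-block owner tracking, and the seq_share 
parameter (set to the "say, 25%" figure from rule 1) are my assumptions, 
not FreeBSD code.  Rule 2 is also simplified to "never evict any block 
owned by the calling process".

```python
from collections import OrderedDict


class HintedCache:
    """Toy LRU block cache applying the two sequential-access rules."""

    def __init__(self, capacity, seq_share=0.25):
        self.capacity = capacity      # total number of blocks the cache holds
        self.seq_share = seq_share    # assumed threshold from rule 1 ("say, 25%")
        self.blocks = OrderedDict()   # (file, blkno) -> owner pid, oldest first

    def _held_by(self, pid):
        # Count the blocks currently cached on behalf of this process.
        return sum(1 for owner in self.blocks.values() if owner == pid)

    def access(self, pid, file, blkno, sequential=False):
        """Record an access; return True if the block ends up cached."""
        key = (file, blkno)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # hit: refresh LRU position
            return True
        if len(self.blocks) < self.capacity:
            self.blocks[key] = pid            # free space: always cache
            return True
        if sequential:
            # Rule 1: past the share threshold, leave other processes'
            # blocks alone and simply stop caching.
            if self._held_by(pid) >= self.seq_share * self.capacity:
                return False
            # Rule 2 (simplified): never evict our own blocks; pick the
            # oldest block that belongs to some other process.
            victim = next(
                (k for k, owner in self.blocks.items() if owner != pid), None
            )
            if victim is None:
                return False
            del self.blocks[victim]
        else:
            self.blocks.popitem(last=False)   # unhinted access: plain LRU
        self.blocks[key] = pid
        return True
```

With an 8-block cache, a process that already has 4 blocks cached keeps 
all of them while another process scans 100 blocks with sequential=True; 
with the hint off, the same scan flushes every one of them.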

A random approach *does* reduce the penalty for worst-case scenarios, 
but it seems to me that it reduces the benefit in both "normal" and 
"best-case" scenarios even more.


| Consider that in almost all cases where access is sequential,
| as reading continues, the chances of the read being aborted increase:
| i.e., users downloading files, directory tree traversal, etc. Since
| the likelihood of reading the first byte is very high, and the next
| one less high, and the next less yet, etc., it seems to make sense to
| tune the caching algorithm to accommodate this.
|
| While discussing disks, I have a minor complaint: at least on IDE
| systems, when doing something like an untar, the entire system is
| painfully unresponsive, even though CPU load is low. I presume this
| is because when an executable is run, it needs to sit and wait for
| the disk. Wouldn't it make sense to give very high disk priority to
| executables? Isn't that worth the extra seeks?
|
| sh
|
|
| To Unsubscribe: send mail to majordomo@FreeBSD.org
| with "unsubscribe freebsd-hackers" in the body of the message




