Date: Mon, 27 Jan 2003 00:19:41 -0500
From: "Brian T. Schellenberger" <bschellenberger@nc.rr.com>
To: "Sean Hamilton" <sh@bel.bc.ca>, <hackers@freebsd.org>
Subject: Re: Random disk cache expiry
Message-ID: <200301270019.44066.bschellenberger@nc.rr.com>
In-Reply-To: <001801c2c5c0$5666de10$16e306cf@slugabed.org>
References: <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <3E34A6BB.2090601@acm.org> <001801c2c5c0$5666de10$16e306cf@slugabed.org>
On Sunday 26 January 2003 11:55 pm, Sean Hamilton wrote:
| ----- Original Message -----
| From: "Tim Kientzle" <kientzle@acm.org>
|
| | Cycling through large data sets is not really that uncommon.
| | I do something like the following pretty regularly:
| |     find /usr/src -type f | xargs grep function_name
| |
| | Even scanning through a large dataset once can really hurt
| | competing applications on the same machine by flushing
| | their data from the cache for no gain.  I think this
| | is where randomized expiration might really win, by reducing the
| | penalty for disk-cache-friendly applications that are competing
| | with disk-cache-unfriendly applications.
|
| After further thought, I propose something much simpler: when the
| kernel is hinted that access will be sequential, it should stop
| caching when there is little cache space available, instead of
| throwing away old blocks, or be much more hesitant to throw away
| old blocks.

This to me is eminently sensible.  In fact, there seem to be two rules
that have come up in this discussion:

1. For sequential access, you should be very hesitant to throw away
*another* process's blocks, at least once you have used more than,
say, 25% of the cache or potential cache.

2. For sequential access, you should stop caching before you throw
away your own blocks.  If access is sequential, it seems to me, it is
always a loss to throw away your *own* process's older blocks on the
same file.

(A rough user-space sketch of these two rules appears at the end of
this message.)

These algorithmic changes seem to me more likely to be optimal most
of the time.  A random approach *does* reduce the penalty in
worst-case scenarios, but, it seems to me, at the cost of reducing
the benefit in both "normal" and best-case scenarios even more.

| Consider that in almost all cases where access is sequential, as
| reading continues, the chances of the read being aborted increase:
| i.e., users downloading files, directory tree traversal, etc.  Since
| the likelihood of the first byte being read is very high, and the
| next one less high, and the next less yet, etc., it seems to make
| sense to tune the caching algorithm to accommodate this.
|
| While discussing disks, I have a minor complaint: at least on IDE
| systems, when doing something like an untar, the entire system is
| painfully unresponsive, even though CPU load is low.  I presume this
| is because when an executable is run, it needs to sit and wait for
| the disk.  Wouldn't it make sense to give very high disk priority to
| executables?  Isn't that worth the extra seeks?
|
| sh
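
For concreteness, here is a minimal user-space sketch of the two rules
above, applied to a toy LRU buffer cache.  It is plain C, not FreeBSD
kernel code, and every name in it (cache_insert, victim_slot,
SEQ_QUOTA, and so on) is invented for the illustration; the "say, 25%"
figure from rule 1 shows up as SEQ_QUOTA.

/*
 * Toy LRU buffer cache with a sequential-access quota.
 * Everything here is invented for illustration; it is not kernel code.
 */
#include <stdio.h>

#define CACHE_SLOTS 16
#define SEQ_QUOTA   (CACHE_SLOTS / 4)   /* the "say, 25%" from rule 1 */

struct slot {
    int owner;              /* -1 = free, otherwise a reader id */
    long block;             /* block number held in this slot */
    unsigned long lru;      /* last-touched tick; smaller = older */
};

static struct slot cache[CACHE_SLOTS];
static unsigned long tick;

static int count_owned(int owner)
{
    int i, n = 0;

    for (i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].owner == owner)
            n++;
    return n;
}

/* A free slot if any, else the oldest slot owned by 'owner' (-2 = anyone). */
static int victim_slot(int owner)
{
    int i, best = -1;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].owner == -1)
            return i;                   /* free slot wins outright */
        if (owner != -2 && cache[i].owner != owner)
            continue;
        if (best == -1 || cache[i].lru < cache[best].lru)
            best = i;
    }
    return best;
}

/* Cache block 'blk' for 'reader'; 'seq' is the sequential-access hint. */
static void cache_insert(int reader, long blk, int seq)
{
    int v;

    if (seq && count_owned(reader) >= SEQ_QUOTA)
        v = victim_slot(reader);    /* rules 1+2: take a free slot if any,
                                       else recycle our own oldest block;
                                       never evict another reader */
    else
        v = victim_slot(-2);        /* normal path: plain global LRU */

    cache[v].owner = reader;
    cache[v].block = blk;
    cache[v].lru = ++tick;
}

int main(void)
{
    int i;
    long b;

    for (i = 0; i < CACHE_SLOTS; i++)
        cache[i].owner = -1;

    /* Reader 0: modest working set, no hint. */
    for (b = 0; b < 12; b++)
        cache_insert(0, b, 0);

    /* Reader 1: huge sequential scan, hinted; it should stay within quota. */
    for (b = 0; b < 1000; b++)
        cache_insert(1, b, 1);

    printf("reader 0 still holds %d slots, reader 1 holds %d slots\n",
           count_owned(0), count_owned(1));
    return 0;
}

Running it, the un-hinted reader keeps all 12 of its blocks, while the
hinted sequential reader, despite touching 1000 blocks, never holds
more than its 4-slot quota and only ever recycles its own blocks --
which is the behaviour the two rules are after.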
