Date: Thu, 30 Jan 2003 14:22:41 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: "Brian T. Schellenberger" <bschellenberger@nc.rr.com>
Cc: kientzle@acm.org, Sean Hamilton <sh@bel.bc.ca>, hackers@FreeBSD.ORG
Subject: Re: Random disk cache expiry
Message-ID: <200301302222.h0UMMfFI090349@apollo.backplane.com>
References: <000501c2c4dd$f43ed450$16e306cf@slugabed.org> <200301270019.44066.bschellenberger@nc.rr.com> <3E34C734.8010801@acm.org> <200301270904.43899.bschellenberger@nc.rr.com>
Well, here's a counterpoint. Let's say you have an FTP server
with 1G of RAM full of, say, pirated CDs at 600MB a pop.

Now let's say someone puts up a new Madonna CD and suddenly you
have thousands of people from all over the world trying to
download a single 600MB file.
Let's try another one. Let's say you have an FTP server with 1G
of RAM full of hundreds of MPEG-encoded pirated CDs at 50MB a
pop, and you have thousands of people from all over the world
trying to download a core set of 25 CDs, which exceeds the
available RAM you have to cache all of them.
What I'm trying to illustrate here is the impossibility of what
you are asking. Your idea of a 'sequential' access cache
restriction only works if there is just one process doing the
accessing. But if you have 25 processes accessing 25 different
files sequentially, it doesn't work. How is the system supposed
to detect the difference between 25 processes accessing 25
distinct 50MB files on a 1G machine (which doesn't fit in the
cache) versus 300 processes accessing 15 distinct 50MB files on
a 1G machine (which does fit)? Furthermore, how do you
differentiate between 30 processes all downloading the same
600MB CD versus 30 processes downloading two different 600MB
CDs, on a machine with 1G of cache?
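
To put rough numbers on that comparison, here is a back-of-the-envelope
sketch (plain userland C, nothing kernel-specific; the figures are the
ones from the examples above). From any single file's point of view
both workloads are identical sequential reads; only the totals of
distinct bytes differ:

    #include <stdio.h>

    /*
     * Working-set arithmetic for the workloads above.  What fits in
     * the cache depends on the number of *distinct* bytes touched,
     * not on how many processes touch them -- and the kernel cannot
     * know the distinct-byte count ahead of time.
     */
    int
    main(void)
    {
        long mb = 1024L * 1024;
        long cache = 1024 * mb;         /* 1G of cache */
        long ws25 = 25 * 50 * mb;       /* 25 distinct 50MB files */
        long ws15 = 15 * 50 * mb;       /* 15 distinct 50MB files */

        printf("25 x 50MB: %ldMB -> %s\n", ws25 / mb,
            ws25 > cache ? "does not fit" : "fits");
        printf("15 x 50MB: %ldMB -> %s\n", ws15 / mb,
            ws15 > cache ? "does not fit" : "fits");
        return (0);
    }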
You can't. That's the problem. There is no magic number between
0 and the amount of memory you have where you can say "I am
going to stop caching this sequential file" that covers even the
more common situations that come up. There is no algorithm that
can detect the above situations before the fact or on the fly.
You can analyze the situation after the fact, but by then it is
too late, and the situation may change from minute to minute.
One minute you have 300 people trying to download one CD; the
next minute you have 20 people trying to download 10 different
CDs.
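
And the only "before the fact" information available is a per-process
hint. FreeBSD already has one, madvise(2) with MADV_SEQUENTIAL, and it
shows the scoping problem: the hint describes one mapping in one
process, so it says nothing about what the other 299 processes are
doing with the same file. A minimal sketch (error handling omitted,
and the function name is just for illustration):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Map a file and tell the VM we intend to read it sequentially.
     * The hint covers this mapping in this process only -- the
     * kernel learns nothing about other processes reading the same
     * file.
     */
    void
    read_sequentially(const char *path)
    {
        struct stat st;
        void *p;
        int fd;

        fd = open(path, O_RDONLY);
        fstat(fd, &st);
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE,
            fd, 0);
        madvise(p, (size_t)st.st_size, MADV_SEQUENTIAL);
        /* ... stream through p ... */
        munmap(p, (size_t)st.st_size);
        close(fd);
    }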
-Matt
:The suggestion here basically boils down to this: if the system could
:act on hints that somebody will be doing sequential access, then it
:should be more timid about caching for that file access. That is to
:say, it should allow that file to "use up" a smaller number of blocks
:from the cache (yes, the VM) at a time, and it should favor, if
:anything, a LIFO scheme instead of the usual FIFO (LRU) scheme. (That
:is to say, for the special case of *sequential* access, LRU == FIFO,
:and yet LIFO is probably closer to optimal for this case, at least
:if the file will be re-read later.)
:
:Caching will do the most good on files that will be randomly
:accessed; an intermediate amount of good on files sequentially
:accessed but rewound and/or accessed over and over; and if the file
:system could somehow know (or be hinted) that a file is being
:sequentially accessed and is unlikely to be accessed again for a
:good long while, it would clearly be better off not caching it at
:all.
:
:Of course the trick here is waving my hands and saying "assume that
:you know how the file will be accessed in the future." You ought to
:pillory me for *that* bit. Even with hinting there are problems with
:this whole idea. Still, with some hinting, the algorithm could
:probably be a little more clever.
:
:(Actually, Terry Lambert *did* pillory me for that bit, just a bit, when
:he pointed out the impossibility of knowing whether the file is being
:used in the same way by other processes.)
:
:And . . . also to Terry: yes, I know that my proposal above
:over-simplifies, but the point is that for sequential access you
:want to go "gentle" about evicting what other processes' and
:earlier reads have cached.
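
The LIFO-versus-LRU point in the quoted text can be made concrete with
a toy simulation (a userland sketch; the block counts are invented):
scan an NBLK-block file over and over through a cache that holds CAP
blocks, with NBLK larger than CAP, and compare evicting the least
recently used block against evicting the most recently used one.

    #include <stdio.h>

    #define NBLK    8       /* blocks in the file */
    #define CAP     4       /* blocks the cache can hold */
    #define PASSES  4       /* full sequential scans of the file */

    /*
     * Simulate repeated sequential scans of an NBLK-block file
     * through a CAP-block cache.  evict_mru == 0 evicts the least
     * recently used block (LRU); evict_mru == 1 evicts the most
     * recently used one (the "LIFO" idea).
     */
    static int
    simulate(int evict_mru)
    {
        int blk[CAP], stamp[CAP];
        int n = 0, tick = 0, hits = 0;
        int p, b, i, victim;

        for (p = 0; p < PASSES; p++) {
            for (b = 0; b < NBLK; b++) {
                tick++;
                for (i = 0; i < n; i++) {
                    if (blk[i] == b) {      /* cache hit */
                        stamp[i] = tick;
                        hits++;
                        goto next;
                    }
                }
                if (n < CAP) {              /* free slot: just insert */
                    blk[n] = b;
                    stamp[n] = tick;
                    n++;
                    goto next;
                }
                victim = 0;                 /* pick a victim to evict */
                for (i = 1; i < CAP; i++)
                    if (evict_mru ? stamp[i] > stamp[victim]
                                  : stamp[i] < stamp[victim])
                        victim = i;
                blk[victim] = b;
                stamp[victim] = tick;
    next:       ;
            }
        }
        return (hits);
    }

    int
    main(void)
    {
        printf("LRU eviction:          %d hits\n", simulate(0));
        printf("MRU (\"LIFO\") eviction: %d hits\n", simulate(1));
        return (0);
    }

With these toy numbers LRU reports 0 hits and MRU reports 12: the
cyclic scan flushes every block out of an LRU cache just before its
next use, while MRU-style eviction sacrifices the tail of the file to
keep its head resident.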
