Date: Fri, 31 Jan 2003 01:27:38 +0000
From: Fergal Daly <fergal@esatclear.ie>
To: freebsd-hackers@freebsd.org
Subject: Random disk cache expiry
Message-ID: <200301310127.38826.fergal@esatclear.ie>
kientzle@acm.org (Tim Kientzle) wrote in message
news:<b128ta$b0n$1@FreeBSD.csie.NCTU.edu.tw>...
> Personally, I think there's a lot of merit to _trying_

There's even more merit in only pretending to try...

Here are the results of a quick simulation of a cache using random
replacement. I've also included a scheme for a caching algorithm that might
solve the problem.

I have simulated multiple sequential reads of a file. There are two
parameters: the ratio of the file size to the cache size, and the number of
times the file is read. For each combination of parameters, the simulation is
averaged over 50 runs.

The size of the cache was 100 blocks; if you have a faster machine than mine
you can turn that up, but I'm pretty sure it'll make very little difference.

Since we are talking about files being read more than once, the first reading
of the file (which will give only cache misses) is not included in the
statistics.

What does each column mean?

ratio: how big the file is compared to the cache size

lock: the cache hit % for a cache that just locks the first blocks of the
file (as many as fit) into the cache and doesn't try to cache anything else
once the cache is full

1, 2, 3, 4, 5: the cache hit % for the random-replacement cache with 1, 2, 3,
4, 5 full reads of the file (not counting the initial read to warm up the
cache)

l:r: how much better the locking cache does compared to the random cache
(the lock hit % as a percentage of the random hit %)

ratio  lock    1    2    3    4    5    l:r
 1.0    100  100  100  100  100  100    100
 1.1     90   84   83   83   82   82    107
 1.2     83   72   69   69   69   69    115
 1.3     76   61   59   58   58   58    124
 1.4     71   52   51   50   49   49    136
 1.5     66   45   43   42   42   42    146
 1.6     62   39   38   36   36   36    158
 1.7     58   33   32   31   31   31    173
 1.8     55   29   28   27   27   27    187
 1.9     52   26   24   24   24   23    198
 2.0     50   22   21   20   20   20    219
 2.1     47   19   19   18   18   18    238
 2.2     45   17   16   16   16   15    253
 2.3     43   15   15   14   14   14    280
 2.4     41   13   12   12   12   12    298
 2.5     40   11   11   11   10   10    336
 2.6     38   10    9   10    9    9    352
 2.7     37    9    8    8    8    8    394
 2.8     35    8    8    8    7    7    434
 2.9     34    7    6    6    6    6    472
 3.0     33    6    6    6    6    6    477
 3.1     32    5    5    5    5    5    556
 3.2     31    5    5    5    4    4    577
 3.3     30    4    4    4    4    4    645
 3.4     29    4    3    3    3    3    700
 3.5     28    3    3    3    3    3    714
 3.6     27    3    3    3    3    3    869
 3.7     27    3    2    2    2    2    886
 3.8     26    2    2    2    2    2    948
 3.9     25    2    2    2    2    2   1035
 4.0     25    2    2    2    2    2   1094

As you can see, the locking cache is always better than the random one, and
the file doesn't have to be very big for it to be hugely better.
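For concreteness, here is a rough sketch of the kind of simulation I mean.
This is not the actual benchmark script (that's linked at the end of this
mail); the names are made up, and it prints a single aggregate figure for the
random cache rather than the per-read breakdown above.

#!/usr/bin/perl
# Rough sketch of the simulation; NOT the script linked at the end of this
# mail. All the names here are made up for illustration.
use strict;
use warnings;

my $CACHE_SIZE = 100;    # blocks, as in the table above
my $RUNS       = 50;     # how many runs to average over
my $READS      = 5;      # full reads after the warm-up read

# Hit rate for the "locking" cache: it pins the first blocks of the file
# (as many as fit) and never caches anything else, so every later read
# hits exactly those blocks.
sub lock_hit_pct {
    my ($file_blocks) = @_;
    my $kept = $file_blocks < $CACHE_SIZE ? $file_blocks : $CACHE_SIZE;
    return 100 * $kept / $file_blocks;
}

# Aggregate hit rate for a random-replacement cache over $reads sequential
# reads of the file, not counting the warm-up read.
sub random_hit_pct {
    my ($file_blocks, $reads) = @_;
    my %cached;
    my ($hits, $lookups) = (0, 0);
    for my $pass (0 .. $reads) {
        for my $blk (0 .. $file_blocks - 1) {
            if ($pass > 0) {          # pass 0 only warms the cache
                $lookups++;
                $hits++ if $cached{$blk};
            }
            next if $cached{$blk};
            if (keys(%cached) >= $CACHE_SIZE) {
                # evict a victim chosen uniformly at random
                my @in_cache = keys %cached;
                delete $cached{ $in_cache[int rand @in_cache] };
            }
            $cached{$blk} = 1;
        }
    }
    return 100 * $hits / $lookups;
}

for my $tenths (10 .. 40) {
    my $ratio       = $tenths / 10;
    my $file_blocks = int($ratio * $CACHE_SIZE + 0.5);
    my $rand_pct    = 0;
    $rand_pct += random_hit_pct($file_blocks, $READS) / $RUNS for 1 .. $RUNS;
    printf "%.1f\tlock %3.0f\trandom %3.0f\n",
        $ratio, lock_hit_pct($file_blocks), $rand_pct;
}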
Maybe this is well known already, or maybe it is not compatible with the
current system, but here's an idea for how you might implement the locking
cache.

When a program hints that a file will be read in this way, the cached data
for that file is treated differently. Blocks to be uncached are selected in
the usual LRU way. Say block A is selected to be evicted, but block A is part
of one of these special files; instead of evicting block A, we look for
another block in the file, call it block B. Block B should be the block of
that file furthest from the head of the file. We evict block B and we give
block B's timestamp to block A.

To see this in action, imagine that blocks 1 to 20 of a file are in the
cache. The LRU cache decides to evict block 1; we see that it's part of a
special file, so instead we evict block 20 and give block 20's timestamp to
block 1. Soon after, the LRU cache decides to evict block 2, but we evict
block 19 instead and give its timestamp to block 2. Next our program reads
block 21, which is then cached. Soon after that the cache tries to evict
block 3, but instead we evict block 21 and give its timestamp to block 3.

Taking the file as a whole, the cache neither prefers nor penalises it
compared to the current algorithm. However, within the file, the blocks near
the head are given preference over the ones near the tail.

Suppose that after one run through the file the cache has kept only 40% of
the file. Under the current scheme this would be the final 40% of the file,
which is no use for the next run. Under the new scheme it's the initial 40%,
which is as good as we could hope for.

There is a downside: the cache now has to keep an ordered list of the blocks
contained in these special files. However, there is no extra overhead for
other files.

One other possible problem is that the blocks at the head of the file will
tend to have the newer timestamps, and so when the file is read a second time
the newer timestamps will be overwritten before the older ones. It might be
better to manage the timestamps as a FIFO queue, but on second thought, I
think that would give this file an unfair advantage.

It should be straightforward to extend this to groups of files by keeping a
single ordered list of all the blocks in several files.
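To make the bookkeeping concrete, here is a toy sketch of the special-file
handling. It is only an illustration of the timestamp-swapping idea from the
example above; the structures and names are invented and it is nothing like
real kernel code.

#!/usr/bin/perl
# Toy sketch of the bookkeeping only; every structure and name is invented
# for illustration.
use strict;
use warnings;

my %timestamp;        # "file:blockno" => last-use time of a cached block
my %special_blocks;   # file => [ cached block numbers, sorted ascending ]

# Record a use of a block; special files also track which of their blocks
# are cached, in head-to-tail order.
sub touch {
    my ($file, $blk, $now) = @_;
    my $id = "$file:$blk";
    if (!exists $timestamp{$id} && exists $special_blocks{$file}) {
        my $list = $special_blocks{$file};
        @$list = sort { $a <=> $b } (@$list, $blk);
    }
    $timestamp{$id} = $now;
}

# LRU picks a victim; if it belongs to a special file we evict that file's
# block furthest from the head instead, and the original victim inherits
# the evicted block's timestamp (block A gets block B's timestamp).
sub evict_one {
    my $victim = (sort { $timestamp{$a} <=> $timestamp{$b} } keys %timestamp)[0];
    my ($file) = split /:/, $victim;

    if (exists $special_blocks{$file}) {
        my $list     = $special_blocks{$file};
        my $tail_blk = pop @$list;                 # furthest from the head
        my $tail_id  = "$file:$tail_blk";
        $timestamp{$victim} = $timestamp{$tail_id} if $tail_id ne $victim;
        delete $timestamp{$tail_id};
        return $tail_id;
    }
    delete $timestamp{$victim};
    return $victim;
}

# The example above: blocks 1 to 20 of a special file are cached,
# block 1 being the oldest.
$special_blocks{"bigfile"} = [];
touch("bigfile", $_, $_) for 1 .. 20;

print "evicted ", evict_one(), "\n";   # bigfile:20 goes, block 1 survives
print "evicted ", evict_one(), "\n";   # bigfile:19 goes, block 2 survives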
The Perl for the benchmark is available at
http://www.fergaldaly.com/computer/randcache/bench

F

--
Do you need someone with lots of Unix sysadmin and/or lots of OO software
development experience? Go on, giz a job.

My CV - http://www.fergaldaly.com/cv.html