Date: Fri, 26 Jul 2002 07:31:04 +1000
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Andreas Koch <koch@eis.cs.tu-bs.de>, freebsd-stable@FreeBSD.ORG
Subject: Re: 4.6-RC: Glacial speed of dump backups
Message-ID: <20020726073104.R38313@gsmx07.alcatel.com.au>
In-Reply-To: <200207251715.g6PHFGDD034256@apollo.backplane.com>; from dillon@apollo.backplane.com on Thu, Jul 25, 2002 at 10:15:16AM -0700
References: <20020606204948.GA4540@ultra4.eis.cs.tu-bs.de> <20020722081614.E367@gsmx07.alcatel.com.au> <20020722100408.GP26095@ultra4.eis.cs.tu-bs.de> <200207221943.g6MJhIBX054785@apollo.backplane.com> <20020725164416.A52778@gsmx07.alcatel.com.au> <200207251715.g6PHFGDD034256@apollo.backplane.com>
I wrote:
> 64MB cache hangs.

On 2002-Jul-25 10:15:16 -0700, Matthew Dillon <dillon@apollo.backplane.com> wrote:
>Interesting. What was the cache block size reported by dump?

DUMP: Cache 67108864 MB, blocksize = 32768

>If you have the time, it may be worth playing with the cache block size.

I don't have time right now, but I agree that this should be worth experimenting with.

> The NetBSD caching code appears to try to avoid caching whole blocks,
> operating under the assumption that if a read for a whole block occurs
> dump is not likely to re-request the block. Changing the conditional
> above and setting the BLKFACTOR to 1 in my code will mimic this
> behavior.

Actually, from memory of the statistics I gathered previously, apart from inodes, dump only ever reads a single "block" (offset/size pair) once. The trick is to identify when dump will read both (offset,size1) and (offset+size1,size2) and merge them into a single read(offset,size1+size2), even though the original reads occur at different times and into non-adjacent buffers. A traditional cache relies on locality of reference, and I'm not sure that the UFS layout provides this when there are lots of small files.

> I'm not sure why dump failed w/ a 64MB cache. I will investigate.

After a closer look, the problem is related to swap starvation: one of the children dies and the parent doesn't notice. Also, whilst I knew that dump forked multiple times, I thought the parents were just sleeping. It looks like at least the first few children are active, which means the system thrashes fairly badly unless there's enough RAM to keep 5 or 6 copies of the cache resident. I've repeated the 64MB cache test on another ProLiant with 256MB of RAM and it ran to completion (though slowly).

This suggests that unless you want to limit dump to very small caches, you need to share the cache between all the children (which implies a lot more synchronisation code).
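The read-merging idea, turning (offset,size1) and (offset+size1,size2) into one read(offset,size1+size2), can be sketched in C. This is a minimal sketch, not code from dump or from either cache patch: `struct rreq` and `coalesce_reads()` are hypothetical names, and the sketch assumes the pending requests are already available as a batch. That assumption sidesteps the hard part described above, which is predicting reads that arrive at different times.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical descriptor for one pending read: (offset, size). */
struct rreq {
    off_t  off;
    size_t len;
};

/* qsort comparator: order requests by ascending offset. */
static int
rreq_cmp(const void *a, const void *b)
{
    const struct rreq *x = a, *y = b;

    if (x->off < y->off)
        return -1;
    return x->off > y->off;
}

/*
 * Sort the n requests in v by offset, then merge every run where one
 * request ends exactly where the next begins: (off, len1) followed by
 * (off + len1, len2) becomes (off, len1 + len2).  Requests separated
 * by a gap (or overlapping) are kept separate.  Returns the number of
 * requests left in v after merging.
 */
size_t
coalesce_reads(struct rreq *v, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    qsort(v, n, sizeof(v[0]), rreq_cmp);
    for (size_t i = 1; i < n; i++) {
        if (v[out].off + (off_t)v[out].len == v[i].off)
            v[out].len += v[i].len;     /* contiguous: extend current */
        else
            v[++out] = v[i];            /* not contiguous: start new */
    }
    return out + 1;
}
```

For example, the requests (8192,4096), (0,8192), (20480,1024) and (12288,2048) collapse to two reads, (0,14336) and (20480,1024). Delivering one merged read into non-adjacent buffers is a separate problem; on systems that provide it, preadv(2) fills several buffers from one contiguous file range.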
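Sharing one cache between dump's children could look roughly like the following: allocate the cache in an anonymous MAP_SHARED mapping before forking, so every child uses the same pages instead of keeping a private copy, and guard it with a process-shared mutex. This is a hypothetical sketch. `struct shcache` and the `shcache_*` functions are invented for illustration, and it assumes a system that supports PTHREAD_PROCESS_SHARED mutexes. A real dump cache would also need block lookup and eviction, which is where most of the extra synchronisation code would go.

```c
#define _DEFAULT_SOURCE             /* expose MAP_ANON etc. on glibc */
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical layout: a lock header followed by the cache data. */
struct shcache {
    pthread_mutex_t lock;           /* serialises access from all processes */
    size_t          size;           /* bytes of cache data after the header */
    unsigned char   data[];         /* the cached blocks themselves */
};

/*
 * Allocate the cache in anonymous shared memory.  Call this BEFORE
 * fork(): each child then maps the same pages, so all of them share
 * one cache instead of each keeping a private copy.
 */
struct shcache *
shcache_create(size_t size)
{
    pthread_mutexattr_t attr;
    struct shcache *c;

    c = mmap(NULL, sizeof(*c) + size, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_SHARED, -1, 0);
    if (c == MAP_FAILED)
        return NULL;
    pthread_mutexattr_init(&attr);
    /* Without this the mutex is only valid between threads. */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&c->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    c->size = size;
    return c;
}

void
shcache_destroy(struct shcache *c)
{
    size_t len = sizeof(*c) + c->size;

    pthread_mutex_destroy(&c->lock);
    munmap(c, len);
}

/* Copy len bytes into the cache at off under the lock; -1 if out of range. */
int
shcache_put(struct shcache *c, size_t off, const void *buf, size_t len)
{
    if (off > c->size || len > c->size - off)
        return -1;
    pthread_mutex_lock(&c->lock);
    memcpy(c->data + off, buf, len);
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Copy len bytes out of the cache at off under the lock; -1 if out of range. */
int
shcache_get(struct shcache *c, size_t off, void *buf, size_t len)
{
    if (off > c->size || len > c->size - off)
        return -1;
    pthread_mutex_lock(&c->lock);
    memcpy(buf, c->data + off, len);
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Demo: a forked child writes; returns 1 if the parent sees the write. */
int
shcache_demo(void)
{
    struct shcache *c = shcache_create(4096);
    char out[6] = "";
    pid_t pid;
    int status, ok = 0;

    if (c == NULL)
        return 0;
    pid = fork();
    if (pid == 0) {                 /* child: write into the shared cache */
        shcache_put(c, 100, "hello", 6);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        shcache_get(c, 100, out, 6) == 0)
        ok = memcmp(out, "hello", 6) == 0;
    shcache_destroy(c);
    return ok;
}
```

With a private malloc()ed cache per child, N active children cost N caches, which matches the thrashing described above. With the shared mapping the memory cost is one cache, paid for instead with lock contention between the children.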
Peter