Date: Tue, 28 Oct 2003 18:32:59 -0800
From: Ken Marx <kmarx@vicor.com>
To: Kirk McKusick <mckusick@beastie.mckusick.com>
Cc: julian@elischer.org
Subject: Re: 4.8 ffs_dirpref problem
Message-ID: <3F9F26DB.6050207@vicor.com>
In-Reply-To: <200310261749.h9QHnieN015824@beastie.mckusick.com>
References: <200310261749.h9QHnieN015824@beastie.mckusick.com>
Kirk McKusick wrote:

>> Date: Thu, 23 Oct 2003 17:58:54 -0700
>> From: Ken Marx <kmarx@vicor.com>
>> To: Kirk McKusick <mckusick@mckusick.com>
>> CC: Julian Elischer <julian@vicor.com>, cburrell@vicor.com, davep@vicor.com,
>>     Ken Marx <kmarx@vicor.com>, gluk@ptci.ru, jpl@vicor.com, jrh@vicor.com,
>>     julian@vicor-nb.com, VicPE@aol.com
>> Subject: Re: 4.8 ffs_dirpref problem
>> X-ASK-Info: Whitelist match
>>
>> Hi Kirk,
>>
>> I had a few minutes before heading out, so I tried getting a list
>> of block numbers in the bufferhash bucket that seemed to be taking
>> lots of hits. The depth changes, of course, but I caught it at one
>> point at a depth of 600 or so:
>>
>> /kernel: dumpbh( 250 )
>> /kernel: bp[1]: b_vp=0xcfa3d480, b_lblkno=52561, b_flags=0x20100020
>> /kernel: bp[2]: b_vp=0xcf3c5d00, b_lblkno=345047104, b_flags=0x200000a0
>> ...
>>
>> For no good reason, I sorted by block number and looked at the
>> differences between successive block numbers. It varies a bit, but
>> of 522 block numbers, 494 of them have a difference of 65536.
>>
>> Er, some duplicates also show up, but the b_flags values differ.
>>
>> I'm not cc'ing fs@freebsd on this, just in case the thread is being
>> seen as getting out of control. Feel free to fold them back in.
>>
>> Thanks again,
>> k.
>
> It does look like the hash function is having some trouble. It has
> been completely revamped in 5.0, but it still uses a "power-of-2"
> hashing scheme in 4.X. I highly recommend trying a scheme with a
> non-power-of-2 base. Perhaps something as simple as changing the
> hashing to use modulo rather than logical & (e.g., in bufhash,
> change & bufhashmask to % bufhashmask).
>
> Kirk McKusick

Hi,

I hope this isn't seen as spamming the list; this should be the last
of it. I'll summarize our findings briefly. More details are at:

	http://www.bigshed.com/kernel/raid_full_problem

and/or you can find the patches for what we finally did at:

	http://www.bigshed.com/kernel/ffs_vfsbio.diff

We did re-newfs our raid as Kirk suggested. Stupidly, our data file
and some test results were lost in the process (doh!), so we had to
use a slightly different data file for re-testing -- still 1.5Gb of
mixed file and directory sizes.

Anyway, it would appear that the new fs settings (average file size =
48k, average files per directory = 1500) help some, but performance
still suffers as the disk fills.

We have a sample 'fix' for the hash table in vfs_bio.c that uses all
of the blkno bits; it's in the diff link above -- use it as you see
fit. (P.S. 1 and 2 below my sig illustrate the collision pattern and
Kirk's suggested modulo change.) However, the hash fix alone doesn't
really address our symptoms: throughput still bogs down to about
1Mb/sec with over 90% system time as the disk fills. Darn.

The only thing that really addressed our problem was going back to
the 4.4 dirpref logic. We added a sysctl OID to select this on a
system-wide basis; that's also in the diff patch, and P.S. 3 below
sketches the idea. It would be nice if we could do this per
filesystem via fs.h's fs_flags or some such, but perhaps that is too
messy to support going forward. We can live with system-wide 4.4
semantics if necessary, as Doug White mentioned.

If any of this does get addressed in the 4.8 code, please let us (er,
julian@vicor.com) know so we can clean up our kernel tree.

Of course, any comments, suggestions, or flames are totally welcome.
Thanks again for everyone's patience and assistance.

regards,
k
--
Ken Marx, kmarx@vicor-nb.com
Ramp up the solution space!!
 - http://www.bigshed.com/cgi-bin/speak.cgi
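P.S. 1: For the curious, here's a toy userland program (not kernel
code) showing why a stride of 65536 is poison for a power-of-2 bucket
mask and why Kirk's modulo suggestion spreads things out. The table
size of 1024 is made up for illustration, and I'm ignoring the vnode
term the real hash mixes in:

#include <stdio.h>

#define NBUCKETS	1024		/* power of two, like the 4.x bufhash */
#define MASK		(NBUCKETS - 1)	/* 1023 */

int
main(void)
{
	int used_and[NBUCKETS] = { 0 };
	int used_mod[NBUCKETS] = { 0 };
	long blkno;
	int i, n_and = 0, n_mod = 0;

	/* ~500 block numbers spaced 65536 apart, as in our bucket dump. */
	for (i = 0; i < 494; i++) {
		blkno = 52561L + 65536L * i;
		used_and[blkno & MASK]++;	/* power-of-2 masking */
		used_mod[blkno % MASK]++;	/* modulo 1023 = 3*11*31 */
	}
	for (i = 0; i < NBUCKETS; i++) {
		if (used_and[i])
			n_and++;
		if (used_mod[i])
			n_mod++;
	}
	printf("& mask: %d of %d buckets used\n", n_and, NBUCKETS);	/* 1 */
	printf("%% mask: %d of %d buckets used\n", n_mod, NBUCKETS);	/* 494 */
	return (0);
}

Since 65536 is a multiple of 1024, masking throws away everything that
distinguishes these block numbers, and all ~500 buffers pile into one
bucket. By contrast, 65536 mod 1023 is 64, which walks the whole table.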
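P.S. 2: To make Kirk's suggestion concrete, the change is roughly the
one-character swap below. I'm writing the 4.x BUFHASH macro from
memory, so check it against your vfs_bio.c before trusting it; the
point is just & becoming %:

/* before: masking keeps only the low bits of the sum */
#define BUFHASH(vnp, bn) \
	(&bufhashtbl[(((uintptr_t)(vnp) >> 8) + (int)(bn)) & bufhashmask])

/* after: modulo an odd (non-power-of-2) value uses all the bits */
#define BUFHASH(vnp, bn) \
	(&bufhashtbl[(((uintptr_t)(vnp) >> 8) + (int)(bn)) % bufhashmask])

The sum is computed as uintptr_t, so the % result can't go negative;
and since bufhashmask is one less than a power of two, it's odd, so a
65536 stride no longer degenerates. A modulo per lookup costs a little
more than an &, but that's noise next to walking a 600-deep chain.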
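P.S. 3: The dirpref sysctl amounts to the sketch below. All the names
here (vfs.ffs.use_44dirpref, the _44/_new helpers) are made up for
illustration -- the real ones are in the diff above:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>
#include <ufs/ufs/inode.h>

SYSCTL_NODE(_vfs, OID_AUTO, ffs, CTLFLAG_RW, 0, "FFS filesystem");

/* 0 = current 4.8 policy, 1 = old 4.4-style policy (system-wide) */
static int use_44dirpref = 0;
SYSCTL_INT(_vfs_ffs, OID_AUTO, use_44dirpref, CTLFLAG_RW,
    &use_44dirpref, 0, "use 4.4BSD directory placement policy");

/* the two policies; bodies live elsewhere in ffs_alloc.c */
static ino_t ffs_dirpref_44(struct inode *);
static ino_t ffs_dirpref_new(struct inode *);

static ino_t
ffs_dirpref(struct inode *pip)
{
	if (use_44dirpref)
		/* 4.4: spread directories across cylinder groups */
		return (ffs_dirpref_44(pip));
	/* 4.8: cluster directories near their parent */
	return (ffs_dirpref_new(pip));
}

A per-fs flag in fs_flags would be nicer, as mentioned above, but the
system-wide knob was the least invasive thing we could do in our tree.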