Date:      Tue, 28 Oct 2003 18:32:59 -0800
From:      Ken Marx <kmarx@vicor.com>
To:        Kirk McKusick <mckusick@beastie.mckusick.com>
Cc:        julian@elischer.org
Subject:   Re: 4.8 ffs_dirpref problem
Message-ID:  <3F9F26DB.6050207@vicor.com>
In-Reply-To: <200310261749.h9QHnieN015824@beastie.mckusick.com>
References:  <200310261749.h9QHnieN015824@beastie.mckusick.com>



Kirk McKusick wrote:
>>Date: Thu, 23 Oct 2003 17:58:54 -0700
>>From: Ken Marx <kmarx@vicor.com>
>>To: Kirk McKusick <mckusick@mckusick.com>
>>CC: Julian Elischer <julian@vicor.com>, cburrell@vicor.com, davep@vicor.com,
>>       Ken Marx <kmarx@vicor.com>, gluk@ptci.ru, jpl@vicor.com, jrh@vicor.com,
>>       julian@vicor-nb.com, VicPE@aol.com
>>Subject: Re: 4.8 ffs_dirpref problem
>>X-ASK-Info: Whitelist match
>>
>>Hi Kirk,
>>
>>I had a few minutes before heading out, so tried getting a list
>>of block numbers in the bufferhash bucket that seemed to have
>>lots of hits. The depth changes, of course, but I caught it at
>>one point at a depth of 600 or so:
>> 
>>/kernel: dumpbh( 250 )
>>/kernel: bp[1]: b_vp=0xcfa3d480, b_lblkno=52561, b_flags=0x20100020
>>/kernel: bp[2]: b_vp=0xcf3c5d00, b_lblkno=345047104, b_flags=0x200000a0
>>...
>>
>>For no good reason, I sorted by block number and looked at differences
>>between block number values. It varies a bit, but of 522 block numbers,
>>494 of them have a difference of 65536.
>>
>>Er, some duplicates also show up, but the b_flags values differ.
>>
>>I'm not cc'ing fs@freebsd on this just in case it's being seen
>>as getting out of control. Feel free to fold them back in.
>>
>>Thanks again,
>>k.
> 
> 
> It does look like the hash function is having some trouble.
> It has been completely revamped in 5.0, but is still using
> a "power-of-2" hashing scheme in 4.X. I highly recommend 
> trying a scheme with non-power-of-2 base. Perhaps something
> as simple as changing the hashing to use modulo rather than 
> logical & (e.g., in bufhash change from & bufhashmask to
> % bufhashmask).
> 
> 	Kirk McKusick
> 
> 
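The 65536 spacing in the quoted bucket dump fits Kirk's diagnosis. Below is a minimal sketch that assumes a hypothetical 256-entry table and hashes the block number alone (the real hash in vfs_bio.c also takes the vnode into account). Block numbers that are 65536 apart share their low 16 bits, so `& bufhashmask` sends all of them to the same bucket. `% bufhashmask` divides by 255, which is not a power of 2, so it spreads them out. The start block 52561 comes from the dump. The table size and the helper names `hash_and`, `hash_mod`, and `distinct_buckets` are illustrative only.

```c
#include <stdint.h>

/* Hypothetical power-of-2 table size; 4.X sizes its table at boot. */
#define BUFHASH_SIZE 256u
#define BUFHASH_MASK (BUFHASH_SIZE - 1u) /* 255 */

/* 4.X style: keep only the low bits of the block number. */
static unsigned hash_and(uint32_t blkno)
{
    return blkno & BUFHASH_MASK;
}

/* Kirk's suggestion: reduce modulo a non-power-of-2 (255 here). */
static unsigned hash_mod(uint32_t blkno)
{
    return blkno % BUFHASH_MASK;
}

/* Count the distinct buckets hit by n block numbers spaced `stride` apart. */
static unsigned distinct_buckets(unsigned (*hash)(uint32_t),
                                 uint32_t start, uint32_t stride, unsigned n)
{
    unsigned char used[BUFHASH_SIZE] = {0};
    unsigned count = 0;

    for (unsigned i = 0; i < n; i++) {
        unsigned b = hash(start + i * stride);
        if (!used[b]) {
            used[b] = 1;
            count++;
        }
    }
    return count;
}
```

`distinct_buckets(hash_and, 52561, 65536, 494)` returns 1. With `hash_mod` it returns 255. Because 65536 leaves a remainder of 1 when divided by 255, each successive block moves to the next bucket.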

Hi,

Hope this isn't seen as spamming the list; this should be
the last of it.

I'll summarize findings briefly. More details at:
	http://www.bigshed.com/kernel/raid_full_problem

and/or you can find our patches for what we finally did at:
	http://www.bigshed.com/kernel/ffs_vfsbio.diff

We did re-newfs our RAID as Kirk suggested. Stupidly,
our data file and some test results were lost in the
process (doh!), so we had to use a slightly different
data file for re-testing: still 1.5GB of mixed file and
directory sizes.

Anyway, it would appear that the new fs settings
(average file size=48k, average files per dir = 1500)
help some, but performance still suffers as the disk fills.

We have a sample 'fix' for the hash table in vfs_bio.c
that uses all the blkno bits; it's in the diff linked above.
Use as you see fit. However, it doesn't significantly
address our symptoms either. Darn. We still see throughput
bog down to 1Mb/sec with > 90% system time.
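The code in ffs_vfsbio.diff is not reproduced here, so what follows is only a hypothetical illustration of a hash that "uses all the blkno bits". It XOR-folds every byte of the block number into the low byte before masking. With this approach the table can stay power-of-2 sized while the high-order bits still influence bucket choice. The table size and the function names are assumptions made for the sketch.

```c
#include <stdint.h>

#define BUFHASH_SIZE 256u               /* hypothetical power-of-2 size */
#define BUFHASH_MASK (BUFHASH_SIZE - 1u)

/* Mask only: bits above bit 7 never affect the bucket. */
static unsigned hash_and(uint32_t blkno)
{
    return blkno & BUFHASH_MASK;
}

/* Fold all four bytes of the block number together before masking,
 * so high-order bits influence the bucket too. */
static unsigned hash_fold(uint32_t blkno)
{
    uint32_t h = blkno ^ (blkno >> 8) ^ (blkno >> 16) ^ (blkno >> 24);
    return h & BUFHASH_MASK;
}

/* Count the distinct buckets hit by n block numbers spaced `stride` apart. */
static unsigned distinct_buckets(unsigned (*hash)(uint32_t),
                                 uint32_t start, uint32_t stride, unsigned n)
{
    unsigned char used[BUFHASH_SIZE] = {0};
    unsigned count = 0;

    for (unsigned i = 0; i < n; i++) {
        unsigned b = hash(start + i * stride);
        if (!used[b]) {
            used[b] = 1;
            count++;
        }
    }
    return count;
}
```

Take a run of 494 blocks 65536 apart starting at 52561. The mask alone puts all of them in 1 bucket, while `hash_fold` uses all 256. As the message notes, better distribution by itself did not cure the slowdown.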

The only thing that really addressed our problem was going
back to the 4.4 dirpref logic. We added a sysctl OID to
support this on a system-wide basis. That's also in the
diff patch.
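The 4.4BSD dirpref policy that the sysctl falls back to works roughly as follows. It considers only the cylinder groups that have at least the average number of free inodes, and from those it picks the group holding the fewest directories. The userland sketch below is based on our recollection of 4.4BSD-Lite. `struct cg_summary` is a hypothetical stand-in for the per-group summary fields (cs_nifree, cs_ndir). The kernel version reads struct fs, takes the average from the filesystem-wide totals, and returns the first inode of the chosen group (fs_ipg * cg) rather than a group index.

```c
/* Hypothetical stand-in for the per-cylinder-group summary. */
struct cg_summary {
    long nifree; /* free inodes in the group */
    long ndir;   /* directories already in the group */
};

/* 4.4BSD-style dirpref: among groups with at least the average number
 * of free inodes, choose the one with the fewest directories.
 * Returns a group index, or -1 if there are no groups. */
static int dirpref_44(const struct cg_summary *cgs, int ncg)
{
    long total_ifree = 0;
    long avgifree, minndir = 0;
    int cg, mincg = -1;

    if (ncg <= 0)
        return -1;
    for (cg = 0; cg < ncg; cg++)
        total_ifree += cgs[cg].nifree;
    avgifree = total_ifree / ncg;

    for (cg = 0; cg < ncg; cg++) {
        if (cgs[cg].nifree >= avgifree &&
            (mincg == -1 || cgs[cg].ndir < minndir)) {
            mincg = cg;
            minndir = cgs[cg].ndir;
        }
    }
    return mincg;
}
```

For example, suppose the groups are {100 free, 5 dirs}, {10, 0}, {80, 2}, and {90, 3}. The average is 70 free inodes, so group 1 is skipped even though it has no directories, and group 2 is chosen.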

It would be nice if we could do this on a per-filesystem basis
via fs.h's fs_flags or some such, but perhaps this is too
messy for future support.
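A per-filesystem switch could be combined with the existing system-wide sysctl. The global setting would force the old policy everywhere, and a bit in fs_flags would let a single filesystem opt in. This is a sketch only: the flag name and bit value are hypothetical and not a real FreeBSD flag. A real patch would have to claim a bit that is otherwise unused.

```c
#include <stdbool.h>

/* Hypothetical flag bit, chosen arbitrarily for illustration. */
#define FS_DIRPREF44_HYPOTHETICAL 0x40u

/* The system-wide sysctl forces the 4.4 policy everywhere; otherwise
 * an individual filesystem can opt in through its fs_flags. */
static bool use_44_dirpref(int sysctl_value, unsigned fs_flags)
{
    return sysctl_value != 0 || (fs_flags & FS_DIRPREF44_HYPOTHETICAL) != 0;
}
```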

We can live with system-wide 4.4 semantics if necessary,
as Doug White mentioned.

If any of this does get addressed in 4.8 code, please
let us (er, julian@vicor.com) know so we can clean up
our kernel tree.

Of course, any comments, suggestions, flames totally welcome.

Thanks again for everyone's patience and assistance.

regards,
k
-- 
Ken Marx, kmarx@vicor-nb.com
Ramp up the solution space!!
		- http://www.bigshed.com/cgi-bin/speak.cgi


