Date: Sun, 9 May 2004 20:10:13 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Brian Fundakowski Feldman <green@FreeBSD.org>
Cc: current@FreeBSD.org
Subject: Re: 5.x w/auto-maxusers has insane kern.maxvnodes
Message-ID: <20040509191554.T8241@gamplex.bde.org>
In-Reply-To: <200405090518.i495IUpL073464@green.homeunix.org>
References: <200405090518.i495IUpL073464@green.homeunix.org>

On Sun, 9 May 2004, Brian Fundakowski Feldman wrote:

> Brian Fundakowski Feldman <green@FreeBSD.org> wrote:
> > I have a 512MB system and had to adjust kern.maxvnodes (desiredvnodes) down
> > to something reasonable after discovering that it was the sole cause of too
> > much paging for my workstation. The target number of vnodes was set to
> > 33000, which would not be so bad if it did not also cause so many more
> > UFS, VM and VFS objects, and the VM objects' associated inactive cache
> > pages, lying around. I ended up saving a good 100MB of memory just
> > adjusting kern.maxvnodes back down to something reasonable. Here are the
> > current allocations (and some of the peak values):

The default for desiredvnodes is almost perfect for my main application of
running makeworld and otherwise working with the entire src tree. Actually,
it's too low with 512MB and almost perfect with 1024MB. The latter gives
desiredvnodes = 70240, and there are 47742 vnodes in my src tree (a few
hundred extras). 512MB is also not quite enough for caching the whole src
tree (mine has 476358 1K-blocks according to du).

In one application involving 2 src trees (slightly reduced to get them both
cached in 1024MB of which only about 800MB is available for VMIO pages), I
needed to increase kern.maxvnodes to 90000+ to avoid disk accesses for
inodes. Caching them in vnodes didn't work because the default number of
vnodes wasn't enough, and caching them in VMIO pages didn't work for some
reason (either because I was testing a filesystem that was missing VMIO for
metadata, or because the replacement policy didn't work -- when inodes are
cached in vnodes and not written to due to mounting with noatime, they get
discarded from VMIO and then when their vnode gets recycled they aren't
cached anywhere).

Since 512MB isn't enough to cache everything for makeworld, the default of
33000+ vnodes won't help much, and a better target might be to cache
everything in /sys.
15000 vnodes and a couple of hundred MB is enough for that unless you build
too many modules or kernels.

> > ITEM            SIZE     LIMIT    USED    FREE   REQUESTS
> > FFS2 dinode:    256,        0,  12340,     95,    1298936
> > FFS1 dinode:    128,        0,    315,   3901,    2570969
> > FFS inode:      140,        0,  12655,  14589,    3869905
> > L VFS Cache:    291,        0,      5,    892,      51835
> > S VFS Cache:     68,        0,  13043,  23301,    4076311
> > VNODE:          260,        0,  32339,     16,      32339
> > VM OBJECT:      132,        0,  10834,  24806,    2681863

I don't use ffs2 (nice to see ffs* spelled right), so I have slightly
smaller overheads.

> > (The number of VM pages allocated specifically to vnodes is not something
> > easy to determine other than the fact that I saved so much memory even
> > without the objects themselves, after uma_zfree(), having been reclaimed.)

The number of VMIO pages is also hard to determine. systat's "inact" count
gives an approximate value for the amount of VMIO memory, but various stats
utilities' "buf" count gives a useless value. VMIO pages are easier to
flush (unmount works for them).

> > We really need to look into making the desiredvnodes default target more
> > sane before 5.x is -STABLE or people are going to be very surprised
> > switching from 4.x and seeing paging increase substantially.

One more 5.x has bloat everywhere? Is desiredvnodes the worst part of it?
I haven't noticed its bloat especially. Not long ago (in early 4.x?), the
number of vnodes was unbounded and there were bugs like the ufs inode
allocation doubling due to the required amount growing for bogus reasons to
just larger than a power of 2 (so that power of 2 allocation almost doubled
it).

> > but why are they not already like that? One last good example I personally
> > see of wastage-by-virtue-of-zfree-function is the page tables on i386:
> > PV ENTRY: 28, 938280, 59170, 120590, 199482221
> > Once again, why do those actually need to be non-reclaimable?

I haven't noticed much wastage for PV ENTRY.
Right now, I have only the following large memory consumers in uma, but the
system hasn't been up long and the measurement is distorted by recently
reading the src tree:

%%%
ITEM            SIZE     LIMIT    USED    FREE   REQUESTS
FFS1 dinode:    128,        0,  52202,     33,     563854
FFS inode:      140,        0,  52202,     46,     563854
S VFS Cache:     68,        0,  52494,     75,     573456
VNODE:          260,        0,  52209,     21,      52209
2048:          2048,        0,    123,   2843,      20845
PV ENTRY:        28,  1494920,   4438,   2282,     703502
VM OBJECT:      132,        0,  52355,     85,     495338
%%%

PV ENTRY's are small, so the 2048's waste a lot more. It's hard to see what
they are for; vmstat -z never showed as much as vmstat -m, and vmstat -m is
not as good as it used to be.

> It really doesn't seem appropriate to _ever_ scale maxvnodes (desiredvnodes)
> up that high just because I have 512MB of RAM.

Like most things, the best value depends on the workload. Since the number
of vnodes that can be handled scales with the amount of memory, it seems
reasonable for the default to scale with the amount of memory. -current
needs a larger scale factor than RELENG_4 if anything, since it has more
files. Combined with more costs per file, it could easily need twice as
much real memory as RELENG_4 for equivalent disk caching.

Bruce
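
Editorial sketch, not part of the original message: the "costs per file"
mentioned above can be roughed out from the second uma zone table by summing
the item sizes of the zones whose USED counts track the vnode count. Which
zones belong to a cached file is an assumption here, inferred only from their
near-identical USED values (about 52,200 each). The estimate ignores VMIO
pages, free items and allocator overhead, so it is a lower bound at best.

```python
# (item size in bytes, USED count), copied from the second zone table above.
# Treating these five zones as "one item per cached file" is an assumption.
zones = {
    "FFS1 dinode": (128, 52202),
    "FFS inode": (140, 52202),
    "S VFS Cache": (68, 52494),
    "VNODE": (260, 52209),
    "VM OBJECT": (132, 52355),
}


def bytes_per_file(zones):
    """Sum of item sizes: zone memory tied up by one cached file."""
    return sum(size for size, _used in zones.values())


def bytes_in_use(zones):
    """Total bytes held by USED items across the listed zones."""
    return sum(size * used for size, used in zones.values())


per_file = bytes_per_file(zones)
total = bytes_in_use(zones)
print(f"{per_file} bytes per cached file")
print(f"{total / 2**20:.1f} MiB held by used items in these zones")
```

On these figures the zone items alone come to 728 bytes per cached file, so
a 33000-vnode target would pin roughly 24 MB before counting any cached
pages.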
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040509191554.T8241>