From owner-freebsd-hackers Fri Dec 1 11:20:43 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
        by hub.freebsd.org (Postfix) with ESMTP id CE72037B401
        for ; Fri, 1 Dec 2000 11:20:35 -0800 (PST)
Received: (from dillon@localhost)
        by earth.backplane.com (8.11.1/8.9.3) id eB1JIol53670;
        Fri, 1 Dec 2000 11:18:50 -0800 (PST)
        (envelope-from dillon)
Date: Fri, 1 Dec 2000 11:18:50 -0800 (PST)
From: Matt Dillon
Message-Id: <200012011918.eB1JIol53670@earth.backplane.com>
To: News History File User
Cc: hackers@FreeBSD.ORG, usenet@tdk.net, soren@wasabisystems.com
Subject: Re: vm_pageout_scan badness
References: <200012011044.eB1Ai4353062@newsmangler.inet.tele.dk>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> Personally speaking, I would much rather use MAP_NOSYNC anyway, even with
:> a fixed filesystem syncer. MAP_NOSYNC pages are not restricted by
:...
:
:Yeah, no kidding -- here's what I see it screwing up. First, some
:background:
:
:I've built three news machines, two transit boxen and one reader box,
:with recent INN k0dez, and 4.2-STABLE of a few days ago (having tested
:NetBSD, more on that later), and a brief detour into 5-current.
:..
:
:Everything starts out well, where the history disk is beaten at startup
:but as time passes, the time taken to do lookups and writes drops down
:to near-zero levels, and the disk gets quiet. And actually, the transit
:...
:
:What I notice is that the amount of memory used keeps increasing, until
:it's all used, and the Free amount shown by `top' drops to a meg or so.
:Cache and Buf get a bit, but most of it is Active. Far more than is
:accounted for by the processes.

This is to be expected, because the dirty MAP_NOSYNC pages will not be
written out until they are forced out, or by msync().
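
To illustrate the point about msync(), here is a minimal sketch of dirtying
a MAP_NOSYNC mapping and then flushing it explicitly before unmapping. The
helper name update_and_flush() is hypothetical and is not taken from INN.
MAP_NOSYNC is FreeBSD-specific, so the sketch assumes a no-op fallback
definition on other systems. It also assumes the file is already at least
len bytes long.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0 /* assumption: no-op fallback where the flag does not exist */
#endif

/*
 * Hypothetical helper: overwrite the first `len` bytes of `path` through a
 * shared MAP_NOSYNC mapping, then push the dirty pages to disk with msync()
 * before munmap().  Returns 0 on success, -1 on any failure.
 */
int update_and_flush(const char *path, const char *data, size_t len)
{
    struct stat st;
    void *map;
    int fd, rc;

    assert(path != NULL && data != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Refuse to touch pages beyond end of file (that would raise SIGBUS). */
    if (fstat(fd, &st) != 0 || len == 0 || (off_t)len > st.st_size) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, data, len);            /* dirties the mapped pages */

    rc = msync(map, len, MS_SYNC);     /* explicit flush of the dirty pages */
    if (munmap(map, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Without the msync() call the dirty pages would simply stay in memory after
munmap() until something forces them out, which matches the behaviour
described above.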

:Now, what happens on the reader machine is that after some time of the
:Active memory increasing, it runs out and starts to swap out processes,
:and the timestamps on the history database files (.index and .hash, this
:is the md5-based history) get updated, rather than remaining at the
:time INN is started. Then the rapid history times skyrocket until it
:takes more than 1/4 of the time. I don't see this on the transit boxen
:even after days of operation.

Hmm. That doesn't sound right. Free memory should drop to near zero,
but then what should happen is the pageout daemon should come along
and deactivate a big chunk of the 'active' pages... so you should
see a situation where you have, say, 200MB worth of active pages
and 200MB worth of inactive pages. After that the pageout daemon
should start paging out the inactive pages and increasing the 'cache'.
The number of 'free' pages will always be near zero, which is to be
expected. But it should not be swapping out any process.

The actual amount of 'free' memory in the system is actually
'free+cache' pages.

:Now, what happens when I stop INN and everything news-related is that
:some memory is freed up, but still, there can be, say, 400MB still
:reported as Active. More when I had a full gig in this machine to
:...
:
:Then, when I reboot the machine, it gives the kernel messages about
:syncing disks; done, and then suddenly the history drive light goes
:on and it starts grinding for five minutes or so, before the actual
:reboot happens.

Right. This is to be expected. You have a lot of dirty pages in
the system due to the use of MAP_NOSYNC that have to be flushed out.

:No history activity happens when I shut down INN normally, which should
:free the MAP_NOSYNC'ed pages and make them available to be written to
:disk before rebooting, maybe.

MAP_NOSYNC pages are not flushed when the referencing program exits.
They stick around until they are forced out. You can flush them
manually by using a mmap()/msync() combination.
i.e. an msync() prior to munmap()ing (from INND only) ought to do it.

:What I think is happening, based on these observations, is that the
:data from the history hash files (less than 100MB) gets read into
:memory, but the updates to it are not written over the data to be
:replaced -- it's simply appended to, up to the limit of the available
:memory. When this limit is reached on the transit machines, then
:things stabilize and old pages get recycled (but still, more memory
:overall is used than the size of the actual file).

It doesn't append... the pages are reused. The set of 'active'
pages in the VM system is effectively the set of all files accessed
for the entire system, not just MAP_NOSYNC pages. If you are only
MAP_NOSYNC'ing 100MB worth of pages, then only 100MB worth of pages
will be left unflushed.

Is it possible that history file rewriting is creating an issue?
Doesn't INN rewrite the history file every once in a while to clear
out old garbage? I'm not up on the latest INN.

:I'm guessing that additional activity of the reader machine causes
:jumps in memory usage not seen on the transit machines, that is enough
:to force some of the unwritten dirty pages to be written to the
:history file, as a few megs of swap get used, which is why it does
:not stabilize as `nicely' as the transit machines.

This makes sense... the amount of swap that gets used is critical.
If we are talking about only a few megabytes, then your system is
*not* swapping significantly, it is simply swapping out completely
idle pages from things like idle getty's and such. This is a good
thing. The disk activity would thus be mostly due to MAP_NOSYNC
pages being written out.

:Now, something I contemplated -- it seems that Bad Undesirable Things
:happen as soon as I start to actually swap, that I'd prefer to avoid.
:What I'm wondering is if I can avoid this by adjusting some of the
:values I see in `top' for Cache, Buf, and most importantly, Free.
:May I ask where (in which source file) these ratios or limits or
:whatever are set? I'm hoping I can up the `Free' limit to a few
:dozen megs to give headroom before actual swapping happens, since
:now the Free value is a meg or two out of, oh, a gig available...
:
:Anyway, once this happens, performance sucks rocks, the history
:drive light is enough to read by (or should I say, it keeps me from
:getting much-needed sleep), and apparently only a reboot can free
:up memory for better purposes.
:
:I've also only a small margin of memory headroom on the transit
:machines, but much more on the reader machine, that can benefit
:from cache far more, in case this makes any difference. But I think
:I also saw this steady increase when I first started with something
:like 256M.

Well, the performance sucking part means something is not working
as designed. The question is what.

:I just now noticed that you made a patch available just over a month
:ago; I'm not sure if it would affect what I'm seeing here at all, or
:if it's already in the recent source I've built.

I committed a number of patches just after 4.2-REL. Your recent
system may or may not have them.

Here is what I would recommend. First, I would use 'systat -vm 1'
and carefully examine the pageout/swapout activity. If the SWAP PAGER
has no significant activity then we can discard it as a possible
problem. If the VN PAGER has significant activity, then this is what
we need to focus on.

I would try changing the pageout and VM cache parameters. Do NOT
mess with the VM free parameters! Try changing the vm.v_cache_min
and vm.v_cache_max parameters. For example, increase vm.v_cache_max
to widen the hysteresis. You can also try changing vm.pageout_algorithm
from 0 to 1 (this is not likely to have much of an effect), and you
can also try increasing vm.max_page_launder, e.g. from 32 to 100
(much larger would not have any effect). Finally, you can try
increasing the vm.v_inactive_target.
Do not increase the vm.v_free_target. Do NOT mess with any of the
v_*free* sysctl's, not unless you want to destabilize your box!

Last thing: Using MAP_NOSYNC has a well known problem when used
to fill 'holes' in files. That is, if the history file is being
appended to by calling ftruncate(), but the new space is not write()n
to and instead is dirtied via the mmap, you will have a serious
fragmentation problem with the file. In order to avoid this problem
any file appends should occur using write() if possible, or the
newly allocated space in the file should be filled with zeros using
write() prior to being random-accessed by mmap() (which might be
easier to implement).

                                        -Matt

:And, in an earlier message in this thread, concerning something
:related but different as far as I can make out:
:
:disclaimer: i really don't know what I'm talking about, so be gentle
:when flaming me, thanks
:(reply-to header is valid)
:barry bouwsma, thwarted in all my attempts to build a good readerbox

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
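
The zero-fill advice in the message above (extend the file with write()
rather than ftruncate(), before touching the new region through mmap())
could look roughly like the following. This is only a sketch: the function
name zero_extend_file() and the 64KB chunk size are illustrative choices,
not anything from INN or FreeBSD.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: grow `path` to `new_size` bytes by write()ing
 * zeros at the end, so the new blocks are allocated sequentially instead
 * of being left as a hole to be filled in later through a mapping.
 * Does nothing if the file is already at least `new_size` bytes long.
 * Returns 0 on success, -1 on failure.
 */
int zero_extend_file(const char *path, off_t new_size)
{
    static const char zeros[65536];    /* illustrative chunk size */
    off_t pos;
    int fd;

    assert(path != NULL && new_size >= 0);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    pos = lseek(fd, 0, SEEK_END);      /* current size, and start of append */
    if (pos == (off_t)-1) {
        close(fd);
        return -1;
    }

    while (pos < new_size) {
        size_t chunk = sizeof zeros;
        ssize_t n;

        if ((off_t)chunk > new_size - pos)
            chunk = (size_t)(new_size - pos);

        n = write(fd, zeros, chunk);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted, retry */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        pos += n;
    }

    return close(fd) == 0 ? 0 : -1;
}
```

A caller would run this before mmap()ing the enlarged region, so that every
page later dirtied through the mapping already has disk blocks behind it.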