From owner-freebsd-hackers  Fri Dec  1 11:20:43 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id CE72037B401
	for <hackers@FreeBSD.ORG>; Fri,  1 Dec 2000 11:20:35 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id eB1JIol53670;
	Fri, 1 Dec 2000 11:18:50 -0800 (PST)
	(envelope-from dillon)
Date: Fri, 1 Dec 2000 11:18:50 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200012011918.eB1JIol53670@earth.backplane.com>
To: News History File User <newsuser@free-pr0n.netscum.dk>
Cc: hackers@FreeBSD.ORG, usenet@tdk.net, soren@wasabisystems.com
Subject: Re: vm_pageout_scan badness
References:  <200012011044.eB1Ai4353062@newsmangler.inet.tele.dk>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:>     Personally speaking, I would much rather use MAP_NOSYNC anyway, even with
:>     a fixed filesystem syncer.   MAP_NOSYNC pages are not restricted by
:...
:
:Yeah, no kidding -- here's what I see it screwing up.  First, some
:background:
:
:I've built three news machines, two transit boxen and one reader box,
:with recent INN k0dez, and 4.2-STABLE of a few days ago (having tested
:NetBSD, more on that later), and a brief detour into 5-current.
:..
:
:Everything starts out well, where the history disk is beaten at startup
:but as time passes, the time taken to do lookups and writes drops down
:to near-zero levels, and the disk gets quiet.  And actually, the transit
:...
:
:What I notice is that the amount of memory used keeps increasing, until
:it's all used, and the Free amount shown by `top' drops to a meg or so.
:Cache and Buf get a bit, but most of it is Active.  Far more than is
:accounted for by the processes.

    This is to be expected, because the dirty MAP_NOSYNC pages will not
    be written out until they are forced out or explicitly flushed with
    msync().

:Now, what happens on the reader machine is that after some time of the
:Active memory increasing, it runs out and starts to swap out processes,
:and the timestamps on the history database files (.index and .hash, this
:is the md5-based history) get updated, rather than remaining at the
:time INN is started.  Then the rapid history times skyrocket until it
:takes more than 1/4 of the time.  I don't see this on the transit boxen
:even after days of operation.

    Hmm.  That doesn't sound right.  Free memory should drop to near zero,
    but then what should happen is the pageout daemon should come along
    and deactivate a big chunk of the 'active' pages... so you should
    see a situation where you have, say, 200MB worth of active pages
    and 200MB worth of inactive pages.  After that the pageout daemon
    should start paging out the inactive pages and increasing the 'cache'.
    The number of 'free' pages will always be near zero, which is to be
    expected.  But it should not be swapping out any process.

    The real amount of 'free' memory in the system is 'free+cache'
    pages.

:Now, what happens when I stop INN and everything news-related is that
:some memory is freed up, but still, there can be, say, 400MB still
:reported as Active.  More when I had a full gig in this machine to
:...
:
:Then, when I reboot the machine, it gives the kernel messages about
:syncing disks; done, and then suddenly the history drive light goes
:on and it starts grinding for five minutes or so, before the actual
:reboot happens.

    Right.  This is to be expected.  You have a lot of dirty pages
    in the system due to the use of MAP_NOSYNC that have to be flushed
    out.

:No history activity happens when I shut down INN normally, which should
:free the MAP_NOSYNC'ed pages and make them available to be written to
:disk before rebooting, maybe.

    MAP_NOSYNC pages are not flushed when the referencing program exits.
    They stick around until they are forced out.  You can flush them
    manually by using an mmap()/msync() combination, i.e. an msync() prior
    to munmap()ing (from INND only) ought to do it.
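
    A minimal sketch of the msync()-before-munmap() idea, not INN's
    actual code.  The helper names map_history() and flush_and_unmap()
    are hypothetical, and MAP_NOSYNC is defined to 0 on systems that
    lack it so the sketch still compiles outside FreeBSD.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>

/* MAP_NOSYNC is FreeBSD-specific; treat it as a no-op flag elsewhere. */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/*
 * Hypothetical helper: map the first 'len' bytes of 'fd' shared and
 * read/write, asking the kernel not to sync dirty pages on its own.
 * Returns NULL on failure.
 */
void *map_history(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NOSYNC, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: write the dirty pages back synchronously, then
 * drop the mapping.  Returns 0 only if both calls succeed.
 */
int flush_and_unmap(void *base, size_t len)
{
    int rc = msync(base, len, MS_SYNC);

    if (munmap(base, len) == -1)
        rc = -1;
    return rc;
}
```

    Calling flush_and_unmap() at shutdown (from the one process that
    owns the mapping) should move the flush out of the reboot path.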

:What I think is happening, based on these observations, is that the
:data from the history hash files (less than 100MB) gets read into
:memory, but the updates to it are not written over the data to be
:replaced -- it's simply appended to, up to the limit of the available
:memory.  When this limit is reached on the transit machines, then
:things stabilize and old pages get recycled (but still, more memory
:overall is used than the size of the actual file).

    It doesn't append... the pages are reused.  The set of 'active'
    pages in the VM system is effectively the set of all files accessed
    for the entire system, not just MAP_NOSYNC pages.  If you are only
    MAP_NOSYNC'ing 100MB worth of pages, then only 100MB worth of pages
    will be left unflushed.

    Is it possible that history file rewriting is creating an issue?  Doesn't
    INN rewrite the history file every once in a while to clear out old
    garbage?  I'm not up on the latest INN.

:I'm guessing that additional activity of the reader machine causes
:jumps in memory usage not seen on the transit machines, that is enough
:to force some of the unwritten dirty pages to be written to the
:history file, as a few megs of swap get used, which is why it does
:not stabilize as `nicely' as the transit machines.

    This makes sense... the amount of swap that gets used is critical.
    If we are talking about only a few megabytes, then your system is
    *not* swapping significantly, it is simply swapping out completely
    idle pages from things like idle gettys and such.  This is a good
    thing.  The disk activity would thus be mostly due to MAP_NOSYNC pages
    being written out.

:Now, something I contemplated -- it seems that Bad Undesirable Things
:happen as soon as I start to actually swap, that I'd prefer to avoid.
:What I'm wondering is if I can avoid this by adjusting some of the
:values I see in `top' for Cache, Buf, and most importantly, Free.
:May I ask where (in which source file) these ratios or limits or
:whatever are set?  I'm hoping I can up the `Free' limit to a few
:dozen megs to give headroom before actual swapping happens, since
:now the Free value is a meg or two out of, oh, a gig available...
:
:Anyway, once this happens, performance sucks rocks, the history
:drive light is enough to read by (or should I say, it keeps me from
:getting much-needed sleep), and apparently only a reboot can free
:up memory for better purposes.
:
:I've also only a small margin of memory headroom on the transit
:machines, but much more on the reader machine, that can benefit
:from cache far more, in case this makes any difference.  But I think
:I also saw this steady increase when I first started with something
:like 256M.

    Well, the performance sucking part means something is not working
    as designed.  The question is what.

:I just now noticed that you made a patch available just over a month
:ago; I'm not sure if it would affect what I'm seeing here at all, or
:if it's already in the recent source I've built.

    I committed a number of patches just after 4.2-REL.  Your recent
    system may or may not have them.

    Here is what I would recommend.  First, I would use 'systat -vm 1'
    and carefully examine the pageout/swapout activity.  If the SWAP PAGER
    has no significant activity then we can discard it as a possible problem.
    If the VN PAGER has significant activity, then this is what we need
    to focus on.  

    I would try changing the pageout and VM cache parameters.  Do NOT mess
    with the VM free parameters!  Try changing the vm.v_cache_min and
    vm.v_cache_max parameters.  For example, increase vm.v_cache_max to
    widen the hysteresis.  You can also try changing vm.pageout_algorithm
    from 0 to 1 (this is not likely to have much of an effect), and you
    can also try increasing vm.max_page_launder, e.g. from 32 to
    100 (much larger would not have any effect).  Finally, you can
    try increasing the vm.v_inactive_target.  Do not increase the
    vm.v_free_target.

    Do NOT mess with any of the v_*free* sysctls unless you want
    to destabilize your box!

    Last thing:  Using MAP_NOSYNC has a well known problem when used to
    fill 'holes' in files.  That is, if the history file is being appended
    to by calling ftruncate(), but the new space is not write()n to and
    instead is dirtied via the mmap, you will have a serious fragmentation
    problem with the file.  In order to avoid this problem any file appends
    should occur using write() if possible, or the newly allocated space in
    the file should be filled with zeros using write() prior to being
    random-accessed by mmap() (which might be easier to implement).

						-Matt

:And, in an earlier message in this thread, concerning something
:related but different as far as I can make out:
:
:disclaimer:  i really don't know what I'm talking about, so be gentle
:when flaming me, thanks
:(reply-to header is valid)
:barry bouwsma, thwarted in all my attempts to build a good readerbox

