From owner-freebsd-hackers  Tue Oct 24 15:14:19 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 557B537B479; Tue, 24 Oct 2000 15:14:15 -0700 (PDT)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id e9OMEEV20728;
	Tue, 24 Oct 2000 15:14:14 -0700 (PDT)
Date: Tue, 24 Oct 2000 15:14:14 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: ps@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject: Re: vm_pageout_scan badness
Message-ID: <20001024151414.P28123@fw.wintelcom.net>
References: <20001024112708.E28123@fw.wintelcom.net> <200010242010.e9OKAJK19739@earth.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.4i
In-Reply-To: <200010242010.e9OKAJK19739@earth.backplane.com>; from dillon@earth.backplane.com on Tue, Oct 24, 2000 at 01:10:19PM -0700
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Matt Dillon <dillon@earth.backplane.com> [001024 13:11] wrote:
>     Ouch.  The original VM code assumed that pages would not often be
>     ripped out from under the pagedaemon, so it felt free to restart
>     whenever.  I think you are absolutely correct in regards to the
>     clustering code causing nearby-page ripouts.

Yes, that makes sense to me.  If you write a file sequentially, then
after some time its pages are likely to sit in order on the inactive
queue.  When one of them is cluster-written, 'next' ends up on a
different queue, because it was written out along with the preceding
page.

>     I don't have much time available, but let me take a crack at the
>     problem tonight.  I don't think we want to add another workaround to
>     code that already has too many of them.  The solution may be
>     to create a dummy placemarker vm_page_t and to insert it into the pagelist
>     just after the current page after we've locked it and decided we have
>     to do something significant to it.  We would then be able to pick the
>     scan up where we left off using the placemarker.
> 
>     This would allow us to get rid of the restart code entirely, or at least
>     devolve it back into its original design (i.e. something that would not
>     happen very often).  Since we already have cache locality of reference for
>     the list node, the placemarker idea ought to be quite fast.
> 
>     I'll take a crack at implementing the openbsd (or was it netbsd?) partial
>     fsync() code as well, to prevent the update daemon from locking up large
>     files that have lots of dirty pages for long periods of time.
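
To make sure I understand the placemarker idea quoted above, here is a
rough userland sketch of it, not kernel code.  'struct page',
launder_page() and scan_with_marker() are hypothetical names of mine;
launder_page() fakes the clustering ripout by also pulling the
neighbouring dirty page off the queue.  Only the TAILQ macros from
<sys/queue.h> are real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for vm_page_t; only what the sketch needs. */
struct page {
	TAILQ_ENTRY(page) pageq;
	bool is_marker;		/* dummy placemarker, never a real page */
	bool dirty;		/* real pages only: needs laundering */
};
TAILQ_HEAD(pagelist, page);

/*
 * Hypothetical "significant" operation.  It cleans m and, to imitate
 * clustering, also cleans the next real page and rips it off the queue.
 */
static void
launder_page(struct pagelist *q, struct page *m)
{
	struct page *n = TAILQ_NEXT(m, pageq);

	while (n != NULL && n->is_marker)
		n = TAILQ_NEXT(n, pageq);
	m->dirty = false;
	if (n != NULL && n->dirty) {
		n->dirty = false;
		TAILQ_REMOVE(q, n, pageq);
	}
}

/* Scan the queue once; returns how many times we had to launder. */
static int
scan_with_marker(struct pagelist *q)
{
	struct page marker = { .is_marker = true };
	struct page *m, *next;
	int laundered = 0;

	for (m = TAILQ_FIRST(q); m != NULL; m = next) {
		next = TAILQ_NEXT(m, pageq);
		if (m->is_marker || !m->dirty)
			continue;	/* skip markers and clean pages */

		/* Remember our position before doing anything significant. */
		TAILQ_INSERT_AFTER(q, m, &marker, pageq);
		launder_page(q, m);
		laundered++;

		/* Resume from the marker; the old 'next' may be gone. */
		next = TAILQ_NEXT(&marker, pageq);
		assert(next == NULL || next != m);
		TAILQ_REMOVE(q, &marker, pageq);
	}
	return (laundered);
}
```

With four dirty pages queued, the scan launders the first and third,
the second and fourth get ripped out by the fake clustering, and the
loop never has to restart from the head of the queue.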

Making the partial fsync would help some people but probably not
these folks.

The people getting hit by this are running Yahoo! boxes.  They have
giant areas of NOSYNC mmap'd data, and the issue for them is that the
first scan through the loop always sees dirty data that needs to be
written out.  I think they also need a _lot_ more than 32 pages cleaned
per pass, because all of their pages need laundering.

Perhaps if you detected how often the routine was being called you
could slowly raise max_page_launder to compensate and lower it
after some time without a shortage.  Perhaps adding a quarter of
'should_have_laundered' to maxlaunder for a short interval.
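
Something like the following is what I have in mind, as a standalone
sketch and not a patch.  The struct, launder_adjust(), the 4096 cap and
the 8-pass quiet period are all made up for illustration, and I key off
the per-pass shortfall instead of measuring how often the routine is
called; only the figure 32 and the "quarter of should_have_laundered"
come from the discussion above.

```c
#include <assert.h>

#define LAUNDER_MIN	32	/* the fixed figure used today */
#define LAUNDER_MAX	4096	/* hypothetical upper bound */
#define QUIET_PASSES	8	/* hypothetical: quiet passes before decay */

struct launder_state {
	int maxlaunder;		/* current per-pass laundering limit */
	int quiet;		/* consecutive passes without a shortage */
};

/*
 * Call once per pageout pass.  should_have_laundered is the number of
 * dirty pages the pass skipped only because the limit ran out.
 */
static void
launder_adjust(struct launder_state *s, int should_have_laundered)
{
	assert(should_have_laundered >= 0);

	if (should_have_laundered > 0) {
		/* Shortage: raise the limit by a quarter of the shortfall. */
		s->quiet = 0;
		s->maxlaunder += should_have_laundered / 4;
		if (s->maxlaunder > LAUNDER_MAX)
			s->maxlaunder = LAUNDER_MAX;
	} else if (++s->quiet >= QUIET_PASSES) {
		/* Quiet for a while: decay halfway back toward the minimum. */
		s->quiet = 0;
		s->maxlaunder = LAUNDER_MIN +
		    (s->maxlaunder - LAUNDER_MIN) / 2;
	}
}
```

The cap keeps one bad pass from pushing the limit arbitrarily high, and
the halving decay means a box that stops being short of pages drifts
back to 32 on its own.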

It might be wise to switch to a 'launder mode' if this sort of usage
pattern is detected, and to work out a better figure to use than 32.
I was hoping you'd have some suggestions for a heuristic to detect
this, along the lines of what you have implemented in bufdaemon.

What you could also do is count the number of pages that could/should
have been laundered during the first pass and see if it exceeds, by a
certain threshold, the number of pages that were freed via:

		if (m->object->ref_count == 0) {
and:
		if (m->valid == 0) {
and:
		} else if (m->dirty == 0) {

Basically, if maxlaunder is equal to zero and we miss all those tests,
you might want to bump up a counter and, if it exceeds a threshold,
immediately start rescanning and double(?) maxlaunder.
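
In toy form, and only as a sketch under my own assumptions: the array
of pages, the 'reclaimed' flag, MISS_THRESHOLD and both function names
are invented here, and ref_count stands in for m->object->ref_count.
The three tests are the ones listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MISS_THRESHOLD	64	/* hypothetical trigger for a rescan */

/* Simplified page; ref_count stands in for m->object->ref_count. */
struct page {
	int ref_count;
	int valid;
	int dirty;
	bool reclaimed;		/* already freed or moved to the cache */
};

/*
 * One pass.  Returns the pages reclaimed by the cheap tests; *missed
 * counts dirty pages skipped because maxlaunder had dropped to zero.
 */
static int
scan_pass(struct page *pages, size_t n, int maxlaunder, int *missed)
{
	int reclaimed = 0;

	*missed = 0;
	for (size_t i = 0; i < n; i++) {
		struct page *m = &pages[i];

		if (m->reclaimed)
			continue;
		if (m->ref_count == 0 || m->valid == 0 || m->dirty == 0) {
			m->reclaimed = true;	/* the three cheap cases */
			reclaimed++;
		} else if (maxlaunder > 0) {
			m->dirty = 0;		/* launder; reclaim later */
			maxlaunder--;
		} else {
			(*missed)++;		/* should have laundered */
		}
	}
	return (reclaimed);
}

/* Rescan at once with a doubled limit if too many pages were missed. */
static int
scan_adaptive(struct page *pages, size_t n, int *maxlaunder)
{
	int missed;
	int total = scan_pass(pages, n, *maxlaunder, &missed);

	if (missed > MISS_THRESHOLD) {
		*maxlaunder *= 2;
		total += scan_pass(pages, n, *maxlaunder, &missed);
	}
	return (total);
}
```

With 200 dirty pages and maxlaunder at 32, the first pass reclaims
nothing and misses 168 pages, so the limit doubles to 64 and the rescan
reclaims the 32 pages that were just cleaned.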

-Alfred

