From owner-freebsd-hackers Tue Oct 24 15:14:19 2000
Date: Tue, 24 Oct 2000 15:14:14 -0700
From: Alfred Perlstein
To: Matt Dillon
Cc: ps@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject: Re: vm_pageout_scan badness
Message-ID: <20001024151414.P28123@fw.wintelcom.net>
References: <20001024112708.E28123@fw.wintelcom.net> <200010242010.e9OKAJK19739@earth.backplane.com>
In-Reply-To: <200010242010.e9OKAJK19739@earth.backplane.com>; from dillon@earth.backplane.com on Tue, Oct 24, 2000 at 01:10:19PM -0700

* Matt Dillon [001024 13:11] wrote:
> Ouch. The original VM code assumed that pages would not often be
> ripped out from under the pagedaemon, so it felt free to restart
> whenever. I think you are absolutely correct in regards to the
> clustering code causing nearby-page ripouts.

Yes, that makes sense to me. If you do a sequential write to a file,
after some time those pages are likely to sit in order on the inactive
queue. When one of them is cluster-written, 'next' ends up on a
different queue, because it was written along with the preceding page.

> I don't have much time available, but let me take a crack at the
> problem tonight. I don't think we want to add another workaround to
> code that already has too many of them. The solution may be
> to create a dummy placemarker vm_page_t and to insert it into the pagelist
> just after the current page after we've locked it and decided we have
> to do something significant to it.
> We would then be able to pick the scan up where we left off using
> the placemarker.
>
> This would allow us to get rid of the restart code entirely, or at least
> devolve it back into its original design (i.e. something that would not
> happen very often). Since we already have cache locality of reference for
> the list node, the placemarker idea ought to be quite fast.
>
> I'll take a crack at implementing the openbsd (or was it netbsd?) partial
> fsync() code as well, to prevent the update daemon from locking up large
> files that have lots of dirty pages for long periods of time.

Making the partial fsync would help some people, but probably not
these folks.

The people getting hit by this are Yahoo! boxes. They have giant areas
of NOSYNC mmap'd data, and the issue for them is that the first scan
through the loop always sees dirty data that needs to be written out.
I think they also need a _lot_ more than 32 pages cleaned per pass,
because all of their pages need laundering.

Perhaps if you detected how often the routine was being called, you
could slowly raise max_page_launder to compensate and lower it after
some time without a shortage, for example by adding a quarter of
'should_have_laundered' to maxlaunder for a short interval.

It might be wise to switch to a 'launder mode' if this sort of usage
pattern is detected, and to work out a better figure to use than 32.
I was hoping you'd have some suggestions for a heuristic to detect
this, along the lines of what you have implemented in bufdaemon.

What you could also do is count the number of pages that could/should
have been laundered during the first pass, and check whether that count
exceeds, by a certain threshold, the number of pages that were freed
via:

    if (m->object->ref_count == 0) {

and:

    if (m->valid == 0) {

and:

    } else if (m->dirty == 0) {

Basically, if maxlaunder is equal to zero and we miss all those tests,
you might want to bump up a counter, and if it exceeds a threshold,
immediately start rescanning and double(?) maxlaunder.
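To make the "raise it under pressure, lower it after a quiet spell"
idea concrete, here is a rough userland sketch. It is only an
illustration under assumptions, not pagedaemon code: the struct, the
function name, the ceiling, and the decay interval are all made up for
the example. Only the base of 32 and the "add a quarter of
should_have_laundered" rule come from the text above, and the simple
missed-versus-freed comparison stands in for whatever threshold would
actually be chosen.

```c
#include <assert.h>

#define LAUNDER_BASE    32    /* per-pass figure mentioned above */
#define LAUNDER_CEILING 4096  /* hypothetical upper bound */
#define QUIET_PASSES    4     /* hypothetical: quiet passes before decay */

/* Hypothetical bookkeeping kept between scan passes. */
struct launder_state {
	int maxlaunder;     /* launder budget for the next pass */
	int quiet_passes;   /* consecutive passes without a shortage */
};

/*
 * Call once at the end of a scan pass.
 *   missed: dirty pages skipped because the launder budget ran out
 *           (the 'should_have_laundered' count)
 *   freed:  pages reclaimed by the cheap tests (unreferenced object,
 *           invalid page, clean page)
 * Returns the budget to use for the next pass.
 */
int
launder_adjust(struct launder_state *ls, int missed, int freed)
{
	assert(missed >= 0 && freed >= 0);

	if (missed > freed) {
		/* Shortage: grow by a quarter of what we skipped. */
		ls->maxlaunder += missed / 4;
		if (ls->maxlaunder > LAUNDER_CEILING)
			ls->maxlaunder = LAUNDER_CEILING;
		ls->quiet_passes = 0;
	} else if (++ls->quiet_passes >= QUIET_PASSES) {
		/* Quiet for a while: decay halfway back toward the base. */
		ls->maxlaunder = LAUNDER_BASE +
		    (ls->maxlaunder - LAUNDER_BASE) / 2;
		ls->quiet_passes = 0;
	}
	return (ls->maxlaunder);
}
```

Starting from 32, a pass that skips 400 dirty pages while freeing 10
raises the budget to 132; four quiet passes in a row then bring it
back down to 82. Doubling instead of adding a quarter would be a
one-line change in the shortage branch.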
-Alfred
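
For reference, the placemarker scan that Matt describes in the quoted
text can be sketched in userland with the TAILQ macros from
<sys/queue.h>. This is a toy model under assumptions, not the kernel
code: struct page, launder(), and scan_queue() are hypothetical names,
and launder() merely simulates a clustered write that cleans the
following page and moves it elsewhere in the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for vm_page_t. */
struct page {
	TAILQ_ENTRY(page) link;
	int id;
	int dirty;
	int is_marker;   /* set only on dummy placemarker entries */
};
TAILQ_HEAD(pagelist, page);

/*
 * Simulated expensive step: clean m, and, like a clustered write,
 * also clean the next real page and move it to the tail of the queue.
 */
void
launder(struct pagelist *q, struct page *m)
{
	struct page *n;

	m->dirty = 0;
	n = TAILQ_NEXT(m, link);
	while (n != NULL && n->is_marker)   /* skip placemarkers */
		n = TAILQ_NEXT(n, link);
	if (n != NULL && n->dirty) {
		n->dirty = 0;
		TAILQ_REMOVE(q, n, link);
		TAILQ_INSERT_TAIL(q, n, link);
	}
}

/* Scan the queue once; returns the number of launder() calls made. */
int
scan_queue(struct pagelist *q)
{
	struct page marker = { .is_marker = 1 };
	struct page *m, *next;
	int laundered = 0;

	for (m = TAILQ_FIRST(q); m != NULL; m = next) {
		next = TAILQ_NEXT(m, link);
		if (m->is_marker || !m->dirty)
			continue;
		/* Remember our position before the queue can change. */
		TAILQ_INSERT_AFTER(q, m, &marker, link);
		launder(q, m);
		laundered++;
		/* Resume from the marker, not from the stale 'next'. */
		next = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(q, &marker, link);
	}
	return (laundered);
}
```

In this model a 'next' pointer saved before launder() would point at a
page that has since moved to the tail, so a scan that followed it
would skip everything in between. Resuming from the marker avoids
that without restarting from the head of the queue.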