From owner-freebsd-stable Wed Jun 26 19:19:15 1996
Return-Path: owner-stable
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA13284 for stable-outgoing; Wed, 26 Jun 1996 19:19:15 -0700 (PDT)
Received: from deceased.hb.north.de (deceased.hb.north.de [194.94.232.249]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id TAA13276 for ; Wed, 26 Jun 1996 19:19:06 -0700 (PDT)
Received: from jelal.hb.north.de by deceased.hb.north.de with uucp (Smail3.1.93) id m0uZ6fU-0016CNC; Thu, 27 Jun 96 04:18:52 +0200 (MET DST)
Received: by jelal.hb.north.de (SMail-ST 0.95gcc/2.5+) id AA00351; Thu, 27 Jun 1996 04:13:24 +0100 (CET)
Received: (from nox@localhost) by saturn.hb.north.de (8.7.5/8.7.3) id EAA00559; Thu, 27 Jun 1996 04:03:13 +0200 (MET DST)
From: Juergen Lock
Message-Id: <199606270203.EAA00559@saturn.hb.north.de>
Subject: Re: lockups.
To: davidg@root.com
Date: Thu, 27 Jun 1996 04:03:13 +0200 (MET DST)
Cc: jhay@mikom.csir.co.za, stable@FreeBSD.org
In-Reply-To: <199606260255.TAA12927@root.com> from David Greenman at "Jun 25, 96 07:55:34 pm"
X-Mailer: ELM [version 2.4ME+ PL19 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-stable@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

David Greenman writes:
>...
> >> difference that you're describing. Change the #if 0 to a #if 1 at the end of
> >> /sys/vm/vm_pageout.c.
> >
> >Thats just what i did, and i put back the remove-cached-objects-that-
> >have-no-RSS code below the #if 0'd part. (havent tried the #if 1 alone)
>
> The other code that was removed has always been outside the while(1) loop,
> so it is NEVER executed.

Oops, didn't even notice that loop had no exit :)

>...
> Yes, that appears to be the problem. We're going to turn this code back on
> after putting some controls on it to make sure it doesn't cause stability
> problems. John will have a fix soon (if not already), so please install the
> fix and get back to us ASAP.
Updated the kernel again (got vm_pageout.c 1.51.4.8, and had to update
ipfw(8) as well, btw), and the problem was back, maybe a little less
serious but still bad enough. Anyway, after some experimenting I finally
thought: what if I just fixed the recursion counting? And yes, now it
seems to behave itself again:

Index: vm_pageout.c
===================================================================
RCS file: /home/cvs/cvs/src/sys/vm/vm_pageout.c,v
retrieving revision 1.51.4.8
diff -u -r1.51.4.8 vm_pageout.c
--- vm_pageout.c	1996/06/26 08:19:48	1.51.4.8
+++ vm_pageout.c	1996/06/27 00:49:00
@@ -366,21 +366,23 @@
 	if (count == 0)
 		count = 1;
 
-	(*recursion)++;
 	if (*recursion > 5)
 		return 0;
 
 	if (object->pager && (object->pager->pg_type == PG_DEVICE))
 		return 0;
 
+	(*recursion)++;
 	if (object->shadow) {
 		if (object->shadow->ref_count == 1)
 			dcount += vm_pageout_object_deactivate_pages(map, object->shadow, count / 2 + 1, map_remove_only, recursion);
 		else
 			vm_pageout_object_deactivate_pages(map, object->shadow, count, 1, recursion);
 	}
-	if (object->paging_in_progress || !vm_object_lock_try(object))
+	if (object->paging_in_progress || !vm_object_lock_try(object)) {
+		(*recursion)--;
 		return dcount;
+	}
 
 	/*
 	 * scan the objects entire memory queue
@@ -461,6 +463,7 @@
 		p = next;
 	}
 	vm_object_unlock(object);
+	(*recursion)--;
 	return dcount;
 }
 

Of course, because of the limited recursion I still have some swapped-out
processes with RSS > 0, but that's apparently not enough critical mass
anymore to trigger the thrashing, at least on this system. (Or maybe make
the recursion limit (the 5) configurable and allocate stack accordingly;
would that be possible? I guess that should be good enough until a `real'
fix is ready...)

Does that help?

cheers
	Juergen