Date:      Mon, 10 Dec 2001 01:18:51 +0100
From:      Uwe Doering <gemini@geminix.org>
To:        FreeBSD-gnats-submit@freebsd.org
Cc:        gemini@geminix.org
Subject:   kern/32659: VM and VNODE leak with vm.swap_idle_enabled=1
Message-ID:  <E16DE9j-000Neh-00@geminix.geminix.org>

>Number:         32659
>Category:       kern
>Synopsis:       VM and VNODE leak with vm.swap_idle_enabled=1
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Dec 09 16:20:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Uwe Doering
>Release:        FreeBSD 4.4-STABLE i386
>Organization:
Private UNIX Site
>Environment:
System: FreeBSD srv00.private.geminix.org 4.4-STABLE FreeBSD 4.4-STABLE #23: Sun Dec  9 11:14:31 MET 2001     root@srv00.private.geminix.org:/usr/src/sys/compile/SERVER  i386

SMP kernel on ServerWorks chipset

>Description:
In /usr/src/sys/vm/vm_glue.c (around line 482) there are two
if() clauses that deal with VM_SWAP_IDLE, the flag triggered by
"sysctl vm.swap_idle_enabled=1".

For this code to actually work, the two if() clauses need to be exact
logical complements, so that every process that gets past the first
one is swapped out by the second. However, they aren't. The case
"p->p_slptime == swap_idle_threshold2" isn't covered at all. Since we
have a granularity of one second here, it won't take a busy system
long to pass the code with both variables equal.
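
To make the gap concrete, here is a minimal user-space sketch of the two
conditions. The second clause is copied from the lines the patch below
removes. The first clause is an assumption: it is written here as a
"skip" test using "<", which is the form that leaves exactly the "=="
case uncovered. The flag values and the function names are illustrative
only and are not taken from the kernel sources.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel flags (values are assumptions). */
#define VM_SWAP_NORMAL 1
#define VM_SWAP_IDLE   2

/* ASSUMED form of the first clause: skip the process (uses '<'). */
static bool skip_clause(int action, int slptime, int threshold2)
{
    return (action & VM_SWAP_NORMAL) == 0 &&
           ((action & VM_SWAP_IDLE) == 0 || slptime < threshold2);
}

/* Second clause, as in the lines removed by the patch (uses '>'). */
static bool swapout_clause(int action, int slptime, int threshold2)
{
    return (action & VM_SWAP_NORMAL) ||
           ((action & VM_SWAP_IDLE) && slptime > threshold2);
}

/* True when neither clause fires, i.e. the process is neither
 * skipped nor swapped out. */
static bool falls_through(int action, int slptime, int threshold2)
{
    return !skip_clause(action, slptime, threshold2) &&
           !swapout_clause(action, slptime, threshold2);
}
```

With only VM_SWAP_IDLE set, falls_through() is true when the sleep time
equals the threshold and false for any other sleep time. It is also
false whenever VM_SWAP_NORMAL is set.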

What happens then is that no swap-out takes place and the for() loop
just iterates to the next process. However, since we have already
incremented "vm->vm_refcnt" by one at this point and never drop that
reference by calling vmspace_free(), we effectively have a stuck VM
object now.
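
The reference leak can be modelled outside the kernel with a plain
counter. This is only a sketch under assumptions: "struct model_vmspace",
scan_once() and leaked_refs() are hypothetical names, the first test is
assumed to use "<" as discussed above, only the VM_SWAP_IDLE case is
modelled, and swapout() itself is left out. The "fixed" argument selects
the unconditional swap-out proposed in the patch below.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct vmspace; only the refcount matters. */
struct model_vmspace {
    int refcnt;
};

/* Model one visit of the scan loop to a process (VM_SWAP_IDLE only). */
static void scan_once(struct model_vmspace *vm, int slptime,
                      int threshold2, int fixed)
{
    if (slptime < threshold2)   /* ASSUMED first clause: skip early */
        return;

    vm->refcnt++;               /* mirrors ++vm->vm_refcnt */

    if (fixed || slptime > threshold2) {
        /* swapout(p) would happen here */
        vm->refcnt--;           /* mirrors vmspace_free(vm) */
        return;
    }
    /* Unpatched path with slptime == threshold2: the loop moves on
     * and the extra reference is never dropped. */
}

/* Run the model over a series of sleep times and return how many
 * references are left over beyond the one held by the process. */
static int leaked_refs(const int *slptimes, size_t n,
                       int threshold2, int fixed)
{
    struct model_vmspace vm = { 1 };
    size_t i;

    for (i = 0; i < n; i++)
        scan_once(&vm, slptimes[i], threshold2, fixed);
    return vm.refcnt - 1;
}
```

In this model every visit with the sleep time equal to the threshold
leaves one extra reference behind in the unpatched variant, while the
patched variant always ends with none.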

When the process later exits, this also prevents the VNODE of
the backing object from being released, so we have a stuck VNODE
as well. In my case it was "/bin/sh".

Originally, I started my investigation because I couldn't "umount"
the respective filesystem without "-f", for no apparent reason.
I found out that the VNODE of "/bin/sh" had a usage count of 12,
without any matching processes running. After some days of searching
I finally traced it back to the code location given above.

>How-To-Repeat:
Set "vm.swap_idle_enabled=1" and run
"make -j4 -DMAKE_KERBEROS4 -DMAKE_KERBEROS5 buildworld" in
an endless loop for some hours. Then try to "umount" the filesystem.

Normally this is hard to test, since I think you can't "umount" the
root filesystem, where "/bin/sh" resides, on an active system. So
instead you may want to try it via the VN driver. I used one of these
file-based filesystems containing a system set up for jail() and
actually ran the "buildworld" loop inside a jail. Then it shouldn't be
a problem to "umount" the filesystem and see how it balks.

>Fix:
Once you know where to look, the cause of the problem and its fix
are obvious. I suggest fixing it along the lines of the patch
below. With that change the problem went away and I didn't notice
any side effects.

--- vm_glue.c.orig	Wed Nov 21 02:43:57 2001
+++ vm_glue.c	Fri Dec  7 16:42:09 2001
@@ -482,17 +482,14 @@
 			}
 			vm_map_unlock(&vm->vm_map);
 			/*
-			 * If the process has been asleep for awhile and had
-			 * most of its pages taken away already, swap it out.
+			 * All possibilities of sparing the process its
+			 * grim fate have been exhausted above, so bow
+			 * to the inevitable now and swap it out.
 			 */
-			if ((action & VM_SWAP_NORMAL) ||
-				((action & VM_SWAP_IDLE) &&
-				 (p->p_slptime > swap_idle_threshold2))) {
-				swapout(p);
-				vmspace_free(vm);
-				didswap++;
-				goto retry;
-			}
+			swapout(p);
+			vmspace_free(vm);
+			didswap++;
+			goto retry;
 		}
 	}
 	/*
>Release-Note:
>Audit-Trail:
>Unformatted:
