To: FreeBSD-gnats-submit@freebsd.org
From: Sam Lawrance
Cc: current@freebsd.org
Date: Sun, 6 Mar 2005 12:21:46 +1100 (EST)
Message-Id: <20050306012146.701FB17D8@localhost>
Subject: Swapped out procs not brought in immediately after child exits

>Submitter-Id:	current-users
>Originator:	Sam Lawrance
>Confidential:	no
>Synopsis:	Swapped out procs not brought in immediately after child exits
>Severity:	non-critical
>Priority:	medium
>Category:	kern
>Class:		sw-bug
>Release:	FreeBSD 5.4-PRERELEASE i386
>Environment:
System: FreeBSD dirk.no.domain 5.4-PRERELEASE FreeBSD 5.4-PRERELEASE #10: Sun Mar 6 10:45:13 EST 2005 root@dirk.no.domain:/usr/testbuild/src5/sys/i386/compile/GENERIC i386

>Description:
I run -stable on my lonely box, but AFAICS this affects current.

This problem is similar in flavour to one that I reported a while ago,
since fixed. Here's an example.
Below we have a login, shell and su which have swapped out, and a shell
which is active:

root  4291  0.0  0.0  1664     0  v3  IWs  -       0:00.00 login [pam] (login)
sam   4298  0.0  0.0  2260     0  v3  IW   -       0:00.00 -bash (bash)
root  4299  0.0  0.0  1644     0  v3  IW   -       0:00.00 su
root  4300  0.0  0.4  2952  1132  v3  S+   3:23PM  0:00.66 su (bash)

When 4300 exits, it will sit in the zombie state for a long time, waiting
for 4299 to be swapped in. Same for 4299 and 4298.

The kernel call stack for 4300 would be something like

	exit1                   kern_exit
	wakeup                  (parent process as wait channel)
	sleepq_broadcast
	sleepq_resume_thread    (on parent process)
	setrunnable

In setrunnable, curthread->td_pflags is flagged with TDP_WAKEPROC0 to
indicate the vm scheduler should be awoken to do its thing.

David Xu's original change was to check for TDP_WAKEPROC0 in
critical_exit() and wakeup(&proc0) from there. Things were arranged this
way in order to prevent an LOR between sched_lock and sleepqueue locks.

That scheme doesn't take into account that exit1() does a critical_enter()
that has no corresponding critical_exit() in that thread (because the
exiting thread grabs sched_lock which is held across cpu_throw). So the
wakeup is not done, and we just have to wait for the vm's tsleep on proc0
to time out.

The same thing might occur across other exit points, but I don't know
what they are.

>How-To-Repeat:
Run a shell somewhere (first).
Su or run another shell or similar (second).
Wait until the first shell has swapped out (might require running some
other memory hogs).
Exit the second shell.
Notice that the second shell takes a long time to exit.
>Fix:
A possible solution might be to wakeup(&proc0) after waking the parent and
before grabbing sched_lock:

Index: kern_exit.c
===================================================================
RCS file: /home/ncvs/FreeBSD/src/sys/kern/kern_exit.c,v
retrieving revision 1.256
diff -u -r1.256 kern_exit.c
--- kern_exit.c	29 Jan 2005 14:03:41 -0000	1.256
+++ kern_exit.c	6 Mar 2005 01:17:35 -0000
@@ -503,6 +503,7 @@
 	mtx_unlock_spin(&sched_lock);
 	wakeup(p->p_pptr);
 	PROC_UNLOCK(p->p_pptr);
+	wakeup(&proc0);
 
 	mtx_lock_spin(&sched_lock);
 	critical_exit();
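
For anyone who wants to poke at the logic outside the kernel, here is a
rough user-space model of the deferred wakeup described above. It is only
a sketch: the struct, the model_* functions and the wakeup counter are all
made up for illustration (only the TDP_WAKEPROC0 name comes from the
report), and it ignores locking entirely. It assumes, as described, that
the flag is only acted on when the critical nesting count drops back to
zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag; the value is arbitrary. */
#define TDP_WAKEPROC0 0x1

/* Hypothetical, heavily reduced model of a kernel thread. */
struct model_thread {
	int td_critnest;	/* critical section nesting depth */
	int td_pflags;		/* private per-thread flags */
};

/* Counts modelled wakeup(&proc0) calls; not a kernel variable. */
static int model_proc0_wakeups;

static void
model_wakeup_proc0(void)
{
	model_proc0_wakeups++;
}

static void
model_critical_enter(struct model_thread *td)
{
	td->td_critnest++;
}

/* The deferred wakeup is delivered only when the outermost section ends. */
static void
model_critical_exit(struct model_thread *td)
{
	assert(td->td_critnest > 0);
	if (--td->td_critnest == 0 && (td->td_pflags & TDP_WAKEPROC0) != 0) {
		td->td_pflags &= ~TDP_WAKEPROC0;
		model_wakeup_proc0();
	}
}

/* Making a swapped-out process runnable only records the request. */
static void
model_setrunnable_swapped_out(struct model_thread *td)
{
	td->td_pflags |= TDP_WAKEPROC0;
}

/* Ordinary case: balanced enter/exit, so the wakeup is delivered. */
int
model_balanced_path(void)
{
	struct model_thread td = { 0, 0 };

	model_proc0_wakeups = 0;
	model_critical_enter(&td);
	model_setrunnable_swapped_out(&td);
	model_critical_exit(&td);
	return (model_proc0_wakeups);
}

/*
 * Exit case: the critical_enter() is never unwound to zero before the
 * thread goes away, so the flag is lost unless proc0 is woken directly.
 */
int
model_exit_path(bool with_fix)
{
	struct model_thread td = { 0, 0 };

	model_proc0_wakeups = 0;
	model_critical_enter(&td);		/* no matching exit */
	model_setrunnable_swapped_out(&td);	/* wakeup(p->p_pptr) */
	if (with_fix)
		model_wakeup_proc0();		/* proposed wakeup(&proc0) */
	return (model_proc0_wakeups);		/* thread is discarded here */
}
```

In this model, model_balanced_path() delivers one wakeup,
model_exit_path(false) delivers none, and model_exit_path(true) delivers
one, which matches the behaviour the patch is meant to restore.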