From owner-freebsd-bugs@FreeBSD.ORG Sun Mar 6 01:30:20 2005
Message-Id: <20050306012146.701FB17D8@localhost>
Date: Sun, 6 Mar 2005 12:21:46 +1100 (EST)
From: Sam Lawrance
To: FreeBSD-gnats-submit@FreeBSD.org
X-Send-Pr-Version: 3.113
cc: current@FreeBSD.org
Subject: kern/78474: Swapped out procs not brought in immediately after child
 exits
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Sam Lawrance
List-Id: Bug reports
X-List-Received-Date: Sun, 06 Mar 2005 01:30:20 -0000

>Number:         78474
>Category:       kern
>Synopsis:       Swapped out procs not brought in immediately after child exits
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Mar 06 01:30:19 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Sam Lawrance
>Release:        FreeBSD 5.4-PRERELEASE i386
>Organization:
>Environment:
System: FreeBSD dirk.no.domain 5.4-PRERELEASE FreeBSD 5.4-PRERELEASE #10:
Sun Mar 6 10:45:13 EST 2005
root@dirk.no.domain:/usr/testbuild/src5/sys/i386/compile/GENERIC i386

>Description:
I run -stable on my lonely box, but AFAICS this affects current. This
problem is similar in flavour to one that I reported a while ago, since
fixed.

Here's an example. Below we have a login, shell and su which have
swapped out, and a shell which is active:

root  4291  0.0  0.0  1664     0  v3  IWs  -       0:00.00 login [pam] (login)
sam   4298  0.0  0.0  2260     0  v3  IW   -       0:00.00 -bash (bash)
root  4299  0.0  0.0  1644     0  v3  IW   -       0:00.00 su
root  4300  0.0  0.4  2952  1132  v3  S+   3:23PM  0:00.66 su (bash)

When 4300 exits, it will sit in the zombie state for a long time,
waiting for 4299 to be swapped in. Same for 4299 and 4298.

The kernel call stack for 4300 would be something like

	exit1 kern_exit
	wakeup (parent process as wait channel)
	sleepq_broadcast
	sleepq_resume_thread (on parent process)
	setrunnable

In setrunnable, curthread->td_pflags is flagged with TDP_WAKEPROC0 to
indicate the vm scheduler should be awoken to do its thing. David Xu's
original change was to check for TDP_WAKEPROC0 in critical_exit() and
wakeup(&proc0) from there.
Things were arranged this way in order to prevent an LOR between
sched_lock and sleepqueue locks.

That scheme doesn't take into account that exit1() does a
critical_enter() that has no corresponding critical_exit() in that
thread (because the exiting thread grabs sched_lock which is held
across cpu_throw). So the wakeup is not done, and we just have to wait
for the vm's tsleep on proc0 to time out.

The same thing might occur across other exit points, but I don't know
what they are.

>How-To-Repeat:
Run a shell somewhere (first). Su or run another shell or similar
(second). Wait until the first shell has swapped out (might require
running some other memory hogs). Exit the second shell. Notice that the
second shell takes a long time to exit.

>Fix:
A possible solution might be to wakeup(&proc0) after waking the parent
and before grabbing sched_lock:

Index: kern_exit.c
===================================================================
RCS file: /home/ncvs/FreeBSD/src/sys/kern/kern_exit.c,v
retrieving revision 1.256
diff -u -r1.256 kern_exit.c
--- kern_exit.c	29 Jan 2005 14:03:41 -0000	1.256
+++ kern_exit.c	6 Mar 2005 01:17:35 -0000
@@ -503,6 +503,7 @@
 	mtx_unlock_spin(&sched_lock);
 	wakeup(p->p_pptr);
 	PROC_UNLOCK(p->p_pptr);
+	wakeup(&proc0);
 	mtx_lock_spin(&sched_lock);
 	critical_exit();

>Release-Note:
>Audit-Trail:
>Unformatted:
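
For readers who want to see why the deferred wakeup is lost, here is a rough user-space model of the sequence described under >Description. It is only a sketch, not kernel code. Every name ending in _model is hypothetical. It assumes that critical_exit() delivers the pending wakeup only when the nesting count drops to zero, and that taking sched_lock with mtx_lock_spin() counts as one more nesting level. Neither assumption is stated in the report itself.

```c
/*
 * User-space model of the deferred proc0 wakeup. Not kernel code.
 * All *_model names are hypothetical stand-ins for illustration.
 */
#include <assert.h>
#include <stdbool.h>

struct thread_model {
	int  critnest;   /* stands in for td_critnest */
	bool wakeproc0;  /* stands in for TDP_WAKEPROC0 in td_pflags */
};

static int proc0_wakeups;  /* counts simulated wakeup(&proc0) calls */

static void
critical_enter_model(struct thread_model *td)
{
	td->critnest++;
}

static void
critical_exit_model(struct thread_model *td)
{
	assert(td->critnest > 0);
	td->critnest--;
	/* Assumption: the pending wakeup is delivered only on the outermost exit. */
	if (td->critnest == 0 && td->wakeproc0) {
		td->wakeproc0 = false;
		proc0_wakeups++;
	}
}

/* Models wakeup(p->p_pptr) reaching setrunnable() for a swapped-out parent. */
static void
wakeup_swapped_parent_model(struct thread_model *td)
{
	td->wakeproc0 = true;  /* wakeup of proc0 is deferred, not performed */
}

/* Returns how many proc0 wakeups the exiting thread delivered. */
static int
exit_path_model(bool with_fix)
{
	struct thread_model td = { 0, false };

	proc0_wakeups = 0;
	critical_enter_model(&td);         /* exit1() enters a critical section */
	wakeup_swapped_parent_model(&td);  /* flag set, nothing delivered yet */
	if (with_fix)
		proc0_wakeups++;           /* the proposed explicit wakeup(&proc0) */
	critical_enter_model(&td);         /* sched_lock taken: one more level (assumed) */
	critical_exit_model(&td);          /* count stays above zero, flag not checked */
	/* cpu_throw(): this thread never runs again, so no later critical_exit() */
	return proc0_wakeups;
}
```

In this model exit_path_model(false) delivers no wakeup, which corresponds to waiting for the tsleep on proc0 to time out, while exit_path_model(true) delivers one.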