Date: Mon, 30 Jun 2003 15:00:57 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200306302200.h5UM0vg2025972@apollo.backplane.com>
To: freebsd-stable@freebsd.org
Subject: sysctl_out_proc() race in stable w/ patch

There is a rather serious race between copyout() and process
termination in -stable.  sysctl_kern_proc() loops through the allproc
list, writing the results to user memory.  If it stalls during the
copyout (e.g. the user memory has to take a vm_fault) and the process
is ripped out from under it, it will go looping into never never land.

The solution is simple: PHOLD()/PRELE() the process around the
copyout(), and in exit1() wait for the hold count (p_lock) to go to 0.

The patch below is a hack and cannot be directly applied due to other
changes in my tree, but if someone would like to tackle this for 4.x
I've provided you with a good start.

The problem doesn't appear to be an issue in -current with the PROC
lock held, though I am unsure if lock reentrancy is possible when a
vm_fault is taken.

						-Matt

Index: kern/kern_exit.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.12
diff -u -r1.12 kern_exit.c
--- kern/kern_exit.c	30 Jun 2003 19:50:31 -0000	1.12
+++ kern/kern_exit.c	30 Jun 2003 21:44:32 -0000
@@ -451,7 +451,19 @@
 ...
-	KASSERT(p->p_lock == 0, ("p_lock not 0! %p", p));
+
+	/*
+	 * Other kernel threads may be in the middle of
+	 * accessing the proc.  For example, kern/kern_proc.c
+	 * could be blocked writing proc data to a sysctl.
+	 * At the moment, if this occurs, we are not woken
+	 * up and rely on a one-second retry.
+	 */
+	if (p->p_lock) {
+		printf("Diagnostic: waiting for p_lock\n");
+		while (p->p_lock)
+			tsleep(p, PWAIT, "reap2", hz);
+	}
 
 	/* charge childs scheduling cpu usage to parent */
 	if (curproc->p_pid != 1) {
Index: kern/kern_proc.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_proc.c,v
retrieving revision 1.7
diff -u -r1.7 kern_proc.c
--- kern/kern_proc.c	28 Jun 2003 02:36:43 -0000	1.7
+++ kern/kern_proc.c	30 Jun 2003 21:46:25 -0000
@@ -529,8 +529,9 @@
 			if (!PRISON_CHECK(cr1, p->p_ucred))
 				continue;
-
+			PHOLD(p);
 			error = sysctl_out_proc(p, req, doingzomb);
+			PRELE(p);
 			if (error)
 				return (error);
 		}
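
For readers who want to see the hold-count idea outside the kernel, here
is a rough single-threaded userland model of it.  This is only a sketch
and not kernel code: struct model_proc and the model_* functions are
made-up names, the real PHOLD() may also fault a swapped-out process
back in (omitted here), and the tsleep() retry in exit1() is reduced to
one function call per retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for the kernel's struct proc. */
struct model_proc {
	int	p_lock;		/* hold count, named after p_lock in the patch */
	bool	reaped;		/* set once the exit path has torn the proc down */
};

/* Model of PHOLD(): pin the process before an operation that may block. */
static void
model_phold(struct model_proc *p)
{
	p->p_lock++;
}

/* Model of PRELE(): drop the pin once the blocking operation is done. */
static void
model_prele(struct model_proc *p)
{
	assert(p->p_lock > 0);
	p->p_lock--;
}

/*
 * One pass of the exit-side wait loop.  The proc is reaped only when
 * nobody holds it; where this returns false the patch would tsleep()
 * for up to a second and then check p_lock again.
 */
static bool
model_try_reap(struct model_proc *p)
{
	if (p->p_lock != 0)
		return (false);
	p->reaped = true;
	return (true);
}

/*
 * Scenario: a reader holds the proc and stalls for stall_ticks "ticks"
 * (standing in for a copyout() that faults) while the exit path retries
 * once per tick.  Returns how many retries failed before the reap.
 */
static int
model_scenario(int stall_ticks)
{
	struct model_proc p = { 0, false };
	int failed = 0;
	int t;

	model_phold(&p);
	for (t = 0; t < stall_ticks; t++) {
		if (!model_try_reap(&p))
			failed++;
	}
	model_prele(&p);
	if (!model_try_reap(&p))
		failed = -1;	/* not expected: nobody holds the proc now */
	return (failed);
}
```

Calling model_scenario(n) runs one stalled reader against n exit-side
retries.  The point of the model is that the reap cannot succeed while
p_lock is nonzero, which is the property the KASSERT in the old code
assumed and the wait loop in the patch enforces.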