From owner-freebsd-current@FreeBSD.ORG Thu Mar 31 18:37:22 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B42F1065675; Thu, 31 Mar 2011 18:37:22 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D96138FC17; Thu, 31 Mar 2011 18:37:20 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 8C89946B03; Thu, 31 Mar 2011 14:37:20 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 276B58A01B; Thu, 31 Mar 2011 14:37:20 -0400 (EDT) From: John Baldwin To: Attilio Rao Date: Thu, 31 Mar 2011 14:37:19 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) References: <201103311418.31658.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201103311437.19682.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Thu, 31 Mar 2011 14:37:20 -0400 (EDT) Cc: freebsd-current@freebsd.org, Svatopluk Kraus Subject: Re: schedcpu() in /sys/kern/sched_4bsd.c calls thread_lock() on thread with un-initialized td_lock X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 31 Mar 2011 18:37:22 -0000 On Thursday, March 31, 2011 2:20:11 pm Attilio Rao wrote: > 2011/3/31 John Baldwin : > > On Thursday, March 31, 2011 12:34:31 pm Attilio Rao wrote: > >> 2011/3/31 John Baldwin : > >> > On Thursday, March 31, 2011 7:32:26 am Svatopluk Kraus wrote: > >> >> Hi, > >> >> > >> >> I've got a page fault (because of NULL td_lock) in > >> >> thread_lock_flags() called from schedcpu() in /sys/kern/sched_4bsd.c > >> >> file. During process fork, new thread is linked to new process which > >> >> is linked to allproc list and both allproc_lock and new process lock > >> >> are unlocked before sched_fork() is called, where new thread td_lock > >> >> is initialized. Only PRS_NEW process status is on sentry but not > >> >> checked in schedcpu(). > >> > > >> > I think this should fix it: > >> > > >> > Index: sched_4bsd.c > >> > =================================================================== > >> > --- sched_4bsd.c (revision 220190) > >> > +++ sched_4bsd.c (working copy) > >> > @@ -463,6 +463,10 @@ schedcpu(void) > >> > sx_slock(&allproc_lock); > >> > FOREACH_PROC_IN_SYSTEM(p) { > >> > PROC_LOCK(p); > >> > + if (p->p_state == PRS_NEW) { > >> > + PROC_UNLOCK(p); > >> > + continue; > >> > + } > >> > FOREACH_THREAD_IN_PROC(p, td) { > >> > awake = 0; > >> > thread_lock(td); > >> > > >> > >> I don't really think this fix is right because otherwise, when using > >> sched_4bsd anytime we are going to scan the thread list within a proc > >> we need to check for PRS_NEW. > >> > >> We likely need to change the init scheme for the td_lock by having a > >> scheduler primitive setting it and doing that on thread_init() UMA > >> constructor, or similar approach. > > > > But the thread state isn't valid anyway. 4BSD shouldn't be touching the > > thread since it is in an incomplete / undefined state. > > Yep, in this case I'd then want to just add the threads to proc once > they are fully initialized. > > It is pointless (and dangerous) to replicate this check all over, > besides we want scheduler agnostic code, which means every iterations > of p_threads will need to check for a valid state of threads. Yes, we do have to check for PRS_NEW in many places with the current approach, but we need some way to reserve the PID to avoid duplicates and unless we expand the scope of allproc in fork by a whole lot or stop using the allproc list to track "pids in use", we will be stuck with some sort of "process is still being built" sentry. -- John Baldwin