From: John Baldwin <jhb@freebsd.org>
To: LI Xin
Cc: MingyanGuo, freebsd-arch@freebsd.org
Date: Mon, 12 Mar 2007 10:17:25 -0400
Subject: Re: locking reasoning within fork1()

On Saturday 10 March 2007 09:38,
LI Xin wrote:
> Hi,
>
> During the AsiaBSDCon DevSummit we went through the current KSE and
> some userland threading code, and I think that brings me back to the
> fork1() vs. others races.
>
> The current logic, especially the locking order found in fork1(), looks
> less than ideal from my reading. I have looked through some code from
> other BSDs, and I think we might want to address the following problems:
>
> - At which point should we consider that a process really exists? At
> present there is no clear point at which we can call a process
> "really born". It looks to me that PRS_NEW just indicates that a process
> is not "fully initialized", but it says nothing about how much
> initialization has been done. This makes several operations
> questionable and more error-prone. As Guo (cc'ed) pointed out, there
> are chances that kill(0, ..) and kill(-1, ..) would not cover PRS_NEW
> processes, and there might be other places that need to take care of
> this as well.

This is why I had advocated using a sleep so that consumers either ignore
PRS_NEW processes or wait until they are completely initialized and in
PRS_NORMAL.

> - The locking scheme does not look pretty. We grab and release locks
> again and again; it might be better to collapse some of the work
> together and reconsider synchronization with other parts of the kernel.

To a large extent this reorganization has already been done where possible.

> - Certain parts of struct proc are not accessed frequently. To make
> better use of the cache, we may want to consider moving those parts
> out of the struct.

You mean to the bottom of the struct, maybe? I'm not sure the overhead of
having separately allocated structures and extra pointer indirections will
do anything but hurt.

> - PID allocation is somewhat expensive when there are a lot of
> processes.
> This might not be a very big deal, but given that it requires
> holding the sx lock exclusively (sx_xlock), our scalability could be
> limited by this. tjr@ has a proposed hash-based PID allocation patch in
> his p4 branch, and NetBSD has an O(1) algorithm that may be worth a look.

This has been brought up before, and when tjr's stuff was tested it didn't
help, IIRC. Part of the issue here is that checking whether a pid is free is
not just a simple walk of processes, but also of process groups and
sessions. If you didn't want to walk all of those data structures, you'd
have to have some sort of PID reference counting for the three possible
references on a pid: process, pgrp, and session.

-- 
John Baldwin
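Baldwin's suggestion that consumers either ignore PRS_NEW processes or sleep until they reach PRS_NORMAL can be illustrated with a userland sketch. Everything in it is an assumption for illustration, not kernel code: `struct fake_proc`, `proctree_mtx`, `proc_born_cv` and the helper functions are hypothetical names, a pthread mutex and condition variable stand in for the kernel's process-tree lock and sleep/wakeup, and a kill(-1, ..) style scan is reduced to counting eligible slots. It shows how the "ignore" policy silently misses a half-built child while the "wait" policy picks it up once initialization completes.

```c
/*
 * Hypothetical userland analogue of the PRS_NEW -> PRS_NORMAL transition.
 * Not FreeBSD code; names below are invented for illustration.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum pstate { PRS_NEW, PRS_NORMAL, PRS_ZOMBIE };

struct fake_proc {
    int         pid;
    enum pstate state;
};

/* Stand-ins for the process-tree lock and a wakeup channel. */
static pthread_mutex_t proctree_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  proc_born_cv = PTHREAD_COND_INITIALIZER;

/* Creator side: publish the slot as fully initialized and wake sleepers. */
void proc_mark_born(struct fake_proc *p)
{
    pthread_mutex_lock(&proctree_mtx);
    p->state = PRS_NORMAL;
    pthread_cond_broadcast(&proc_born_cv);
    pthread_mutex_unlock(&proctree_mtx);
}

/* Policy 1: ignore PRS_NEW slots, so a half-built child is silently missed. */
int count_targets_skip(const struct fake_proc *tab, size_t n)
{
    int count = 0;

    pthread_mutex_lock(&proctree_mtx);
    for (size_t i = 0; i < n; i++)
        if (tab[i].state == PRS_NORMAL)
            count++;
    pthread_mutex_unlock(&proctree_mtx);
    return count;
}

/* Policy 2: sleep until each PRS_NEW slot is published, then count it. */
int count_targets_wait(const struct fake_proc *tab, size_t n)
{
    int count = 0;

    pthread_mutex_lock(&proctree_mtx);
    for (size_t i = 0; i < n; i++) {
        /* pthread_cond_wait drops the lock while asleep so the creator can finish. */
        while (tab[i].state == PRS_NEW)
            pthread_cond_wait(&proc_born_cv, &proctree_mtx);
        if (tab[i].state == PRS_NORMAL)
            count++;
    }
    pthread_mutex_unlock(&proctree_mtx);
    return count;
}

/* Thread body simulating the tail end of a fork1()-like routine. */
void *finish_fork(void *arg)
{
    proc_mark_born(arg);
    return NULL;
}
```

Built with `cc -pthread`, a table holding one PRS_NORMAL, one PRS_NEW and one PRS_ZOMBIE slot yields 1 from `count_targets_skip()`; if another thread runs `finish_fork()` on the PRS_NEW slot, `count_targets_wait()` returns 2. The trade-off is that the waiting scan can block behind a slow fork, which the skipping scan never does.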
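The per-pid reference counting Baldwin describes can be sketched as a small table indexed by pid, with one counter for each kind of reference (process, process group, session). This is a hypothetical illustration, not tjr@'s patch or NetBSD's allocator: the tiny `PID_MAX_SKETCH` pid space, `pid_refs`, and the `pid_alloc()`/`pid_hold()`/`pid_drop()` helpers are invented names, and locking is omitted entirely. What it shows is that deciding whether a candidate pid is free becomes a table lookup instead of a walk over processes, process groups and sessions, and that a pid stays reserved while a process group or session still uses it as its ID, even after the process that originally owned it has exited.

```c
/* Hypothetical sketch of reference-counted pid allocation (no locking). */
#include <assert.h>
#include <stdbool.h>

/* Tiny pid space for illustration: pids 1..PID_MAX_SKETCH-1, pid 0 reserved. */
#define PID_MAX_SKETCH 16

/* The three kinds of reference that keep a pid from being reused. */
enum pid_ref { PIDREF_PROC, PIDREF_PGRP, PIDREF_SESSION, PIDREF_NTYPES };

static unsigned int pid_refs[PID_MAX_SKETCH][PIDREF_NTYPES];
static int next_pid = 1;   /* next candidate to try */

/* A pid is in use while any process, pgrp, or session still references it. */
static bool pid_in_use(int pid)
{
    for (int t = 0; t < PIDREF_NTYPES; t++)
        if (pid_refs[pid][t] != 0)
            return true;
    return false;
}

/* Allocate a pid for a new process; returns -1 if every pid is referenced. */
int pid_alloc(void)
{
    for (int tries = 0; tries < PID_MAX_SKETCH - 1; tries++) {
        int pid = next_pid;

        /* Advance cyclically over 1..PID_MAX_SKETCH-1. */
        next_pid = next_pid % (PID_MAX_SKETCH - 1) + 1;
        if (!pid_in_use(pid)) {          /* constant-time check, no list walk */
            pid_refs[pid][PIDREF_PROC] = 1;
            return pid;
        }
    }
    return -1;
}

/* Take one reference of the given kind (e.g. a new pgrp led by this pid). */
void pid_hold(int pid, enum pid_ref kind)
{
    pid_refs[pid][kind]++;
}

/* Release one reference of the given kind. */
void pid_drop(int pid, enum pid_ref kind)
{
    assert(pid_refs[pid][kind] > 0);
    pid_refs[pid][kind]--;
}
```

For example, if pid 1 becomes a process group and session leader and then exits, `pid_alloc()` will not hand out 1 again until both the pgrp and session references are dropped. The sketch still probes candidates linearly when the pid space is crowded, so it addresses the multi-structure walk, not the cost of finding a free slot.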