From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 12 16:16:24 2007
From: John Baldwin <jhb@freebsd.org>
To: LI Xin <delphij@delphij.net>
Date: Mon, 12 Mar 2007 10:17:25 -0400
User-Agent: KMail/1.9.1
References: <45F2C2CB.5000204@delphij.net>
In-Reply-To: <45F2C2CB.5000204@delphij.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200703121017.25782.jhb@freebsd.org>
Cc: MingyanGuo <guomingyan@gmail.com>, freebsd-arch@freebsd.org
Subject: Re: locking reasoning within fork1()
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>

On Saturday 10 March 2007 09:38, LI Xin wrote:
> Hi,
> 
> During the AsiaBSDCon DevSummit we went through the current KSE and
> some userland threading code, and I think that brings me back to the
> races between fork1() and other code.
> 
> The current logic, especially the locking order found in fork1(), does
> not look ideal from my reading.  I have perused some code from
> other BSDs, and I think we might want to address the following problems:
> 
>  - At which point should we consider that a process really exists?  At
> present, there is no clear point at which we can call a process
> "really born".  It looks to me that PRS_NEW just indicates that a process
> is not "fully initialized", but it does not say how much initialization
> has been done.  This makes several operations very questionable and
> more error-prone.  As Guo (cc'ed) pointed out, there is a chance that
> kill(0, ..) and kill(-1, ..) would not cover PRS_NEW processes, and there
> might be other places that need the same care.

This is why I had advocated using a sleep so that consumers either ignore 
PRS_NEW processes or wait until they are completely initialized and in the 
PRS_NORMAL state.
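
As a rough userland illustration of those two consumer policies (a sketch
only, not the kernel code): the toy_proc structure and the toy_* helpers
below are hypothetical names, and a pthread mutex and condition variable
stand in for whatever sleep/wakeup primitive the kernel would actually use.
Only PRS_NEW and PRS_NORMAL come from the discussion.

```c
#include <pthread.h>
#include <stdbool.h>

/* State names taken from the thread; everything else is hypothetical. */
enum proc_state { PRS_NEW, PRS_NORMAL };

struct toy_proc {
	pthread_mutex_t	lock;	/* protects state */
	pthread_cond_t	ready;	/* signalled when state becomes PRS_NORMAL */
	enum proc_state	state;
};

/* A freshly created process starts out as PRS_NEW. */
void
toy_proc_init(struct toy_proc *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->ready, NULL);
	p->state = PRS_NEW;
}

/* Called by the creator once initialization is complete. */
void
toy_proc_mark_normal(struct toy_proc *p)
{
	pthread_mutex_lock(&p->lock);
	p->state = PRS_NORMAL;
	pthread_cond_broadcast(&p->ready);	/* wake every waiting consumer */
	pthread_mutex_unlock(&p->lock);
}

/* Policy 1: a consumer that simply ignores PRS_NEW processes. */
bool
toy_proc_visible(struct toy_proc *p)
{
	bool visible;

	pthread_mutex_lock(&p->lock);
	visible = (p->state == PRS_NORMAL);
	pthread_mutex_unlock(&p->lock);
	return (visible);
}

/* Policy 2: a consumer that sleeps until the process is PRS_NORMAL. */
void
toy_proc_wait_normal(struct toy_proc *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->state != PRS_NORMAL)
		pthread_cond_wait(&p->ready, &p->lock);
	pthread_mutex_unlock(&p->lock);
}
```

Either way a consumer never acts on a half-initialized process: it skips
the process entirely, or it blocks until the creator has finished.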

>  - The locking scheme does not look pretty.  We grab and release locks
> again and again, and it might be better to collapse some work
> together and reconsider synchronization with other parts of the kernel.

To a large extent this reorganization has already been done where possible.

>  - Certain parts of struct proc are not accessed frequently.  For
> the sake of better cache utilization, we may want to consider moving
> certain parts out of the struct.

You mean to the bottom of the struct maybe?  I'm not sure the overhead of 
having separately allocated structures and extra pointer indirections will do 
anything but hurt.
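
To make the two layouts being compared concrete, here is a small userland
sketch.  All of the type, field, and function names are made up for
illustration and do not reflect the real struct proc; the point is only
that the split layout costs an extra allocation and a pointer dereference
on every access to the cold data.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rarely used fields. */
struct toy_cold {
	char	rarely_used[256];
};

/* Layout A: cold fields kept at the bottom of the same structure. */
struct toy_proc_inline {
	int		hot_state;	/* frequently accessed */
	int		hot_pid;	/* frequently accessed */
	struct toy_cold	cold;		/* same allocation, no indirection */
};

/* Layout B: cold fields moved into a separately allocated structure. */
struct toy_proc_split {
	int		hot_state;
	int		hot_pid;
	struct toy_cold	*cold;		/* extra allocation and indirection */
};

/* Layout B needs two allocations and must handle either one failing. */
struct toy_proc_split *
toy_proc_split_new(void)
{
	struct toy_proc_split *p;

	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->cold = calloc(1, sizeof(*p->cold));
	if (p->cold == NULL) {
		free(p);
		return (NULL);
	}
	return (p);
}

void
toy_proc_split_free(struct toy_proc_split *p)
{
	if (p == NULL)
		return;
	free(p->cold);
	free(p);
}
```

In layout A the hot fields still share the leading cache lines with nothing
but each other, which is the effect moving cold fields "to the bottom"
would aim for without the second allocation.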

>  - PID allocation is somewhat expensive when there are a lot of
> processes.  This might not be a very big deal, but given that it
> requires holding an sx_xlock, our scalability could be limited by
> it.  tjr@ has a proposed hash-based PID allocation patch in his p4
> branch, and NetBSD has an O(1) algorithm that may be worth a look.

This has been brought up before, and when tjr's stuff was tested it didn't 
help IIRC.  Part of the issue here is that pid space is not just a simple 
walk of processes, but also of process groups and sessions.  If you didn't 
want to walk all the data structures you'd have to have some sort of PID 
reference counting for the 3 possible references on a pid: process, pgrp, and 
session.
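
A minimal sketch of what such per-PID reference counting could look like,
assuming a tiny fixed-size PID space and no locking.  The table and the
toy_pid_* functions are hypothetical and are not taken from tjr's patch or
from NetBSD; the allocator here still scans linearly, so it only shows how
the count replaces walking processes, process groups, and sessions.

```c
#include <stdbool.h>

#define	TOY_PID_MAX	16	/* hypothetical, deliberately small */

/*
 * One counter per PID.  A PID can be referenced by up to three things:
 * a process, a process group using it as its ID, and a session using it
 * as its ID, so each counter ranges from 0 to 3.
 */
static unsigned char toy_pid_refs[TOY_PID_MAX];

/* Take one reference (process, pgrp, or session) on a PID. */
void
toy_pid_hold(int pid)
{
	toy_pid_refs[pid]++;
}

/* Release one reference on a PID. */
void
toy_pid_drop(int pid)
{
	toy_pid_refs[pid]--;
}

/* A PID may be reused only when nothing refers to it any more. */
bool
toy_pid_is_free(int pid)
{
	return (toy_pid_refs[pid] == 0);
}

/*
 * Allocate the lowest free PID for a new process and take the process
 * reference on it.  Returns -1 if the toy PID space is exhausted.
 */
int
toy_pid_alloc(void)
{
	int pid;

	for (pid = 1; pid < TOY_PID_MAX; pid++) {
		if (toy_pid_is_free(pid)) {
			toy_pid_hold(pid);
			return (pid);
		}
	}
	return (-1);
}
```

For example, if a process allocates PID 1, becomes a process group leader
(a second hold), and then exits (one drop), PID 1 stays unavailable until
the process group also drops its reference.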

-- 
John Baldwin