From owner-cvs-src@FreeBSD.ORG Wed Oct 24 19:14:47 2007
Message-ID: <471F99B3.6060802@elischer.org>
Date: Wed, 24 Oct 2007 12:14:59 -0700
From: Julian Elischer
To: Dag-Erling Smørgrav
References: <200710231754.l9NHsGLH090312@repoman.freebsd.org> <86y7dsuby9.fsf@ds4.des.no>
In-Reply-To: <86y7dsuby9.fsf@ds4.des.no>
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Julian Elischer, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/kern kern_fork.c
List-Id: CVS commit messages for the src tree

Dag-Erling Smørgrav wrote:
> Julian Elischer writes:
>> This removes a reproducible lockup in NFS.
>
> Could you elaborate on that?
>
> DES

facts:

There is an error in the single-threading mode selected in fork() (some "optimization" code that was added at some time, maybe by me) that suspends threads that are already sleeping with PCATCH by simply setting the suspended bit. Turns out this is a bad idea.

NFS sometimes sleeps with a vnode lock held, with PCATCH set (and is thus a candidate for the above).

now, the mechanism:

thread A does an NFS operation, locks an NFS vnode, and sleeps with PCATCH waiting for some reply from the server.
thread B enters NFS but hits the locked vnode and waits (NO PCATCH).
thread C does fork().
thread A is suspended and can not proceed (bug, but let's get past that). It is counted as quiesced for the thread_single().
thread B can not proceed, and so can not be suspended and counted as quiesced (also a bug, I think).
so thread C never reaches the 'single threading' state (B is not yet quiesced) and can not proceed,
so thread A can not be reawakened.
etc.

There are so many bugs here that one loses count. However, it turns out that the whole idea of single-threading in fork() is unneeded, due to all the locking introduced for all the components altered in fork().

so, to fix the problem: use another mode of thread_single() that counts quiesced threads differently and doesn't do the suspend stupidity. But having fixed that, the whole thing can be removed anyhow.

(analysis by davidxu, alc, me, alfred in concert)
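
For anyone who wants to see the cycle spelled out, here is a toy sketch of the scenario above as a wait-for graph. It is not kernel code: the thread names A, B, C come from the description, while the dictionary representation and the find_cycle() helper are made up purely for illustration.

```python
# Toy model only. The threads and who-waits-for-whom come from the
# scenario in this mail; the wait-for-graph representation and the
# find_cycle() helper are illustrative assumptions, not kernel code.

def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    seen = {}  # node -> index in path where it was first visited
    node = start
    while node is not None:
        if node in seen:
            # We came back to a node already on the path: that is the cycle.
            return path[seen[node]:] + [node]
        seen[node] = len(path)
        path.append(node)
        node = waits_for.get(node)  # None when the thread waits on nobody
    return None

# Each thread waits on exactly one other thread in the buggy scenario.
buggy = {
    "A": "C",  # A is suspended; only C leaving single-threading resumes it
    "B": "A",  # B sleeps (no PCATCH) on the vnode lock that A holds
    "C": "B",  # C's thread_single() waits for B to be counted as quiesced
}

# With single-threading removed from fork(), C waits on nobody and A is
# never suspended; A only waits on the NFS server, which is not modelled.
fixed = {
    "B": "A",
}

print(find_cycle(buggy, "C"))
print(find_cycle(fixed, "B"))
```

Under these assumptions the first call reports the loop C, B, A, C, and the second finds no cycle, which matches the argument that dropping the single-threading from fork() makes the lockup go away.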