From owner-freebsd-smp Thu Jan 2 14:36:17 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D251F37B401; Thu, 2 Jan 2003 14:36:16 -0800 (PST) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 322AA43EA9; Thu, 2 Jan 2003 14:36:16 -0800 (PST) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.6/8.12.6) with ESMTP id h02MaFro029301 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jan 2003 17:36:15 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.6/8.9.1) id h02MaAh48780; Thu, 2 Jan 2003 17:36:10 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15892.48858.553808.84435@grasshopper.cs.duke.edu> Date: Thu, 2 Jan 2003 17:36:10 -0500 (EST) To: freebsd-smp@freebsd.org Cc: freebsd-net@freebsd.org Subject: SMP status of networking subsystem? X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Just a quick question.. Where do we stand on bringing the networking subsystem out from under Giant? The mbuf system is soon to be safe, thanks to Alan Cox, so this allows INTR_MPSAFE drivers. However, swi:net is still under Giant, as are many of the important socket functions (sendto(), recvfrom(), etc). Thanks, Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jan 2 16:11:58 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 66A7937B401; Thu, 2 Jan 2003 16:11:57 -0800 (PST) Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 06DCD43E4A; Thu, 2 Jan 2003 16:11:57 -0800 (PST) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by rwcrmhc53.attbi.com (rwcrmhc53) with ESMTP id <2003010300115605300k6r7ne>; Fri, 3 Jan 2003 00:11:56 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id QAA68846; Thu, 2 Jan 2003 16:11:55 -0800 (PST) Date: Thu, 2 Jan 2003 16:11:54 -0800 (PST) From: Julian Elischer To: Andrew Gallatin Cc: freebsd-smp@freebsd.org, freebsd-net@freebsd.org Subject: Re: SMP status of networking subsystem? In-Reply-To: <15892.48858.553808.84435@grasshopper.cs.duke.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Thu, 2 Jan 2003, Andrew Gallatin wrote: > > Just a quick question.. Where do we stand on bringing the networking > subsystem out from under Giant? > > The mbuf system is soon to be safe, thanks to Alan Cox, so this allows > INTR_MPSAFE drivers. However, swi:net is still under Giant, as are > many of the important socket functions (sendto(), recvfrom(), etc). netgraph has been partly done for a while but I need to do some more to complete it.. I need to add an 'MP-safe' timer/callback fascility (that can be used instead of nodes rolling hteir own and I need a fascility to assit hardware-triggered events queue data to the netgraph part of themselves, (i.e. move under the netgraph umbrella). > > Thanks, > > Drew > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-net" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Thu Jan 2 19: 8:11 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A85AE37B401; Thu, 2 Jan 2003 19:08:06 -0800 (PST) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id A9BDD43E4A; Thu, 2 Jan 2003 19:08:05 -0800 (PST) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 7F74BAE162; Thu, 2 Jan 2003 19:08:05 -0800 (PST) Date: Thu, 2 Jan 2003 19:08:05 -0800 From: Alfred Perlstein To: arch@freebsd.org Cc: smp@freebsd.org, jhb@freebsd.org, dillon@freebsd.org, jake@freebsd.org, tanimura@freebsd.org, alc@freebsd.org Subject: Need help fixing lock ordering with filedesc, proc and pipe. Message-ID: <20030103030805.GS26140@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org [please keep me on cc list, I'm not subscribed to most lists] So far I have several fixes, for the lock order reversal with pipes and sigio. I'm not happy with most of them and the one I am happy with I came up with while writing this email, so please pick on it the most. :) Anyhow, I'd really appreciate feedback from those that understand lock order and smp issues. :) I'm going to outline and explain them here and hopefully someone will come up with an alternate or convince me that one of them is right. First a refresher/explanation. Each proc has a filedesc that holds the process's file descriptors. This structure needs locking because other threads within the process or other processes (via rfork()) may want to manipulate the array of file pointers and needs exclusive access to do so safely. Each proc has a lock, this lock covers many, many, many fields of the proc struct including the signal state. Each pipe (or any other object behind a 'struct file') has a lock, this lock is user to maintain consitancy over the object itself, examples: bytes to be transfered, how many readers/writers, etc. From now on I will refer to pipe as 'object'. The problem is that we have a lock order reversal because of these three transitions. Thanks to John Baldwin and his Witness subsystem it was initially made apparent and finally tracked down. In a sysctl we have: allproc -> proc -> filedesc This access is done in a sideways manner, meaning that another proc is examining the filedesc of a running process. The proc lock is grabbed because there is a mistaken impression that this is the safe way to grab a filedesc in a 'sideways' manner. (actually if you look at the code you'll realize that this is a mistaken assumption and is unsafe as fdfree() does not actually lock the proc lock when releasing the filedesc structure. (oops!)) In select we have: filedesc -> object This access is done as an efficiency mechanism, if we do not hold the filedesc locked while calling the fo_poll() routine on each object we must perform an order of magnitude more mutex operations per file being poll()'d/select()'d on. We would have to allocate a temporary array to hold all the file pointers, atomically bump and release thier reference counts each time we looped through the file scan. In each object's write/read routine we have: object -> proc This is because traditionally wakeup()/psignal() didn't block the process, so we had "mutual exclusion" or "Giant" theoretically protecting all of our data. The lock ordering here is done whenever someone requests SIGIO when data arrives on an object. Data arrives which locks the object and it then calls into pgsigio to post SIGIO which grabs the proc lock. All that said, we wind up with an ordering that's broken: proc -> filedesc -> object -> proc Here's my solutions: (implicit "fix the problem" with each 'pro') .) add another mutex to struct proc, this will protect the filedesc reference. pros: simple fix works nicely. cons: bloating proc with yet another mutex, yet another lock aquisition in the exit() path. .) don't hold the filedesc lock over the fo_poll routines in select()/poll() pros: fo_poll routines can now block, our routine is somewhat closer to linux's implementation so we have a reference. cons: fo_poll routines should not block, expensive mutex operations at least 4 lock/unlock and 2 atomic ops per file, extra malloc in the select/poll codepath, extra complexity, our routine is somewhat closer to linux's implementation. :) .) drop the object lock when performing pgsigio pros: none. cons: possible race conditions added to code paths, it's hard to mpsafe something that used to run under Giant if you must make due with a lock you can't hold onto all the time, this sort of issue will happen with sockets, vnodes and kqueues as well. .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead split it into a routine that close()'s all the files in the first pass and return whether or not we were the last accessor to this particular filedesc. Later in kern_exit() right before moving it to the zombie list, while we have the allproc lock, we NULL out the filedesc pointer and free it if we were the last accessor. We also need to handle the case of exec() unsharing the filedescriptor table, I think we can do this the same way at the cost of a possible allproc lock piggybacked or placed in the uncommon case of exec with shared descriptor tables. pros: should work, minimal change, we pigggyback the locking operations on existing ones (allproc). cons: fdfree now needs to lock the filedesc while closing each file, not such a big deal. allproc needed whenever sideways access to a single proc's filedesc is required, allproc must be held whenever performing any sideways filedesc operation I really like the last solution (piggyback on allproc). But since this _will_ dictate how we continue to do locking in the future I'd like to open it for discussion. Of course alternate solutions would be nice to hear. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jan 3 1:21:22 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EEADD37B401 for ; Fri, 3 Jan 2003 01:21:21 -0800 (PST) Received: from mx0.estimese.net (finom.estimese.net [195.168.3.50]) by mx1.FreeBSD.org (Postfix) with SMTP id 0FE4C43EB2 for ; Fri, 3 Jan 2003 01:21:20 -0800 (PST) (envelope-from zero@estimese.net) Received: (qmail 3286 invoked by uid 69); 3 Jan 2003 09:20:09 -0000 Date: Fri, 3 Jan 2003 10:20:09 +0100 From: Robert Bopko To: Peter Wemm Cc: smp@FreeBSD.org Subject: Re: Update on SE7500 P4 SMP.. Message-ID: <20030103092009.GA3229@finom.estimese.net> References: <20021230203509.DE1922A88D@canning.wemm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20021230203509.DE1922A88D@canning.wemm.org> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org hi peter, i tried both values, but same panic message appears again. On Mon, Dec 30, 2002 at 12:35:09PM -0800, Peter Wemm wrote: > This is a long shot, but has anybody tried correcting this value on their > machine: sys/i386/i386/mpapic.c: > #define bus_clock() 66000000 > > Try changing it to 400000000 or 533000000 and see if that helps. I dont > know what the local apic timer clock is based on.. it might be the > quadrupled clock, the native FSB clock (100000000 or 133000000) or > something else. But since we have IPI delivery problems and the IPI's do > work on NetBSD (which calibrates this timer), this is a logical place to > tinker with. > > Cheers, > -Peter > -- > Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com > "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jan 3 10: 5:19 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A5F9637B401; Fri, 3 Jan 2003 10:05:17 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 399E443EC5; Fri, 3 Jan 2003 10:05:17 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h03I5GYM064164; Fri, 3 Jan 2003 10:05:17 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h03I5Gt8064163; Fri, 3 Jan 2003 10:05:16 -0800 (PST) Date: Fri, 3 Jan 2003 10:05:16 -0800 (PST) From: Matthew Dillon Message-Id: <200301031805.h03I5Gt8064163@apollo.backplane.com> To: Alfred Perlstein Cc: arch@FreeBSD.ORG, smp@FreeBSD.ORG, jhb@FreeBSD.ORG, jake@FreeBSD.ORG, tanimura@FreeBSD.ORG, alc@FreeBSD.ORG Subject: Re: Need help fixing lock ordering with filedesc, proc and pipe. References: <20030103030805.GS26140@elvis.mu.org> Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Hmm. There seem to be multiple issues here but would defering the psigio also be a workable solution? That is, simply add a flag to the filedesc and return and the caller (or his caller) is responsible for checking the flag and calling psigio in a more controlled situation. This way sigio can be initiated deep in the code without any worries. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jan 3 10:20:17 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4D96937B401; Fri, 3 Jan 2003 10:20:16 -0800 (PST) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 00D6943E4A; Fri, 3 Jan 2003 10:20:16 -0800 (PST) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id C3A98AE162; Fri, 3 Jan 2003 10:20:15 -0800 (PST) Date: Fri, 3 Jan 2003 10:20:15 -0800 From: Alfred Perlstein To: Matthew Dillon Cc: arch@FreeBSD.ORG, smp@FreeBSD.ORG, jhb@FreeBSD.ORG, jake@FreeBSD.ORG, tanimura@FreeBSD.ORG, alc@FreeBSD.ORG Subject: Re: Need help fixing lock ordering with filedesc, proc and pipe. Message-ID: <20030103182015.GB33821@elvis.mu.org> References: <20030103030805.GS26140@elvis.mu.org> <200301031805.h03I5Gt8064163@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200301031805.h03I5Gt8064163@apollo.backplane.com> User-Agent: Mutt/1.4i Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org * Matthew Dillon [030103 10:05] wrote: > Hmm. There seem to be multiple issues here but would defering the > psigio also be a workable solution? That is, simply add a flag to > the filedesc and return and the caller (or his caller) is responsible > for checking the flag and calling psigio in a more controlled situation. > This way sigio can be initiated deep in the code without any worries. No. Not every object has a filedesc in front of it. Think of a kernel subsystem that allocates a socket directly via socreate. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jan 3 10:22:38 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 12E9837B401 for ; Fri, 3 Jan 2003 10:22:37 -0800 (PST) Received: from mail.speakeasy.net (mail15.speakeasy.net [216.254.0.215]) by mx1.FreeBSD.org (Postfix) with ESMTP id BAE1D43ED1 for ; Fri, 3 Jan 2003 10:22:34 -0800 (PST) (envelope-from jhb@FreeBSD.org) Received: (qmail 31668 invoked from network); 3 Jan 2003 18:22:38 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail15.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 3 Jan 2003 18:22:38 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id h03IMWUT064213; Fri, 3 Jan 2003 13:22:32 -0500 (EST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <200301031805.h03I5Gt8064163@apollo.backplane.com> Date: Fri, 03 Jan 2003 13:22:39 -0500 (EST) From: John Baldwin To: Matthew Dillon Subject: Re: Need help fixing lock ordering with filedesc, proc and pipe. Cc: alc@FreeBSD.ORG, tanimura@FreeBSD.ORG, jake@FreeBSD.ORG, smp@FreeBSD.ORG, arch@FreeBSD.ORG, Alfred Perlstein Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On 03-Jan-2003 Matthew Dillon wrote: > Hmm. There seem to be multiple issues here but would defering the > psigio also be a workable solution? That is, simply add a flag to > the filedesc and return and the caller (or his caller) is responsible > for checking the flag and calling psigio in a more controlled situation. > This way sigio can be initiated deep in the code without any worries. I would like that as well if it can be cleanly done that way. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jan 3 10:22:44 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5C58037B405 for ; Fri, 3 Jan 2003 10:22:41 -0800 (PST) Received: from mail.speakeasy.net (mail12.speakeasy.net [216.254.0.212]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0165443ED8 for ; Fri, 3 Jan 2003 10:22:38 -0800 (PST) (envelope-from jhb@FreeBSD.org) Received: (qmail 8382 invoked from network); 3 Jan 2003 18:22:42 -0000 Received: from unknown (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) by mail12.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 3 Jan 2003 18:22:42 -0000 Received: from laptop.baldwin.cx (gw1.twc.weather.com [216.133.140.1]) by server.baldwin.cx (8.12.6/8.12.6) with ESMTP id h03IMZUT064216; Fri, 3 Jan 2003 13:22:35 -0500 (EST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.5.2 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <20030103030805.GS26140@elvis.mu.org> Date: Fri, 03 Jan 2003 13:22:42 -0500 (EST) From: John Baldwin To: Alfred Perlstein Subject: RE: Need help fixing lock ordering with filedesc, proc and pipe. Cc: alc@freebsd.org, tanimura@freebsd.org, jake@freebsd.org, dillon@freebsd.org, smp@freebsd.org, arch@freebsd.org Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On 03-Jan-2003 Alfred Perlstein wrote: > The problem is that we have a lock order reversal because of these three > transitions. Thanks to John Baldwin and his Witness subsystem it was > initially made apparent and finally tracked down. > > In a sysctl we have: allproc -> proc -> filedesc > This access is done in a sideways manner, meaning that another proc is > examining the filedesc of a running process. The proc lock is grabbed > because there is a mistaken impression that this is the safe way to > grab a filedesc in a 'sideways' manner. (actually if you look at the > code you'll realize that this is a mistaken assumption and is unsafe > as fdfree() does not actually lock the proc lock when releasing the > filedesc structure. (oops!)) Hmm, we should fix fdfree() then. :-/ Note that the rest of the kernel assumes it is ok to access p_fd directly without a lock for curproc but that should be safe. This function has other problems though. It does a SYSCTL_OUT while holding locks which is potentially ugly. Probably what it should be doing is something like: PROC_LOCK(p); ... (pid and uid) fdp = p->p_fd; if (fdp == NULL) { PROC_UNLOCK(p); } FILEDESC_LOCK(fdp); bump filedesc ref count PROC_UNLOCK(). Then I think you can drop the FILEDESC_LOCK around SYSCTL_OUT and if files close out from under you at the end of the list, so be it. This still has the same lock order problem however. You wouldn't have to wire in the user buffer for the sysctl though. *shrug* > In select we have: filedesc -> object > This access is done as an efficiency mechanism, if we do not hold > the filedesc locked while calling the fo_poll() routine on each > object we must perform an order of magnitude more mutex operations > per file being poll()'d/select()'d on. > We would have to allocate a temporary array to hold all the file > pointers, atomically bump and release thier reference counts each > time we looped through the file scan. > > In each object's write/read routine we have: object -> proc > This is because traditionally wakeup()/psignal() didn't block the > process, so we had "mutual exclusion" or "Giant" theoretically protecting > all of our data. The lock ordering here is done whenever someone requests > SIGIO when data arrives on an object. Data arrives which locks the object > and it then calls into pgsigio to post SIGIO which grabs the proc lock. > > All that said, we wind up with an ordering that's broken: > proc -> filedesc -> object -> proc > > Here's my solutions: (implicit "fix the problem" with each 'pro') > > .) add another mutex to struct proc, this will protect the filedesc reference. > pros: simple fix works nicely. > cons: bloating proc with yet another mutex, > yet another lock aquisition in the exit() path. Alternative to this (not that bad when you think about it, and really kind of like your last solution) is to use one global lock to protect all the p_fd pointers in the !curproc-read case. I.e. instead of adding a mutex to each proc, just have a global pfd_mutex lock. > .) drop the object lock when performing pgsigio > pros: none. > cons: possible race conditions added to code paths, it's hard to > mpsafe something that used to run under Giant if you must > make due with a lock you can't hold onto all the time, > this sort of issue will happen with sockets, vnodes and kqueues > as well. If you defer the sending of sigio until after you are done messing with the object then I think you can safely do it w/o needing the object to be locked. SIGIO has inherent userland races anyways. > .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead > split it into a routine that close()'s all the files in the first pass > and return whether or not we were the last accessor to this particular > filedesc. > Later in kern_exit() right before moving it to the zombie list, > while we have the allproc lock, we NULL out the filedesc pointer and free > it if we were the last accessor. We also need to handle the case > of exec() unsharing the filedescriptor table, I think we can do this > the same way at the cost of a possible allproc lock piggybacked or > placed in the uncommon case of exec with shared descriptor tables. I would rather not leave bogus filedesc's lying arond that have zero reference counts. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message From owner-freebsd-smp Fri Jan 3 10:38:24 2003 Delivered-To: freebsd-smp@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 70FFB37B401; Fri, 3 Jan 2003 10:38:20 -0800 (PST) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0DBA443E4A; Fri, 3 Jan 2003 10:38:20 -0800 (PST) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id CED16AE24A; Fri, 3 Jan 2003 10:38:19 -0800 (PST) Date: Fri, 3 Jan 2003 10:38:19 -0800 From: Alfred Perlstein To: John Baldwin Cc: alc@freebsd.org, tanimura@freebsd.org, jake@freebsd.org, dillon@freebsd.org, smp@freebsd.org, arch@freebsd.org Subject: Re: Need help fixing lock ordering with filedesc, proc and pipe. Message-ID: <20030103183819.GC33821@elvis.mu.org> References: <20030103030805.GS26140@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i Sender: owner-freebsd-smp@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org * John Baldwin [030103 10:24] wrote: > > On 03-Jan-2003 Alfred Perlstein wrote: > > The problem is that we have a lock order reversal because of these three > > transitions. Thanks to John Baldwin and his Witness subsystem it was > > initially made apparent and finally tracked down. > > > > In a sysctl we have: allproc -> proc -> filedesc > > This access is done in a sideways manner, meaning that another proc is > > examining the filedesc of a running process. The proc lock is grabbed > > because there is a mistaken impression that this is the safe way to > > grab a filedesc in a 'sideways' manner. (actually if you look at the > > code you'll realize that this is a mistaken assumption and is unsafe > > as fdfree() does not actually lock the proc lock when releasing the > > filedesc structure. (oops!)) > > Hmm, we should fix fdfree() then. :-/ Note that the rest of the kernel > assumes it is ok to access p_fd directly without a lock for curproc but > that should be safe. > > This function has other problems though. It does a SYSCTL_OUT while > holding locks which is potentially ugly. Probably what it should be > doing is something like: It looks like it uses sysctl_wire_old_buffer() to ensure that copyout won't block. > > In select we have: filedesc -> object > > This access is done as an efficiency mechanism, if we do not hold > > the filedesc locked while calling the fo_poll() routine on each > > object we must perform an order of magnitude more mutex operations > > per file being poll()'d/select()'d on. > > We would have to allocate a temporary array to hold all the file > > pointers, atomically bump and release thier reference counts each > > time we looped through the file scan. > > > > In each object's write/read routine we have: object -> proc > > This is because traditionally wakeup()/psignal() didn't block the > > process, so we had "mutual exclusion" or "Giant" theoretically protecting > > all of our data. The lock ordering here is done whenever someone requests > > SIGIO when data arrives on an object. Data arrives which locks the object > > and it then calls into pgsigio to post SIGIO which grabs the proc lock. > > > > All that said, we wind up with an ordering that's broken: > > proc -> filedesc -> object -> proc > > > > Here's my solutions: (implicit "fix the problem" with each 'pro') > > > > .) add another mutex to struct proc, this will protect the filedesc reference. > > pros: simple fix works nicely. > > cons: bloating proc with yet another mutex, > > yet another lock aquisition in the exit() path. > > Alternative to this (not that bad when you think about it, and really kind of > like your last solution) is to use one global lock to protect all the p_fd > pointers in the !curproc-read case. I.e. instead of adding a mutex to each > proc, just have a global pfd_mutex lock. Yes, a global mutex in the process fork/exit/exec path. Not sure if I like that, but I could live with it. I had thought of it, but felt guilty proposing it so it must have stuck in the back of my head. > > .) drop the object lock when performing pgsigio > > pros: none. > > cons: possible race conditions added to code paths, it's hard to > > mpsafe something that used to run under Giant if you must > > make due with a lock you can't hold onto all the time, > > this sort of issue will happen with sockets, vnodes and kqueues > > as well. > > If you defer the sending of sigio until after you are done messing with > the object then I think you can safely do it w/o needing the object to be > locked. SIGIO has inherent userland races anyways. This idea is deceptively simple. Look at the pipe code and tell me where this is possible. Sometimes you have to post sigio in a loop where sleeping occurs, so you can't just postpone it until just before you return, you have to drop the object lock. > > .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead > > split it into a routine that close()'s all the files in the first pass > > and return whether or not we were the last accessor to this particular > > filedesc. > > Later in kern_exit() right before moving it to the zombie list, > > while we have the allproc lock, we NULL out the filedesc pointer and free > > it if we were the last accessor. We also need to handle the case > > of exec() unsharing the filedescriptor table, I think we can do this > > the same way at the cost of a possible allproc lock piggybacked or > > placed in the uncommon case of exec with shared descriptor tables. > > I would rather not leave bogus filedesc's lying arond that have zero reference > counts. This is a really bad assesment of this option. Please give it some more thought, it wasn't completetly apparent when I first started coming up with solutions but it's still my favorite. And the zero refcount filedesc is the _only_ problem with it. :) The filedesc is only held onto for a short period after it is unlinked from it's parent proc. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message