From owner-freebsd-arch Sun Sep 24 3:31:38 2000 Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173]) by hub.freebsd.org (Postfix) with ESMTP id 97B8437B422; Sun, 24 Sep 2000 03:31:31 -0700 (PDT) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.0/8.11.0) with ESMTP id e8OASfC46009; Sun, 24 Sep 2000 11:28:41 +0100 (BST) (envelope-from brian@hak.lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.0/8.11.0) with ESMTP id e8OAQVx26206; Sun, 24 Sep 2000 11:26:31 +0100 (BST) (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.1.1 10/15/1999 To: Greg Lehey Cc: Chuck Paterson , Archie Cobbs , Brian Somers , Joerg Micheel , Matthew Jacob , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@freebsd.org, brian@Awfulhak.org Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest.c randomdev.c yarrow.c yarro) In-Reply-To: Message from Greg Lehey of "Sun, 24 Sep 2000 15:42:16 +0930." <20000924154216.D512@wantadilla.lemis.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 24 Sep 2000 11:26:31 +0100 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > 1. Because "mutexes" (I really hate this term; I wish I could find a > better one) only have an implied count of one, they can also have > the concept of an owner, which we use. > > 2. Because the mutex has an owner, only the owner can release it. > > 3. The mutex can also be "recursive" (it's really iterative, I > suppose): the owner can take it several times. The only reason > for this appears to be sloppy coding, but in the short term I > think we're agreed that we can't dispose of that. 
I agree - the idea of recursive mutices evil and should go, but the idea of an owner should not. It's nice to be able to write code that KASSERTs that it already owns a given mutex. > Greg > -- > Finger grog@lemis.com for PGP public key > See complete headers for address and phone numbers -- Brian Don't _EVER_ lose your sense of humour ! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 4:50:41 2000 Delivered-To: freebsd-arch@freebsd.org Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by hub.freebsd.org (Postfix) with ESMTP id EF88B37B422 for ; Sun, 24 Sep 2000 04:50:32 -0700 (PDT) Received: (from des@localhost) by flood.ping.uio.no (8.9.3/8.9.3) id NAA48203; Sun, 24 Sep 2000 13:50:27 +0200 (CEST) (envelope-from des@ofug.org) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. To: Barry Pederson Cc: arch@FreeBSD.ORG Subject: Re: Snapshots in the Fast Filesystem References: <200007060342.UAA23667@beastie.mckusick.com> <39CD0C1B.324AA1C5@geocities.com> From: Dag-Erling Smorgrav Date: 24 Sep 2000 13:50:26 +0200 In-Reply-To: Barry Pederson's message of "Sat, 23 Sep 2000 15:01:31 -0500" Message-ID: Lines: 14 User-Agent: Gnus/5.0802 (Gnus v5.8.2) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Barry Pederson writes: > Kirk gives the example of mounting a snapshot by using a 'vn0c' device - > I was wondering if the 'c' part of those device names is significant? Yes. These files are raw FS images, not labeled slices, so the only existing partition is 'c'. > Could you mount additional snapshots using 'vn0a', 'vn0b' and so on? No. One file, one device. 
DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 9:38:16 2000 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 3958A37B424; Sun, 24 Sep 2000 09:38:14 -0700 (PDT) Received: from bird (bird.feral.com [192.67.166.155]) by feral.com (8.9.3/8.9.3) with ESMTP id JAA19290; Sun, 24 Sep 2000 09:37:30 -0700 Date: Sun, 24 Sep 2000 09:37:27 -0700 (PDT) From: Matthew Jacob Reply-To: mjacob@feral.com To: Brian Somers Cc: Greg Lehey , Chuck Paterson , Archie Cobbs , Joerg Micheel , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@freebsd.org Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest.c randomdev.c yarrow.c yarro) In-Reply-To: <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > I agree - the idea of recursive mutices evil and should go, but the > idea of an owner should not. It's nice to be able to write code that > KASSERTs that it already owns a given mutex. I'm not sure I agree. Having lived through Solaris hell with recursive mutex panics, I rather like the BSD/OS approach. Yes, it possibly allows for sloppy coding. If you get rid of this, though, you can extend the switchover and pain for SMP by at least a year.
-matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 10:33: 8 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 8012D37B422; Sun, 24 Sep 2000 10:33:03 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8OHX3p27873; Sun, 24 Sep 2000 10:33:03 -0700 (PDT) Date: Sun, 24 Sep 2000 10:33:03 -0700 From: Alfred Perlstein To: arch@freebsd.org Cc: cp@freebsd.org, bmilekic@freebsd.org Subject: need advice, fsetown annoyances and mpsafeness. Message-ID: <20000924103303.M9141@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Take this scenario into account: 1) one sets a socket S for SIGIO delivery on an event to pid N 2) N exits 3) eventually, before 'S' is destroyed, a second process happens to get pid N 4) an event happens on 'S' and the wrong process 'N' is notified. Well, this isn't possible in FreeBSD, because we hang a struct sigio off of the object that is going to be delivering signals as well as off the struct proc/pgrp that is to receive them. When the proc/pgrp is destroyed, funsetownlst() is called on the list of sigio structs hanging from the proc/pgrp. What it then does is walk through the sigio structs hung from itself and using a back-pointer that points to the pointer within the object (socket/tty) it raises splhigh and NULLs it out, lowers spl, then frees the sigio.
	s = splhigh();
	*(sigio->sio_myref) = NULL;
	splx(s);
If an object is destroyed, it is responsible for freeing the attached sigio struct in nearly the same way: raising splhigh and delinking itself from the list of sigios attached to the proc/pgrp. This is a problem because it's pretty complicated to make mpsafe.
Two solutions come to mind: 1) embedding the sigio within the object. Problems: structure bloat; not really sure if it helps. 2) removing the burden of sigio destruction from the proc/pgrp destruction routines; instead the proc can just walk the sigios and set a flag such that the sigio is not to be delivered, and it is then entirely up to the object (socket/tty) to free() the sigio. The sigio linked list manipulation can be hinged off the process mutex we will need to add to the proc and pgrp structures. If a sigio is going to be changed, you must acquire the proc/pgrp lock of the process/group you are removing the structure from before doing the unlinking and change, otherwise you race against process exit. Option 2 seems a lot clearer to me, and it also seems to address all the problems here without any hackish solution. I'm going to be investigating the BSD/OS way of handling this, but at a glance it seems that they don't account for pid wraparound. Questions? Comments? I'm really looking for either encouragement (hey '2' looks cool), alternate locking suggestions, or a redesign of the sigio way that is more mpsafe. Is anyone else starting to hate monolithic kernel design? :) thanks, -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 11: 8:32 2000 Delivered-To: freebsd-arch@freebsd.org Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9]) by hub.freebsd.org (Postfix) with ESMTP id CFE8737B424; Sun, 24 Sep 2000 11:08:29 -0700 (PDT) Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.0/8.11.0) id e8OI8fi18906; Sun, 24 Sep 2000 13:08:41 -0500 (CDT) (envelope-from jlemon) Date: Sun, 24 Sep 2000 13:08:41 -0500 From: Jonathan Lemon To: Alfred Perlstein Cc: arch@FreeBSD.ORG, cp@FreeBSD.ORG, bmilekic@FreeBSD.ORG Subject: Re: need advice, fsetown annoyances and mpsafeness. Message-ID: <20000924130841.A2487@prism.flugsvamp.com> References: <20000924103303.M9141@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <20000924103303.M9141@fw.wintelcom.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, Sep 24, 2000 at 10:33:03AM -0700, Alfred Perlstein wrote: > 2) removing the burden of sigio destruction from the proc/pgrp destruction > routines, instead the proc can just walk the sigios and set a > flag is set such that the sigio is not to be delivered, it is > then entirely up to the object (socket/tty) to free() the sigio. > > the sigio linked list manipulation can be hinged off the process > mutex we will need to add to the proc and pgrp structures. > > if a sigio is going to be changed you must aquire the proc/pgrp lock > of the process/group you are removing the structure from before > doing the unlinking and change otherwise you race against process > exit. 
> > Option 2 seems a lot clearer to me and it also seems to address all > the problems here without any hackish like solution > > I'm going to be investigating the BSD/os way of handling this, but > it seems that they don't take into account for pid wraparound at > a glance. > > Questions? Comments? kqueue has a similar problem, and resolves this in a similar fashion as above. A knote can be attached to a process, which may exit; in this case, the process just walks down the list and sets a flag, the structure is then destroyed when kevent gets around to examining it. -- Jonathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 11:33:29 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id B9A1E37B422 for ; Sun, 24 Sep 2000 11:33:25 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id LAA10553 for ; Sun, 24 Sep 2000 11:33:24 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id LAA00463; Sun, 24 Sep 2000 11:33:23 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Sun, 24 Sep 2000 11:33:23 -0700 (PDT) Message-Id: <200009241833.LAA00463@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Subject: Re: Mutexes and semaphores In-Reply-To: <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org> References: <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org>, Brian Somers wrote: > > 3. The mutex can also be "recursive" (it's really iterative, I > > suppose): the owner can take it several times. 
The only reason > > for this appears to be sloppy coding, but in the short term I > > think we're agreed that we can't dispose of that. > > I agree - the idea of recursive mutices evil and should go, but the > idea of an owner should not. It's nice to be able to write code that > KASSERTs that it already owns a given mutex. I disagree that recursive mutexes are bad, and I don't think "sloppy coding" is the right way to look at them. I would argue that recursive mutexes allow robust code to be written based solely on knowledge of the immediately surrounding code, and that is a Good Thing. There are plenty of reasonable situations where you have a block of code (say, a function) and a certain mutex needs to be locked while it executes. The function might be called from several different places. Maybe all of the call sites already hold the mutex, and maybe they don't. Maybe it is hard to say for sure. Maybe new calls will be added in the future which will add further uncertainty. With recursive mutexes you can make the code robust by locking the mutex inside the called function. This robustness is certain and it is independent of what is going on in the rest of the system. Just look at the traditional kernel with respect to the spl*() calls. Imagine if it were illegal to call an spl function which would block one or more interrupts which were already blocked. That kind of restriction would make the code much less robust and much harder to maintain. There is a place for both recursive and non-recursive mutexes in a sound and robust design. John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence."
-- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 11:34:14 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id D758A37B422; Sun, 24 Sep 2000 11:34:11 -0700 (PDT) Received: from modemcable136.203-201-24.mtl.mc.videotron.ca ([24.201.203.136]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G1E00MDVM8XXY@falla.videotron.net>; Sun, 24 Sep 2000 14:34:10 -0400 (EDT) Date: Sun, 24 Sep 2000 14:37:52 -0400 (EDT) From: Bosko Milekic Subject: Re: need advice, fsetown annoyances and mpsafeness. In-reply-to: <20000924103303.M9141@fw.wintelcom.net> To: Alfred Perlstein Cc: arch@FreeBSD.ORG, cp@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 24 Sep 2000, Alfred Perlstein wrote: > What it then does is walk through the sigio structs hung from itself > and using a back-pointer that points to the pointer within the > object (socket/tty) it raises splhigh and NULLs it out, lowers spl, > then frees the sigio. > > s = splhigh(); > *(sigio->sio_myref) = NULL; > splx(s); Why can't this be done with an atomic operation? If you're holding the sigio struct, then are you not also ensuring that sigio->sio_myref won't change. Setting the pointer within the object to NULL should be atomic in itself, AFAIK. I'm wondering what would happen if the object is destroyed just before you splhigh() up there (in other words, did you leave something out of the example you posted above?) 
Assuming something was left out, then I'm wondering if it would be profitable in this case to distinguish based on the nature of the object and optionally provide a pointer to a mutex in the sigio struct which should be acquired in order to do this manipulation. > thanks, > -- > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > "I have the heart of a child; I keep it in a jar on my desk." Cheers, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 11:45:38 2000 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id 222FB37B424 for ; Sun, 24 Sep 2000 11:45:34 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.0/8.9.3) with ESMTP id e8OIjWN31096 for ; Sun, 24 Sep 2000 20:45:32 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-Reply-To: Your message of "Sun, 24 Sep 2000 11:33:23 PDT." <200009241833.LAA00463@vashon.polstra.com> Date: Sun, 24 Sep 2000 20:45:32 +0200 Message-ID: <31094.969821132@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <200009241833.LAA00463@vashon.polstra.com>, John Polstra writes: >I disagree that recursive mutexes are bad, and I don't think "sloppy >coding" is the right way to look at them. I would argue that >recursive mutexes allow robust code to be written based solely on >knowledge of the immediately surrounding code, and that is a Good >Thing. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD coreteam member | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 12:20:17 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 0047837B424 for ; Sun, 24 Sep 2000 12:20:11 -0700 (PDT) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id PAA08469; Sun, 24 Sep 2000 15:19:55 -0400 (EDT) Date: Sun, 24 Sep 2000 15:19:55 -0400 (EDT) From: Daniel Eischen To: arch@FreeBSD.ORG Cc: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-Reply-To: <200009241833.LAA00463@vashon.polstra.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 24 Sep 2000, John Polstra wrote: > In article <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org>, > Brian Somers wrote: > > > 3. The mutex can also be "recursive" (it's really iterative, I > > > suppose): the owner can take it several times. The only reason > > > for this appears to be sloppy coding, but in the short term I > > > think we're agreed that we can't dispose of that. > > > > I agree - the idea of recursive mutices evil and should go, but the > > idea of an owner should not. It's nice to be able to write code that > > KASSERTs that it already owns a given mutex. > > I disagree that recursive mutexes are bad, and I don't think "sloppy > coding" is the right way to look at them. I would argue that > recursive mutexes allow robust code to be written based solely on > knowledge of the immediately surrounding code, and that is a Good > Thing. > > There are plenty of reasonable situations where you have a block of > code (say, a function) and a certain mutex needs to be locked while > it executes. The function might be called from several different > places. Maybe all of the call sites already hold the mutex, and > maybe they don't. 
Maybe it is hard to say for sure. Maybe new calls > will be added in the future which will add further uncertainty. With > recursive mutexes you can make the code robust by locking the mutex > inside the called function. This robustness is certain and it is > independent of what is going on in the rest of the system. But you can't then use a recursive mutex in conjunction with msleep (cv_wait), which forces you to use yet another mutex. This is fine, but it adds confusion for the programmer. Another thing about our support for recursive mutexes is that it makes the calling conventions overly complex (with the silly flag arguments to mtx_enter()). If we are going to support recursive mutexes, I think it would be better to add separate calls/macros/data types to support them, so that the mtx mutexes can be simplified. Calls to mtx_enter() with the recursive mutex type wouldn't even compile. My $0.02 for what it's worth... -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 12:50:16 2000 Delivered-To: freebsd-arch@freebsd.org Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1]) by hub.freebsd.org (Postfix) with ESMTP id 821F637B422; Sun, 24 Sep 2000 12:50:08 -0700 (PDT) Received: from berserker.bsdi.com (cp@LOCALHOST [127.0.0.1]) by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id NAA25438; Sun, 24 Sep 2000 13:48:45 -0600 (MDT) Message-Id: <200009241948.NAA25438@berserker.bsdi.com> To: Greg Lehey Cc: Archie Cobbs , Brian Somers , Joerg Micheel , Matthew Jacob , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@freebsd.org Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest.c randomdev.c yarrow.c yarro) In-reply-to: Your message of "Sun, 24 Sep 2000 15:42:16 +0930."
<20000924154216.D512@wantadilla.lemis.com> From: Chuck Paterson Date: Sun, 24 Sep 2000 13:48:45 -0600 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG First, a general comment. The main reason not to hold a mutex across an async event is not that it won't work, but that it means we lose the ability to detect deadlocks. If process A holds mutex bar while waiting for an async event, such as msleep(), then it becomes a requirement that the process which is going to wake up process A doesn't block on mutex foo, or have any dependencies, even many steps removed, on something that requires mutex bar. Greg Lehey wrote on: Sun, 24 Sep 2000 15:42:16 +0930 }On Saturday, 23 September 2000 at 21:02:49 -0600, Chuck Paterson wrote: }> }>> Once you have the spin lock primitive, you can easily build }>> semaphores, sleep queues, etc. A semaphore is just a counter plus }>> a sleep queue -- all protected by the spin lock. }>> }>> A MUTEX is just a sepaphore whose initial count is 1. }>> }>> ?? }> }> In general this might be true, but in specific it isn't. } }As you know, I used to say exactly the same thing as Archie, but I've }realized that this implied count of 1 causes a couple of important }differences. I'm still working on a clearer definition, but what I've }seen so far is: } }1. Because "mutexes" (I really hate this term; I wish I could find a } better one) only have an implied count of one, they can also have } the concept of an owner, which we use. } }2. Because the mutex has an owner, only the owner can release it. } }3. The mutex can also be "recursive" (it's really iterative, I } suppose): the owner can take it several times. The only reason } for this appears to be sloppy coding, but in the short term I } think we're agreed that we can't dispose of that. } I have to disagree with item 3. Take the simple situation of function a() needing lock foo and function b() needing lock foo.
If b() is sometimes called from a() and sometimes not, then the recursiveness of foo is saving state. Without recursion, the same state would have to be passed explicitly and tested in b(); all that recursion really does is provide an automatic way of passing this state in, and save a few cycles because we don't have to set up a variable and pass it in. }One thing that I don't think is important is the duration of }ownership. We currently use mutexes for short periods of time, which }is why we have the spin version. } }At Tandem, we only used semaphores, but they always had a count of 1, }so they were effectively very close to our mutexes. They didn't allow }recursion, which is the Right Thing in a system designed from the }ground up, but they also didn't have owners. One of the most frequent }complicated problems we had were system hangs (deadlocks), and we }frequently couldn't figure out who had done what and why. Having }owners is a great debug aid. } }> The sleep version of mutexs have no spin lock. Spin locks are more }> expensive than the mutices currently in FreeBSD and BSD/OS. In }> order to acquire a spin locks interrupts must be blocked, which }> isn't the case for mutices which are not contested. } }If we can expect that the mutex will, on average, be freed in less }time than it would take to schedule a new process, spin locks can be a }better alternative. Otherwise we wouldn't need them at all. } I think the previous paragraph is an oversimplification. In general, the following is closer to a metric for your suggestion:
    POC  percentage of acquisitions which have a conflict
    CCS  average cost of a context switch
    AHT  average hold time
    SLS  how much is saved by acquiring a sleep lock instead of a spin lock
    if ((CCS - (AHT / 2)) * POC > SLS)
        use spin lock
In the future, when we have smarter code for the case where we have a conflict, the percentage of time we pay the CCS will drop. The place where spin locks are required is where a context switch is not permissible.
}Anyway, this doesn't directly relate to semaphores. We have the basic }issue of atomicity, which in general can be handled without spin }locks, and that would apply to semaphores just as much as to mutexes. } }Greg }-- }Finger grog@lemis.com for PGP public key }See complete headers for address and phone numbers Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 12:53:25 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id DB4A637B422; Sun, 24 Sep 2000 12:53:13 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8OJrCI01203; Sun, 24 Sep 2000 12:53:12 -0700 (PDT) Date: Sun, 24 Sep 2000 12:53:12 -0700 From: Alfred Perlstein To: Bosko Milekic Cc: arch@FreeBSD.ORG, cp@FreeBSD.ORG Subject: Re: need advice, fsetown annoyances and mpsafeness. Message-ID: <20000924125311.Q9141@fw.wintelcom.net> References: <20000924103303.M9141@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from bmilekic@technokratis.com on Sun, Sep 24, 2000 at 02:37:52PM -0400 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Bosko Milekic [000924 11:34] wrote: > > > On Sun, 24 Sep 2000, Alfred Perlstein wrote: > > > What it then does is walk through the sigio structs hung from itself > > and using a back-pointer that points to the pointer within the > > object (socket/tty) it raises splhigh and NULLs it out, lowers spl, > > then frees the sigio. > > > > s = splhigh(); > > *(sigio->sio_myref) = NULL; > > splx(s); > > Why can't this be done with an atomic operation? If you're holding > the sigio struct, then are you not also ensuring that sigio->sio_myref > won't change. 
Setting the pointer within the object to NULL should be > atomic in itself, AFAIK. I'm wondering what would happen if the object is > destroyed just before you splhigh() up there (in other words, did you > leave something out of the example you posted above?) > Assuming something was left out, then I'm wondering if it would be > profitable in this case to distinguish between the nature of the object > and optionally provide a pointer to a mutex in the sigio struct which > should be aquired in order to do this manipulation. It's really a lot more evil than you think. The race is in the object (socket/tty) checking the pointer and then dereferencing it. A broken solution is to lock the sigio struct or provide a backreference to the socket/tty lock; after banging my head against my desk for some time, I came across this solution (assuming pfind/pgfind return the proc/pgrp locked):

/*
 * Called by the owner of a sigio struct, such as a tty/socket, to remove
 * a struct sigio from itself; called at object destruction or at the
 * time that sigio/sigurg is no longer wanted/needed.
 * It will lock and unlock the proc/pgrp target of the sigio.
 */
void
funsetown_obj(sigio)
	struct sigio *sigio;
{
	pid_t pid;

	if (sigio == NULL)
		return;
	/*
	 * ok this is somewhat tricky: we examine what the sigio is attached
	 * to, and whatever it is (proc/pgrp) we need to use the search
	 * functions to ensure atomicity.  If we get back NULL that's ok,
	 * that means we lost the race; just free it.
	 * If we get back a pointer we then need to make sure that the pgid
	 * hasn't been zeroed out because we lost the race between looking
	 * at the sigio and locking the proc/pgrp
	 * (most likely pid/pgid wraparound).
	 */
	pid = sigio->sio_pgid;
	if (pid < 0) {
		struct pgrp *pgrp;

		/* a negative sio_pgid names process group -pid */
		if ((pgrp = pgfind(-pid)) != NULL) {
			/* funsetown_proc would have set this to zero */
			if (sigio->sio_pgid != 0)
				SLIST_REMOVE(&sigio->sio_pgrp->pg_sigiolst,
				    sigio, sigio, sio_pgsigio);
			PGRP_UNLOCK(pgrp);
		}
	} else if (pid > 0) {
		struct proc *p;

		if ((p = pfind(pid)) != NULL) {
			/* funsetown_proc would have set this to zero */
			if (sigio->sio_pgid != 0)
				SLIST_REMOVE(&sigio->sio_proc->p_sigiolst,
				    sigio, sigio, sio_pgsigio);
			/* unlock the proc pfind() returned (maybe a new one) */
			PROC_UNLOCK(p);
		}
	}
	crfree(sigio->sio_ucred);
	FREE(sigio, M_SIGIO);
}

/*
 * Detach a sigio struct from the process/pgrp it targets and mark it
 * dead by zeroing sio_pgid; the object still owns and frees it.
 * Called from the proc/pgrp at teardown; the proc/pgrp must be locked.
 */
void
funsetown_proc(sigio)
	struct sigio *sigio;
{

	if (sigio == NULL)
		return;
	if (sigio->sio_pgid < 0) {
		SLIST_REMOVE(&sigio->sio_pgrp->pg_sigiolst, sigio,
		    sigio, sio_pgsigio);
	} else /* sio_pgid > 0 */ {
		SLIST_REMOVE(&sigio->sio_proc->p_sigiolst, sigio,
		    sigio, sio_pgsigio);
	}
	sigio->sio_pgid = 0;
}

/*
 * Detach every sigio on a proc/pgrp list; each funsetown_proc() call
 * removes the list head, so the loop terminates.  The sigios are not
 * freed here; their objects free them via funsetown_obj().
 * Called from the proc/pgrp at teardown; the proc/pgrp must be locked.
 */
void
funsetownlst(sigiolst)
	struct sigiolst *sigiolst;
{
	struct sigio *sigio;

	while ((sigio = SLIST_FIRST(sigiolst)) != NULL)
		funsetown_proc(sigio);
}

Questions? Comments? -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Sep 24 13:33:54 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id DD4BE37B424 for ; Sun, 24 Sep 2000 13:33:51 -0700 (PDT) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id QAA46561; Sun, 24 Sep 2000 16:33:33 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Sun, 24 Sep 2000 16:33:33 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Barry Pederson Cc: arch@freebsd.org Subject: Re: Snapshots in the Fast Filesystem In-Reply-To: <39CD0C1B.324AA1C5@geocities.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, 23 Sep 2000, Barry Pederson wrote: > Is there (or will there be) some way to get a list of snapshots that > have been created on a filesystem? Kirk suggests following a convention > for naming snapshot files, but if that doesn't happen for some reason, > it would be good to have some foolproof way of determining what snaps > exist. Otherwise, I suppose you could search a filesystem for files > that -appear- to be almost as large as the filesystem itself, but that > seems kind of a kludge - and I don't know if I'd want to trust a script > to interpret those results correctly. I won't address the other issues discussed in your email, although I do have some thoughts on them, but will address this one. Snapshot files have the SF_SNAPSHOT file flag set on them -- I believe this is not cleared by ufs_getattr() and hence is probably exposed via stat(). 
I'm not sure our ls -ol output understands the snapshot flag, but a custom modification to ls, or a manual tool for stating and identifying files with the flag set sounds like it should work. That said, I haven't tried this :-). Given that snapshots should only be created by privileged users, hopefully you won't have the opportunity to lose one. I've been creating my snapshots under /.snapshot on the file system, matching my /.attribute file for extended attributes. In future versions of snapshots, it might be spiffy to expose mounted snapshots of directories under a .snapshot directory in each subdirectory, in the style of NetApp. You can certainly imagine the current implementation permitting it, given sufficient boredom on the part of Kirk. Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services From owner-freebsd-arch Sun Sep 24 14: 1:58 2000 Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (tun.AwfulHak.org [194.242.139.173]) by hub.freebsd.org (Postfix) with ESMTP id E416B37B424; Sun, 24 Sep 2000 14:01:43 -0700 (PDT) Received: from hak.lan.Awfulhak.org (root@hak.lan.awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.0/8.11.0) with ESMTP id e8OKuvC15873; Sun, 24 Sep 2000 21:56:57 +0100 (BST) (envelope-from brian@hak.lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.0/8.11.0) with ESMTP id e8OKrJx29096; Sun, 24 Sep 2000 21:53:19 +0100 (BST) (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200009242053.e8OKrJx29096@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.1.1 10/15/1999 To: mjacob@feral.com Cc: Brian Somers , Greg Lehey , Chuck Paterson , Archie Cobbs , Joerg Micheel , Frank Mayhar , John Baldwin , Mark Murray ,
FreeBSD-arch@FreeBSD.org, brian@Awfulhak.org Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest.c randomdev.c yarrow.c yarro) In-Reply-To: Message from Matthew Jacob of "Sun, 24 Sep 2000 09:37:27 PDT." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 24 Sep 2000 21:53:19 +0100 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > > I agree - the idea of recursive mutices evil and should go, but the > > idea of an owner should not. It's nice to be able to write code that > > KASSERTs that it already owns a given mutex. > > I'm not sure I agree. Having lived through Solaris hell with recursive mutex > panics, I rather like the BSD/OS approach. > > Yes, possibly allows for sloppy coding. If you get rid of this, though, you > can extend the switchover and pain for SMP at least a year. Maybe a whinge rather than an ASSERT in the mutex code would be more appropriate. I've had recursive mutex panics in Solaris, and it meant I was doing something wrong. A panic was a bit harsh, but it still led me to note that I was misusing the kstat stuff and made me fix my code - something I wouldn't have done if it wasn't pointed out for me. > -matt -- Brian Don't _EVER_ lose your sense of humour ! From owner-freebsd-arch Sun Sep 24 14:18:44 2000 Delivered-To: freebsd-arch@freebsd.org Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87]) by hub.freebsd.org (Postfix) with ESMTP id 7EF4F37B424; Sun, 24 Sep 2000 14:18:41 -0700 (PDT) Received: from newsguy.com (p04-dn01kiryunisiki.gunma.ocn.ne.jp [211.0.245.5]) by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id GAA24755; Mon, 25 Sep 2000 06:18:39 +0900 (JST) Message-ID: <39CE6F78.DF545ED@newsguy.com> Date: Mon, 25 Sep 2000 06:17:44 +0900 From: "Daniel C.
Sobral" X-Mailer: Mozilla 4.7 [en] (Win98; I) X-Accept-Language: en,pt-BR MIME-Version: 1.0 To: Robert Watson Cc: Barry Pederson , arch@FreeBSD.ORG Subject: Re: Snapshots in the Fast Filesystem References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Robert Watson wrote: > > I won't address the other issues discussed in your email, although I do > have some thoughts on them, but will address this one. Snapshot files > have the SF_SNAPSHOT file flag set on them -- I believe this is not > cleared by ufs_getattr() and hence is probably exposed via stat(). I'm > not sure our ls -ol output understands the snapshot flag, but a custom > modification to ls, or a manual tool for stating and identifying files > with the flag set sounds like it should work. That said, I haven't tried > this :-). In addition to ls, find could make good use of understanding said flag. -- Daniel C. Sobral (8-DCS) dcs@newsguy.com dcs@freebsd.org capo@the.secret.bsdconspiracy.net "I demand that my picture show a handsome face, even if it doesn't look like me." 
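A minimal sketch of the check Robert and Daniel describe for an ls- or find-style tool: lstat() a path and test the SF_SNAPSHOT bit in st_flags. It rests on Robert's unverified assumption that the flag survives ufs_getattr() and is visible through stat(); the helper name is_snapshot() is hypothetical, and where <sys/stat.h> does not define SF_SNAPSHOT it simply reports "not a snapshot".

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return 1 if `path` is a UFS snapshot file, 0 if
 * it is not, and -1 if lstat() fails.  Assumes the BSD st_flags field
 * and the SF_SNAPSHOT flag; where SF_SNAPSHOT is not defined (non-BSD
 * systems) every file is reported as not being a snapshot.
 */
int
is_snapshot(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) == -1)
		return (-1);
#ifdef SF_SNAPSHOT
	return ((sb.st_flags & SF_SNAPSHOT) != 0);
#else
	return (0);
#endif
}
```

A directory walker would call is_snapshot() on each entry it visits and print the paths for which it returns 1.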
From owner-freebsd-arch Sun Sep 24 17:44:48 2000 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 7372F37B424; Sun, 24 Sep 2000 17:44:46 -0700 (PDT) Received: from bird (bird.feral.com [192.67.166.155]) by feral.com (8.9.3/8.9.3) with ESMTP id RAA01181; Sun, 24 Sep 2000 17:44:17 -0700 Date: Sun, 24 Sep 2000 17:44:17 -0700 (PDT) From: Matthew Jacob Reply-To: mjacob@feral.com To: Brian Somers Cc: Greg Lehey , Chuck Paterson , Archie Cobbs , Joerg Micheel , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest.c randomdev.c yarrow.c yarro) In-Reply-To: <200009242053.e8OKrJx29096@hak.lan.Awfulhak.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > Maybe a whinge rather than an ASSERT in the mutex code would be more > appropriate. I've had recursive mutex panics in Solaris, and it > meant I was doing something wrong. A panic was a bit harsh, but it > still led me to note that I was misusing the kstat stuff and made me > fix my code - something I wouldn't have done if it wasn't pointed out > for me. Sure. And when the network stack and CAM and the VFS layer are re-thought out to know how to deal with reentrancy, then I'll be happy to have non-recursive locks. You're missing the point. If you're on Solaris, you are making a mistake in your coding if you're recursing. If you're on FreeBSD, then too many things have still to be redesigned to make that claim.
-matt From owner-freebsd-arch Sun Sep 24 20:15:53 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 0A2E637B422; Sun, 24 Sep 2000 20:15:47 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id UAA10301; Sun, 24 Sep 2000 20:13:10 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp04.primenet.com, id smtpdAAAXaayeu; Sun Sep 24 20:13:08 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id UAA04888; Sun, 24 Sep 2000 20:15:30 -0700 (MST) From: Terry Lambert Message-Id: <200009250315.UAA04888@usr05.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest To: cp@bsdi.com (Chuck Paterson) Date: Mon, 25 Sep 2000 03:15:30 +0000 (GMT) Cc: grog@wantadilla.lemis.com (Greg Lehey), archie@whistle.com (Archie Cobbs), brian@awfulhak.org (Brian Somers), joerg@cs.waikato.ac.nz (Joerg Micheel), mjacob@feral.com (Matthew Jacob), frank@exit.com (Frank Mayhar), jhb@pike.osd.bsdi.com (John Baldwin), markm@FreeBSD.ORG (Mark Murray), FreeBSD-arch@FreeBSD.ORG In-Reply-To: <200009241948.NAA25438@berserker.bsdi.com> from "Chuck Paterson" at Sep 24, 2000 01:48:45 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > First a general comment. The main reason to not hold a mutex across > an async event is not because it won't work, but because it means > that we loose the ability to detect dead locks.
If process A holds > mutex bar during a wait for async event, such as msleep(), then it > becomes a requirment that the process which is going to wake up > process A doesn't block on mutex foo, or have any dependencies even > many removed on something that requires mutex bar. Yes. The appropriate tool for doing this type of thing is a condition variable. The condition is tested under mutex protection. If false, the thread blocks on the variable and atomically releases the mutex. When the condition is satisfied, the variable is changed (again, under the protection of the mutex), and one or more threads waiting on the condition are signalled. The thread(s) signalled will attempt to reacquire the mutex, and, when successful, examine the variable, and take appropriate action, which might be to go back to sleep, if the condition is no longer satisfied, due to a lost race. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
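A userland sketch of the condition-variable pattern Terry describes, using POSIX threads in place of the kernel primitives mentioned elsewhere in the thread (msleep(), cv_wait()). The struct event type and its functions are hypothetical names chosen for illustration. The waiter re-tests the predicate in a loop because, as Terry notes, a woken thread may have lost the race for the mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event: `ready` is only touched with `lock` held. */
struct event {
	pthread_mutex_t	lock;	/* protects ready */
	pthread_cond_t	cv;	/* signalled when ready changes */
	bool		ready;	/* the condition being waited for */
};

void
event_init(struct event *ev)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cv, NULL);
	ev->ready = false;
}

/* Block until the event has been posted. */
void
event_wait(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	/*
	 * pthread_cond_wait() atomically releases the mutex while asleep
	 * and reacquires it before returning; the predicate is re-tested
	 * because another thread may have changed it in between.
	 */
	while (!ev->ready)
		pthread_cond_wait(&ev->cv, &ev->lock);
	pthread_mutex_unlock(&ev->lock);
}

/* Change the condition under the mutex, then wake every waiter. */
void
event_post(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->ready = true;
	pthread_cond_broadcast(&ev->cv);
	pthread_mutex_unlock(&ev->lock);
}
```

The while loop rather than a single if is the key design choice: pthread_cond_wait() may also return spuriously, so only the predicate itself can be trusted.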
From owner-freebsd-arch Sun Sep 24 20:18:49 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 4875937B43F for ; Sun, 24 Sep 2000 20:18:30 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id UAA00568 for ; Sun, 24 Sep 2000 20:17:03 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp03.primenet.com, id smtpdAAAN1aicb; Sun Sep 24 20:16:51 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id UAA04938 for arch@FreeBSD.ORG; Sun, 24 Sep 2000 20:18:12 -0700 (MST) From: Terry Lambert Message-Id: <200009250318.UAA04938@usr05.primenet.com> Subject: Re: Mutexes and semaphores To: arch@FreeBSD.ORG Date: Mon, 25 Sep 2000 03:18:12 +0000 (GMT) In-Reply-To: <200009241833.LAA00463@vashon.polstra.com> from "John Polstra" at Sep 24, 2000 11:33:23 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > There are plenty of reasonable situations where you have a block of > code (say, a function) and a certain mutex needs to be locked while > it executes. The function might be called from several different > places. Maybe all of the call sites already hold the mutex, and > maybe they don't. Maybe it is hard to say for sure. Maybe new calls > will be added in the future which will add further uncertainty. With > recursive mutexes you can make the code robust by locking the mutex > inside the called function. This robustness is certain and it is > independent of what is going on in the rest of the system. This is evil. You are using a mutex to protect code, when you should be using it to protect data.
If you want to protect code, you should use a semaphore, instead. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Sun Sep 24 20:31: 0 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 0207937B424; Sun, 24 Sep 2000 20:30:57 -0700 (PDT) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id UAA16383; Sun, 24 Sep 2000 20:27:46 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp02.primenet.com, id smtpdAAA.TaGOF; Sun Sep 24 20:27:31 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id UAA05163; Sun, 24 Sep 2000 20:30:08 -0700 (MST) From: Terry Lambert Message-Id: <200009250330.UAA05163@usr05.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: mjacob@feral.com Date: Mon, 25 Sep 2000 03:30:08 +0000 (GMT) Cc: brian@Awfulhak.org (Brian Somers), grog@wantadilla.lemis.com (Greg Lehey), cp@bsdi.com (Chuck Paterson), archie@whistle.com (Archie Cobbs), joerg@cs.waikato.ac.nz (Joerg Micheel), frank@exit.com (Frank Mayhar), jhb@pike.osd.bsdi.com (John Baldwin), markm@FreeBSD.ORG (Mark Murray), FreeBSD-arch@FreeBSD.ORG In-Reply-To: from "Matthew Jacob" at Sep 24, 2000 05:44:17 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Sure. And when we the network stack and CAM and the VFS layer are re-thought > out to know how to deal with reentrancy, then I'll be happy to have > non-recursive locks. This is easy: mark them non-reentrant.
You can either acquire a mutex on descent into them and release it on exit/sleep, or (and this is better) have a per-module mutex that's acquired on the descent/wakeup and released on the ascent, if the flag is present. This will let the modules be corrected on a per FS and per CAM driver basis, while maintaining legacy compatibility. We do not need another ethnic cleansing of the drivers, such as what we went through when CAM went in, or when the X.25 and ISODE stuff was murdered. > You're missing the point. If you're on Solaris, you are making a mistake in > your coding if you're recursing. If you're on FreeBSD, then too many things > have still to be redesigned to make that claim. I think he understands that; I just think he's unwilling to live with a kludge, which will have no incentive to be de-kludged, as it wouldn't actually fail. It's much better to be able to _know_ what code is OK and what code isn't, instead of pretending that it's all OK, when it's not. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
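A rough sketch of Terry's per-module scheme, assuming a hypothetical module descriptor that carries a reentrant flag: calls into a module not yet marked reentrant are serialized by that module's own mutex, while converted modules are entered without it. POSIX threads stand in for kernel mutexes, every name is illustrative, and the sketch leaves out the release-on-sleep and reacquire-on-wakeup step he also mentions.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical module descriptor.  A module that is not yet reentrant
 * leaves `reentrant` false and is serialized by its own mutex; modules
 * that have been fixed skip the lock entirely.
 */
struct module {
	const char	*name;
	bool		 reentrant;	/* set once the module is MP-safe */
	pthread_mutex_t	 lock;		/* serializes a legacy module */
	int		(*entry)(int);	/* the module's entry point */
};

void
module_init(struct module *m, const char *name, bool reentrant,
    int (*entry)(int))
{
	m->name = name;
	m->reentrant = reentrant;
	m->entry = entry;
	pthread_mutex_init(&m->lock, NULL);
}

/* Descend into a module, taking its mutex only if it is not reentrant. */
int
module_call(struct module *m, int arg)
{
	int ret;

	if (!m->reentrant)
		pthread_mutex_lock(&m->lock);
	ret = m->entry(arg);
	if (!m->reentrant)
		pthread_mutex_unlock(&m->lock);
	return (ret);
}

/* Example legacy entry point; its unprotected counter is why it needs the lock. */
static int legacy_calls;

int
legacy_entry(int arg)
{
	legacy_calls++;
	return (arg + 1);
}
```

Converting a module then amounts to making its internals safe and flipping its reentrant flag, one file system or CAM driver at a time.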
From owner-freebsd-arch Sun Sep 24 20:39:31 2000 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 61DF437B424; Sun, 24 Sep 2000 20:39:28 -0700 (PDT) Received: from bird (bird.feral.com [192.67.166.155]) by feral.com (8.9.3/8.9.3) with ESMTP id UAA01491; Sun, 24 Sep 2000 20:38:56 -0700 Date: Sun, 24 Sep 2000 20:38:56 -0700 (PDT) From: Matthew Jacob Reply-To: mjacob@feral.com To: Terry Lambert Cc: Brian Somers , Greg Lehey , Chuck Paterson , Archie Cobbs , Joerg Micheel , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To: <200009250330.UAA05163@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > This is easy: mark them non-reentrant. You can either acquire a > mutex on descent into them and release it on exit/sleep, or (and > this is better), have a per-module mutex that's acquired on the > descent/wakeup and released on the ascent, if the flag is present. > This will let the modules be corrected on a per FS and per CAM > driver basis, while maintaining legacy compatability. We do not > need another ethnic clensing of the drivers, such as what we went > through when CAM went in, or when the X.25 and ISODE stuff was > murdered. Hmm, but I sure don't want the pain of the 'unsafe_driver' mutex that Sun went thru. Still, your point has a lot of merit. > > > > You're missing the point. If you're on Solaris, you are making a mistake in > > your coding if you're recursing. If you're on FreeBSD, then too many things > > have still to be redesigned to make that claim.
> > I think he understands that, I just think he's unwilling to live > with a kludge, which will have no incentive to be de-kludged, as > it wouldn't actually not work. Whatever... :-) > > It's much better to be able to _know_ what code is OK and what > code isn't, instead of pretending that it's all OK, when it's not. Aw, that's not what I was getting at. I think getting the current set going should be allowed to proceed as is. If there is a roadmap for strengthening the semantics, great. Just don't make the bar too high at first. -matt From owner-freebsd-arch Sun Sep 24 21:12:43 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 89BC937B422; Sun, 24 Sep 2000 21:12:33 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id UAA06514; Sun, 24 Sep 2000 20:00:28 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp04.primenet.com, id smtpdAAAwnaWqm; Sun Sep 24 20:00:07 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id UAA04620; Sun, 24 Sep 2000 20:02:13 -0700 (MST) From: Terry Lambert Message-Id: <200009250302.UAA04620@usr05.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest To: grog@wantadilla.lemis.com (Greg Lehey) Date: Mon, 25 Sep 2000 03:02:13 +0000 (GMT) Cc: cp@bsdi.com (Chuck Paterson), archie@whistle.com (Archie Cobbs), brian@awfulhak.org (Brian Somers), joerg@cs.waikato.ac.nz (Joerg Micheel), mjacob@feral.com (Matthew Jacob), frank@exit.com (Frank Mayhar), jhb@pike.osd.bsdi.com (John Baldwin), markm@FreeBSD.ORG (Mark Murray), FreeBSD-arch@FreeBSD.ORG In-Reply-To: <20000924154216.D512@wantadilla.lemis.com> from "Greg Lehey" at Sep 24, 2000 03:42:16 PM X-Mailer:
ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > >> A MUTEX is just a sepaphore whose initial count is 1. > >> > >> ?? > > > > In general this might be true, but in specific it isn't. > > As you know, I used to say exactly the same thing as Archie, but I've > realized that this implied count of 1 causes a couple of important > differences. I'm still working on a clearer definition, but what I've > seen so far is: > > 1. Because "mutexes" (I really hate this term; I wish I could find a > better one) only have an implied count of one, they can also have > the concept of an owner, which we use. > > 2. Because the mutex has an owner, only the owner can release it. > > 3. The mutex can also be "recursive" (it's really iterative, I > suppose): the owner can take it several times. The only reason > for this appears to be sloppy coding, but in the short term I > think we're agreed that we can't dispose of that. > > One thing that I don't think is important is the duration of > ownership. We currently use mutexes for short periods of time, which > is why we have the spin version. Actually, that's crucial, since it defines the conflict domain; you can acquire a heavy lock after contending with a spin lock for the right to acquire the heavy lock. In most cases, where the heavy lock is held a short time, you won't have any contention, and thus can quickly grant the resource. In the case of a long held resource, the contention domain is such that the resource is probably contended, and has waiters outstanding; this means that the release case (and thus the acquisition case) must be much heavier weight. I recommend: This second is a rather good glossary, which is duplicated in many places on the net. > At Tandem, we only used semaphores, but they always had a count of 1, > so they were effectively very close to our mutexes. 
They didn't allow > recursion, which is the Right Thing in a system designed from the > ground up, but they also didn't have owners. One of the most frequent > complicated problems we had were system hangs (deadlocks), and we > frequently couldn't figure out who had done what and why. Having > owners is a great debug aid. I think that we need to be very clear on one thing: you can recurse on a semaphore, but a true mutex will not permit recursion; it is a lightweight object, and has very little content. It lacks a recurse count, and many other attributes of semaphores. Microsoft actually got this right in Windows, surprisingly. When you attempt to get a mutex you already hold, you are shooting yourself in the foot; it means that you didn't track the resource sufficiently. Usually this occurs when a mutex is acquired at one level, and released at another, or worse, when it is acquired in the wrong place (e.g. a subroutine called several times from a higher level routine, which should be acquiring the mutex instead). With recursion disallowed, mutex ownership is therefore implicit: the owner is whoever holds the mutex. In the case of a starvation or deadly embrace deadlock, one need only get a stack trace of processes currently in the kernel to determine where the problem lives; however, an owner would make this rather automatic, and could aid debugging, as you say. I do have a problem with this approach, however, since it makes it much more likely that people will be sloppy, and then wait for deadlocks to be reported, rather than thinking through their code and ensuring that deadlocks are not possible in the first place. The idea that fixing deadlocks in released code, rather than releasing only code without deadlocks, is an acceptable approach needs to be discouraged. > If we can expect that the mutex will, on average, be freed in less > time than it would take to schedule a new process, spin locks can be a > better alternative.
Otherwise we wouldn't need them at all. > > Anyway, this doesn't directly relate to semaphores. We have the basic > issue of atomicity, which in general can be handled without spin > locks, and that would apply to semaphores just as much as to mutexes. The advantage to a test-and-set spin prior to acquisition of a mutex is that the mutex can be acquired without taking a cache synchronization hit between processors, which would otherwise be necessary. Some cache synchronization events will inevitably occur, but they will be much less frequent. The mutexes themselves can be in non-cached pages, to accomplish this. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Sun Sep 24 22:33:34 2000 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (placeholder-dcat-1076843290.broadbandoffice.net [64.47.83.26]) by hub.freebsd.org (Postfix) with ESMTP id 7FA8437B43C for ; Sun, 24 Sep 2000 22:33:31 -0700 (PDT) Received: (from dillon@localhost) by earth.backplane.com (8.11.0/8.9.3) id e8P5XKg79352; Sun, 24 Sep 2000 22:33:20 -0700 (PDT) (envelope-from dillon) Date: Sun, 24 Sep 2000 22:33:20 -0700 (PDT) From: Matt Dillon Message-Id: <200009250533.e8P5XKg79352@earth.backplane.com> To: John Polstra Cc: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores References: <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org> <200009241833.LAA00463@vashon.polstra.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :In article <200009241026.e8OAQVx26206@hak.lan.Awfulhak.org>, :Brian Somers wrote: :> > 3. The mutex can also be "recursive" (it's really iterative, I :> > suppose): the owner can take it several times.
The only reason :> > for this appears to be sloppy coding, but in the short term I :> > think we're agreed that we can't dispose of that. :> :> I agree - the idea of recursive mutices evil and should go, but the :> idea of an owner should not. It's nice to be able to write code that :> KASSERTs that it already owns a given mutex. : :I disagree that recursive mutexes are bad, and I don't think "sloppy :coding" is the right way to look at them. I would argue that :recursive mutexes allow robust code to be written based solely on :knowledge of the immediately surrounding code, and that is a Good :Thing. : :There are plenty of reasonable situations where you have a block of :code (say, a function) and a certain mutex needs to be locked while :it executes. The function might be called from several different :places. Maybe all of the call sites already hold the mutex, and :maybe they don't. Maybe it is hard to say for sure. Maybe new calls :will be added in the future which will add further uncertainty. With :recursive mutexes you can make the code robust by locking the mutex :inside the called function. This robustness is certain and it is :independent of what is going on in the rest of the system. : :Just look at the traditional kernel with respect to the spl*() calls. :... I gotta agree with John on this. Recursive mutexes can be coded properly. The best example of this is when you have a module which implements an API, and to simplify the code you want one API function to call another in the same module. The case where one API function may wish to call another is one that occurs quite often in the kernel. For example, managing ref counts on objects. If you don't have recursive mutexes, then you do not have the ability to call your own API recursively (at least not without creating a mess). You are instead forced to split the API into a high-level and a low-level piece in order to be able to bypass the high-level piece. Yuch.
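To make the argument concrete, a userland sketch of one API function calling another in the same module, using a POSIX PTHREAD_MUTEX_RECURSIVE mutex as a stand-in for a recursive kernel mutex. The struct obj type and the obj_* functions are hypothetical; obj_ref_if_live() performs its check and increment under a single hold of the lock by re-entering it through the public obj_ref().

```c
#define _XOPEN_SOURCE 700	/* expose PTHREAD_MUTEX_RECURSIVE on glibc */
#include <pthread.h>

/* Hypothetical reference-counted object guarded by a recursive mutex. */
struct obj {
	pthread_mutex_t	lock;	/* recursive; protects refs */
	int		refs;	/* reference count */
};

void
obj_init(struct obj *o)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
	pthread_mutex_init(&o->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	o->refs = 1;
}

/* Public API: take a reference. */
void
obj_ref(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->refs++;
	pthread_mutex_unlock(&o->lock);
}

/* Public API: drop a reference and return the new count. */
int
obj_unref(struct obj *o)
{
	int n;

	pthread_mutex_lock(&o->lock);
	n = --o->refs;
	pthread_mutex_unlock(&o->lock);
	return (n);
}

/*
 * Public API built on obj_ref(): take a reference only if the object is
 * still live.  Calling obj_ref() while already holding o->lock is legal
 * only because the mutex is recursive.
 */
int
obj_ref_if_live(struct obj *o)
{
	int live;

	pthread_mutex_lock(&o->lock);
	live = (o->refs > 0);
	if (live)
		obj_ref(o);	/* re-enters o->lock: depth 2 */
	pthread_mutex_unlock(&o->lock);
	return (live);
}
```

With a non-recursive mutex, obj_ref_if_live() would deadlock on its own lock, and the usual workaround is the split Matt dislikes: a second, hypothetical obj_ref_locked() that assumes the caller already holds the mutex.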
The syscall API is a good example of what happens when you can't call your own API. For the FreeBSD kernel (and most UNIX kernels that I know), it is relatively dangerous for one system call to call another system call's entry point. The inability has created a mess out of things like NFS and other code elements that use internal descriptors. The last embedded OS I did allowed system calls to make system calls and it was like night and day. Things like in-kernel high-level descriptor use became trivial. -Matt From owner-freebsd-arch Mon Sep 25 1:28:37 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 1E1DF37B424; Mon, 25 Sep 2000 01:28:34 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id BAA05586; Mon, 25 Sep 2000 01:26:56 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp03.primenet.com, id smtpdAAARlaOZk; Mon Sep 25 01:26:46 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id BAA12659; Mon, 25 Sep 2000 01:27:56 -0700 (MST) From: Terry Lambert Message-Id: <200009250827.BAA12659@usr02.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: mjacob@feral.com Date: Mon, 25 Sep 2000 08:27:55 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), brian@Awfulhak.org (Brian Somers), grog@wantadilla.lemis.com (Greg Lehey), cp@bsdi.com (Chuck Paterson), archie@whistle.com (Archie Cobbs), joerg@cs.waikato.ac.nz (Joerg Micheel), frank@exit.com (Frank Mayhar), jhb@pike.osd.bsdi.com (John Baldwin), markm@FreeBSD.ORG (Mark Murray), FreeBSD-arch@FreeBSD.ORG In-Reply-To: from "Matthew Jacob" at Sep 24, 2000 08:38:56 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > This is easy: mark them non-reentrant. You can either acquire a > > mutex on descent into them and release it on exit/sleep, or (and > > this is better), have a per-module mutex that's acquired on the > > descent/wakeup and released on the ascent, if the flag is present. > > This will let the modules be corrected on a per FS and per CAM > > driver basis, while maintaining legacy compatability. We do not > > need another ethnic clensing of the drivers, such as what we went > > through when CAM went in, or when the X.25 and ISODE stuff was > > murdered. > > Hmm, but I sure don't want the pain of the 'unsafe_driver' mutex that Sun went > thru. Still- your point has a lot of maerit. One of the best things that UnixWare had going for it was the ability to continue using legacy drivers, file systems, and streams stacks, while those components that were reentrant were capable of giving better performance. UnixWare on a UP box was capable of 30% better performance, even after all of the SMP overhead, simply because the system was mostly reentrant (this with the terrible hit that the network stack took trying to use ODI drivers). I think that no matter how you slice it, it has to be possible for something to be done right, and to tell the difference between those things that are and aren't reentrant, easily and unequivocally. With the suggested mutex recursion (please -- use a counting semaphore, not a mutex, if you are going to permit recursion!), the only way would be to instrument the mutex acquisition macro to whine to the console any time the count increments after it reaches a value of 1. If you are willing to whine about recursion, then I suppose that having recursion would not be that bad; but turn off the whining, and there's little incentive to fix it, since to many people's minds, it won't be broken. 8-(. 
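A minimal sketch of the instrumentation Terry proposes: a lock wrapper that counts acquisitions by the current holder and complains whenever that count goes past 1. This is a userland approximation under stated assumptions (a POSIX recursive mutex underneath, stderr standing in for the console), and struct wlock, WLOCK() and wlock_unlock() are hypothetical names.

```c
#define _XOPEN_SOURCE 700	/* expose PTHREAD_MUTEX_RECURSIVE on glibc */
#include <pthread.h>
#include <stdio.h>

/* Hypothetical lock wrapper that whines about recursion. */
struct wlock {
	pthread_mutex_t	m;	/* recursive underlying mutex */
	int		depth;	/* holds by the current owner */
	unsigned long	whines;	/* recursive acquisitions seen so far */
};

void
wlock_init(struct wlock *l)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
	pthread_mutex_init(&l->m, &attr);
	pthread_mutexattr_destroy(&attr);
	l->depth = 0;
	l->whines = 0;
}

void
wlock_lock_at(struct wlock *l, const char *file, int line)
{
	pthread_mutex_lock(&l->m);
	/* We hold the mutex now, so depth and whines are ours to update. */
	if (++l->depth > 1) {
		l->whines++;
		fprintf(stderr, "%s:%d: recursed on lock %p (depth %d)\n",
		    file, line, (void *)l, l->depth);
	}
}

void
wlock_unlock(struct wlock *l)
{
	l->depth--;	/* still held, so this update is safe */
	pthread_mutex_unlock(&l->m);
}

/* Record the call site, as a kernel lock macro would. */
#define	WLOCK(l)	wlock_lock_at((l), __FILE__, __LINE__)
```

Because the complaint names the file and line of the recursive acquisition, turning it off is a deliberate choice rather than the silent default Terry worries about.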
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Mon Sep 25 3: 4:46 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 4689837B422 for ; Mon, 25 Sep 2000 03:04:43 -0700 (PDT) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id GAA16555; Mon, 25 Sep 2000 06:04:18 -0400 (EDT) Date: Mon, 25 Sep 2000 06:04:18 -0400 (EDT) From: Daniel Eischen To: Matt Dillon Cc: John Polstra , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-Reply-To: <200009250533.e8P5XKg79352@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 24 Sep 2000, Matt Dillon wrote: > > I gotta gree with John on this. Recursive mutexes can be coded > properly. The best example of this is when you have a module which > implements an API, and to simplify the code you want one API function > to call another in the same module. > > The case where one API function may wish to call another is one that > occurs quite often in the kernel. For example, managing ref counts > on objects. If you don't have recursive mutexes, then you do not > have the ability to call your own API recursively (at least not > without creating a mess). You are instead forced to split the API > into a high-level and a low-level piece in order to be able to bypass the > high-level piece. Yuch. > > The syscall API is a good example of what happens when you can't > call your own API. For the FreeBSD kernel (and most UNIX kernels that I know), > it is relatively dangerous for one system call to call another system call's > entry point.
The inability has created a mess out of things like NFS and > other code elements that use internal descriptors. The last embedded OS I > did allowed system calls to make system calls and it was like night > and day. Things like in-kernel high-level descriptor use became trivial. Mutexes should protect data. If you want to allow recursive ownership of data, then keep your own owner and ref count field in the protected data and use the mutex properly (release it after setting the owner or incrementing the ref count). You don't need to hold the mutex, and now you can use the same mutex for msleep/cv_wait. -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 4:56:33 2000 Delivered-To: freebsd-arch@freebsd.org Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87]) by hub.freebsd.org (Postfix) with ESMTP id 8730137B42C; Mon, 25 Sep 2000 04:56:30 -0700 (PDT) Received: from newsguy.com (p11-dn02kiryunisiki.gunma.ocn.ne.jp [211.0.245.76]) by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id UAA25291; Mon, 25 Sep 2000 20:55:35 +0900 (JST) Message-ID: <39CF3CFF.47E3E8F6@newsguy.com> Date: Mon, 25 Sep 2000 20:54:39 +0900 From: "Daniel C. 
Sobral" X-Mailer: Mozilla 4.7 [en] (Win98; I) X-Accept-Language: en,pt-BR MIME-Version: 1.0 To: Terry Lambert Cc: Greg Lehey , Chuck Paterson , Archie Cobbs , Brian Somers , Joerg Micheel , Matthew Jacob , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest References: <200009250302.UAA04620@usr05.primenet.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry Lambert wrote: > > In the case of a starvation or deadly embrace deadlock, one need > only get a stack trace of processes currently in the kernel to > determine where the problem lives; however, an owner would make > this rather automatic, and could aid debugging, as you say. I do > have a problem with this approach, however, since it makes it much > more likely that people will be sloppy, and then wait for deadlocks > to be reported, rather than thinking through their code and ensuring > that deadlocks are not possible in the first place. The idea that Just in case you haven't noticed, you just defended lack of debugging aids on the grounds that people will code better in their absence. Let's take this opportunity and make us completely incompatible with gdb too. Without gdb, people will have to think much better about their code, since debugging will be very hard. -- Daniel C. Sobral (8-DCS) dcs@newsguy.com dcs@freebsd.org capo@the.secret.bsdconspiracy.net "I demand that my picture show a handsome face, even if it doesn't look like me." 
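Eischen's suggestion earlier in the thread, keeping the owner and count in the protected data rather than recursing on the mutex, also provides the debugging aid Sobral wants: code can assert that it owns the object. Here is a hedged userland sketch assuming POSIX threads; `struct owned` and its functions are hypothetical names, not a FreeBSD API. The mutex is held only while the owner fields are examined or updated, and waiters sleep on a condition variable paired with that same mutex, the cv_wait()/msleep() pattern.

```c
#include <assert.h>
#include <pthread.h>

/* Ownership lives in the data.  mtx is held only long enough to read
 * or update owner/depth, never across the owner's work, so mtx itself
 * never recurses and can be the mutex that waiters sleep on. */
struct owned {
    pthread_mutex_t mtx;
    pthread_cond_t  released; /* signalled when depth drops to 0 */
    pthread_t       owner;    /* meaningful only when depth > 0 */
    unsigned        depth;    /* nested acquisitions by owner */
};

void owned_init(struct owned *o)
{
    pthread_mutex_init(&o->mtx, NULL);
    pthread_cond_init(&o->released, NULL);
    o->depth = 0;
}

void owned_acquire(struct owned *o)
{
    pthread_t self = pthread_self();

    pthread_mutex_lock(&o->mtx);
    if (o->depth > 0 && pthread_equal(o->owner, self)) {
        o->depth++;               /* nested use by the same owner */
    } else {
        while (o->depth > 0)      /* cv_wait()/msleep() analogue */
            pthread_cond_wait(&o->released, &o->mtx);
        o->owner = self;
        o->depth = 1;
    }
    pthread_mutex_unlock(&o->mtx);
}

void owned_release(struct owned *o)
{
    pthread_mutex_lock(&o->mtx);
    /* Only the owner may release (KASSERT-style check). */
    assert(o->depth > 0 && pthread_equal(o->owner, pthread_self()));
    if (--o->depth == 0)
        pthread_cond_signal(&o->released);
    pthread_mutex_unlock(&o->mtx);
}

/* Lets callers assert "I already own this" before touching it. */
int owned_by_me(struct owned *o)
{
    int mine;

    pthread_mutex_lock(&o->mtx);
    mine = o->depth > 0 && pthread_equal(o->owner, pthread_self());
    pthread_mutex_unlock(&o->mtx);
    return mine;
}
```

The recursion count here belongs to the object, not to the mutex, so a routine can write `assert(owned_by_me(o))` on entry, and a blocked thread waits without anyone holding `mtx` across a sleep.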
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 8:13:53 2000 Delivered-To: freebsd-arch@freebsd.org Received: from tinker.exit.com (tinker.exit.com [206.223.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 0E96D37B422 for ; Mon, 25 Sep 2000 08:13:51 -0700 (PDT) Received: from realtime.exit.com (realtime [206.223.0.5]) by tinker.exit.com (8.11.0/8.11.0) with ESMTP id e8PFDjO08881; Mon, 25 Sep 2000 08:13:45 -0700 (PDT) (envelope-from frank@exit.com) Received: (from frank@localhost) by realtime.exit.com (8.11.0/8.11.0) id e8PFET802275; Mon, 25 Sep 2000 08:14:29 -0700 (PDT) (envelope-from frank) From: Frank Mayhar Message-Id: <200009251514.e8PFET802275@realtime.exit.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To: <200009250827.BAA12659@usr02.primenet.com> from Terry Lambert at "Sep 25, 2000 08:27:55 am" To: Terry Lambert Date: Mon, 25 Sep 2000 07:59:12 -0700 (PDT) Cc: mjacob@feral.com, Brian Somers , Greg Lehey , Chuck Paterson , Archie Cobbs , Joerg Micheel , John Baldwin , Mark Murray , FreeBSD-arch@FreeBSD.ORG.ORG Reply-To: frank@exit.com Organization: Exit Consulting X-Copyright0: Copyright 2000 Frank Mayhar. All Rights Reserved. X-Copyright1: Permission granted for electronic reproduction as Usenet News or email only. X-Mailer: ELM [version 2.4ME+ PL68 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry Lambert wrote: > With the suggested mutex recursion (please -- use a counting > semaphore, not a mutex, if you are going to permit recursion!), That's basically what it is, more or less. 
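Whether a semaphore is "basically" a recursive lock is disputed further down the thread. A short sketch makes the behaviour concrete, assuming POSIX unnamed semaphores (sem_init() with pshared 0, available on Linux and FreeBSD but not macOS); `struct bsem_lock` and its functions are hypothetical. A semaphore initialized to 1 records no owner, so a second acquisition by the same thread fails instead of recursing, and any thread may post it, which permits hand-off.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* A semaphore with an initial count of 1 used as a non-recursing
 * lock.  No owner is stored anywhere. */
struct bsem_lock {
    sem_t sem;
};

int bsem_init(struct bsem_lock *l)
{
    /* pshared = 0: shared among threads of this process only. */
    return sem_init(&l->sem, 0, 1);
}

/* Blocking acquire; retries if interrupted by a signal.  A thread
 * that already holds the lock would block here forever. */
void bsem_acquire(struct bsem_lock *l)
{
    while (sem_wait(&l->sem) == -1)
        assert(errno == EINTR);
}

/* Non-blocking acquire: 1 on success, 0 if the lock is taken,
 * even when the caller is the thread that took it. */
int bsem_try_acquire(struct bsem_lock *l)
{
    if (sem_trywait(&l->sem) == 0)
        return 1;
    assert(errno == EAGAIN || errno == EINTR);
    return 0;
}

/* Release; may be called from a thread other than the acquirer. */
void bsem_release(struct bsem_lock *l)
{
    int v;

    /* Debug check only: racy if the lock is misused concurrently. */
    sem_getvalue(&l->sem, &v);
    assert(v == 0 && "releasing a lock that is not held");
    sem_post(&l->sem);
}
```

Any recursion count or owner identity would therefore have to live in a separate structure alongside the semaphore.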
> If you are willing to whine about recursion, then I suppose > that having recursion would not be that bad; but turn off > the whining, and there's little incentive to fix it, since > to many people's minds, it won't be broken. 8-(. Well, I can't speak for FreeBSD, but as far as BSD/OS goes, I plan to fix this stuff. I cut my teeth on SVR4.2 ES/MP, so I'm not used to recursive locks anyway, and I quite agree that if the code _needs_ a recursive lock, there's more going on there and the possibility of deadlocks is high. My code doesn't use recursive locks. Yeah, it's more work, but it's well worth it in the long run. I think it's a relatively small price to pay for long-term reliability and for not needing to go back and reexamine everything down the road a bit. (I hope this makes sense; I haven't had my coffee yet. :-/) -- Frank Mayhar frank@exit.com http://www.exit.com/ Exit Consulting http://store.exit.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 9:16: 7 2000 Delivered-To: freebsd-arch@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id DDC2D37B424; Mon, 25 Sep 2000 09:16:04 -0700 (PDT) Received: from bird (bird.feral.com [192.67.166.155]) by feral.com (8.9.3/8.9.3) with ESMTP id JAA03796; Mon, 25 Sep 2000 09:15:20 -0700 Date: Mon, 25 Sep 2000 09:15:20 -0700 (PDT) From: Matthew Jacob Reply-To: mjacob@feral.com To: "Daniel C. 
Sobral" Cc: Terry Lambert , Greg Lehey , Chuck Paterson , Archie Cobbs , Brian Somers , Joerg Micheel , Frank Mayhar , John Baldwin , Mark Murray , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest In-Reply-To: <39CF3CFF.47E3E8F6@newsguy.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > Let's take this opportunity and make us completely incompatible with gdb > too. Without gdb, people will have to think much better about their > code, since debugging will be very hard. Since it usually doesn't work on the alpha, it won't be much of a difference to me. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 9:59:47 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id 4966C37B42C for ; Mon, 25 Sep 2000 09:59:45 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id JAA15818; Mon, 25 Sep 2000 09:59:38 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id JAA02227; Mon, 25 Sep 2000 09:59:37 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Mon, 25 Sep 2000 09:59:37 -0700 (PDT) Message-Id: <200009251659.JAA02227@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Cc: tlambert@primenet.com Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To: <200009250827.BAA12659@usr02.primenet.com> References: <200009250827.BAA12659@usr02.primenet.com> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In 
article <200009250827.BAA12659@usr02.primenet.com>, Terry Lambert wrote: > With the suggested mutex recursion (please -- use a counting > semaphore, not a mutex, if you are going to permit recursion!), Please explain why you think a counting semaphore has anything to do with recursion. To support recursion a mutual exclusion primitive has to support the concept of ownership. I.e., if you already own it you can acquire it recursively, but if somebody else owns it then you cannot. A counting semaphore does not support that concept. The count is not a recursion count at all. Search google for "counting semaphore" and you'll find any number of introductory class notes on semaphores. Or cut right to the chase and go to a typical one at http://www.erc.msstate.edu/~ioana/POWERPOINT/CS4163/slides/Threads/tsld022.htm John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 10: 5:50 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222]) by hub.freebsd.org (Postfix) with ESMTP id 7462237B424 for ; Mon, 25 Sep 2000 10:05:47 -0700 (PDT) Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137]) by pike.osd.bsdi.com (8.11.0/8.9.3) with ESMTP id e8PH5ki40585; Mon, 25 Sep 2000 10:05:46 -0700 (PDT) (envelope-from jhb@foo.osd.bsdi.com) Received: (from jhb@localhost) by foo.osd.bsdi.com (8.11.0/8.11.0) id e8PH3rn36503; Mon, 25 Sep 2000 10:03:53 -0700 (PDT) (envelope-from jhb) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Mon, 25 Sep 2000 10:03:53 -0700 (PDT) Organization: BSD, Inc. 
From: John Baldwin To: Daniel Eischen Subject: Re: Mutexes and semaphores Cc: arch@FreeBSD.ORG Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 24-Sep-00 Daniel Eischen wrote: > On Sun, 24 Sep 2000, John Polstra wrote: > But you can't then use a recursive mutex in conjunction with msleep > (cv_wait) which forces you to use yet another mutex. This is fine, > but it adds confusion for the programmer. This is a problem. However, for one thing we currently have a KASSERT() that panics if you msleep() on a recursed mutex. However, one could also change msleep() to function like mi_switch() does with Giant and have it fully release the lock before sleeping, but this probably would not be a Good Thing. > Another thing, is in > our support for recursive mutexes is that they make the calling > conventions overly complex (with the silly flag arguments to > mtx_enter()). Uhhh. With the exception of the mtx_enter() for sched_lock in mi_switch() that specifies M_RLIKELY, all of the mutex flags currently in use have _nothing_ to do with recursion. MTX_DEF/MTX_SPIN are used to distinguish spin locks from sleep locks. The use of those flags is another matter for discussion, but the flags have very, very little to do with recursion. > If we are going to support recursive mutex, I think it would be > better to add separate calls/macros/data types to support them, > so the mtx mutexes can be simplified. Calls to mtx_enter > with the recursive mutex type wouldn't even compile. Err, the recursive nature of the mutexes is very trivial. It doesn't affect the complexity of the mutexes at all. Most of the "complexity" in the mutex code lies in putting processes to sleep and waking them back up again for sleep locks, and in the currently broken and disabled code to propagate a sleeping process' priority to the process holding the mutex it is waiting for. > My $0.02 for what it's worth...
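The KASSERT Baldwin mentions has a direct userland analogue. POSIX itself advises against using a recursive mutex with condition variables, because the implicit unlock in pthread_cond_wait() may not actually release a mutex that has been locked more than once. A hedged sketch, assuming POSIX threads; `struct tmtx` and the `tmtx_*()` functions are hypothetical. It tracks the depth beside a recursive mutex and refuses to sleep unless the depth is exactly 1, rather than fully releasing every level before sleeping, the alternative Baldwin describes.

```c
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_RECURSIVE */
#include <assert.h>
#include <pthread.h>

struct tmtx {
    pthread_mutex_t m;     /* recursive POSIX mutex */
    unsigned        depth; /* read/written only while m is held */
};

void tmtx_init(struct tmtx *t)
{
    pthread_mutexattr_t a;

    pthread_mutexattr_init(&a);
    pthread_mutexattr_settype(&a, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&t->m, &a);
    pthread_mutexattr_destroy(&a);
    t->depth = 0;
}

void tmtx_enter(struct tmtx *t)
{
    pthread_mutex_lock(&t->m);
    t->depth++;
}

void tmtx_exit(struct tmtx *t)
{
    assert(t->depth > 0); /* caller must hold the mutex */
    t->depth--;
    pthread_mutex_unlock(&t->m);
}

/* Caller must hold t->m. */
int tmtx_recursed(const struct tmtx *t)
{
    return t->depth > 1;
}

/* msleep()-like wait: panic (abort) if the mutex is recursed, since
 * the wait would release only one level and nobody could wake us.
 * As with msleep(), callers loop on their own predicate. */
void tmtx_sleep(struct tmtx *t, pthread_cond_t *cv)
{
    assert(t->depth == 1 && "sleeping on a recursed mutex");
    t->depth = 0;                 /* the wait releases our one level */
    pthread_cond_wait(cv, &t->m);
    t->depth = 1;                 /* reacquired on wakeup */
}
```

As Eischen notes in his reply, this catches the mistake only at run time; a separate type for recursive locks would catch it at compile time.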
> > -- > Dan Eischen -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.Baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 10:24:47 2000 Delivered-To: freebsd-arch@freebsd.org Received: from field.videotron.net (field.videotron.net [205.151.222.108]) by hub.freebsd.org (Postfix) with ESMTP id 84B6137B422; Mon, 25 Sep 2000 10:24:30 -0700 (PDT) Received: from modemcable136.203-201-24.mtl.mc.videotron.ca ([24.201.203.136]) by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G1G00CEDDOINM@field.videotron.net>; Mon, 25 Sep 2000 13:24:19 -0400 (EDT) Date: Mon, 25 Sep 2000 13:28:03 -0400 (EDT) From: Bosko Milekic Subject: Re: need advice, fsetown annoyances and mpsafeness. In-reply-to: <20000924125311.Q9141@fw.wintelcom.net> To: Alfred Perlstein Cc: arch@FreeBSD.ORG, cp@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sun, 24 Sep 2000, Alfred Perlstein wrote: > It's really a lot more evil than you think. Yeah, I noticed after sending the Email. > The race is in the object (socket/tty) checking the pointer and > then dereferencing it. > > A broken solution is to lock the sigio struct or provide a backreference > to the socket/tty lock, after banging my head against my desk for some time I came across this solution: This looks somewhat like what you mentioned in (2) in your earlier post. The sigio struct will only be freed by the object. I think this is a reasonable solution. > (assuming pfind/pgfind return the proc/pgrp locked) > [...] > /* > * ok this is somewhat tricky, we examine what the sigio is attached > * to, whatever it is proc/pgrp we need to use the search functions > * to ensure atomicity.
If we get back ESRCH that's ok, that means > * we lost the race, just free it. > * if we get back a pointer we then need to make sure that the pgid > * hasn't been NULLed out because we lost the race between looking > * at the sigio and locking the proc/pgrp > * (most likely pid/pgid wraparound) > */ > pid = sigio->sio_pgid; > > if (pid < 0) { > struct pgrp *p; > > if ((pgrp = pgfind(pid)) != NULL) { > /* funsetown_proc would have set this to zero */ > if (sigio->sio_pgid != 0) > SLIST_REMOVE(&sigio->sio_pgrp->pg_sigiolst, sigio, > sigio, sio_pgsigio); > PGRP_UNLOCK(&sigio->sio_pgrp); > } > } else if (pid > 0) { > struct proc *p; > > if ((p = pfind(pid)) != NULL) { > /* funsetown_proc would have set this to zero */ > if (sigio->sio_pgid != 0) > SLIST_REMOVE(&sigio->sio_proc->p_sigiolst, sigio, > sigio, sio_pgsigio); > PROC_UNLOCK(&sigio->sio_proc); > } > } > > out: > crfree(sigio->sio_ucred); > FREE(sigio, M_SIGIO); > } Looks good. > /* > * NULL out a sigio struct attached to a process/pgrp > * must be called with the object (struct proc/pgrp) locked > * this is to be called from the perspective of the process/pgrp > * > * called from the proc/pgid at teardown > * proc/pgid must be locked > */ > void > funsetown_proc(sigio) > struct sigio *sigio; > { > int s; > > if (sigio == NULL) > return; > if (sigio->sio_pgid < 0) { > SLIST_REMOVE(&sigio->sio_pgrp->pg_sigiolst, sigio, > sigio, sio_pgsigio); > } else /* if ((*sigiop)->sio_pgid > 0) */ { > SLIST_REMOVE(&sigio->sio_proc->p_sigiolst, sigio, > sigio, sio_pgsigio); > } > sigio->sio_pgid = 0; > } > > /* > * Free a list of sigio structures. > * > * called from the proc/pgid at teardown > * proc/pgid must be locked > */ > void > funsetownlst(sigiolst) > struct sigiolst *sigiolst; > { > struct sigio *sigio; > > while ((sigio = SLIST_FIRST(sigiolst)) != NULL) > funsetown(sigio); > } > > > Questions? Comments? Question: You don't seem to be protecting the actual sigiolst list with a lock. 
What happens if you've got two different processes manipulating the list? Each one may be locked, but regardless, your list can still be trashed. > -- > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > "I have the heart of a child; I keep it in a jar on my desk." Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 10:33:10 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 1ED8B37B422; Mon, 25 Sep 2000 10:33:08 -0700 (PDT) Received: from modemcable136.203-201-24.mtl.mc.videotron.ca ([24.201.203.136]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G1G00033E320Y@falla.videotron.net>; Mon, 25 Sep 2000 13:33:03 -0400 (EDT) Date: Mon, 25 Sep 2000 13:36:47 -0400 (EDT) From: Bosko Milekic Subject: Re: need advice, fsetown annoyances and mpsafeness. In-reply-to: To: Alfred Perlstein Cc: arch@FreeBSD.ORG, cp@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 25 Sep 2000, Bosko Milekic wrote: > Question: You don't seem to be protecting the actual sigiolst list > with a lock. What happens if you've got two different processes > manipulating the list? Each one may be locked, but regardless, your list > can still be trashed. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Nevermind, please disregard. 
*blushes* Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 10:36:23 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 415C837B422; Mon, 25 Sep 2000 10:36:22 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8PHaKS29850; Mon, 25 Sep 2000 10:36:20 -0700 (PDT) Date: Mon, 25 Sep 2000 10:36:20 -0700 From: Alfred Perlstein To: Bosko Milekic Cc: arch@FreeBSD.ORG, cp@FreeBSD.ORG Subject: Re: need advice, fsetown annoyances and mpsafeness. Message-ID: <20000925103620.W9141@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from bmilekic@technokratis.com on Mon, Sep 25, 2000 at 01:36:47PM -0400 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Bosko Milekic [000925 10:33] wrote: > > On Mon, 25 Sep 2000, Bosko Milekic wrote: > > > Question: You don't seem to be protecting the actual sigiolst list > > with a lock. What happens if you've got two different processes > > manipulating the list? Each one may be locked, but regardless, your list > > can still be trashed. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Nevermind, please disregard. > > *blushes* You understand that it's blocked by the lock on the process/pgrp right? :) -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 11: 0:10 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 54B7337B446; Mon, 25 Sep 2000 11:00:03 -0700 (PDT) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id NAA21201; Mon, 25 Sep 2000 13:59:45 -0400 (EDT) Date: Mon, 25 Sep 2000 13:59:45 -0400 (EDT) From: Daniel Eischen To: John Baldwin Cc: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 25 Sep 2000, John Baldwin wrote: > > On 24-Sep-00 Daniel Eischen wrote: > > On Sun, 24 Sep 2000, John Polstra wrote: > > But you can't then use a recursive mutex in conjunction with msleep > > (cv_wait) which forces you to use yet another mutex. This is fine, > > but it adds confusion for the programmer. > > This is a problem. However, for one thing we currently have a > KASSERT() that panics if you msleep() on a recursed mutex. However, > one could also change msleep() to function like mi_switch() does > with Giant and have it fully release the lock before sleeping, but > this probably would not be a Good Thing. A compile error is much better than a kernel panic. > > > Another thing, is in > > our support for recursive mutexes is that they make the calling > > conventions overly complex (with the silly flag arguments to > > mtx_enter()). > > > Uhhh. With the exception of the mtx_enter() for sched_lock in > mi_switch() that specifies M_RLIKELY, all of the mutex flags > currently in use have _nothing_ to do with recursion. > MTX_DEF/MTX_SPIN are used to distinguish spin locks from sleep > locks.
The use of those flags is another matter for discussion, > but the flags have very, very little to do with recursion. One of the reasons given for the mutex macros and flags is that the mutex type/options can be given without having to check the type/options in the mutex structure. If this isn't true, then get rid of the hideous flags to mtx_enter and mtx_exit. Optimize for a free lock, and take the hit and call a C function if the lock is held to check the mutex type and do the appropriate thing. > > > If we are going to support recursive mutex, I think it would be > > better to add separate calls/macros/data types to support them, > > so the mtx mutexes can be simplified. Calls to mtx_enter > > with the recursive mutex type wouldn't even compile. > > Err, the recursive nature of the mutexes is very trivial. It > doesn't affect the complexity of the mutexes at all. Most of > the "complexity" in the mutex code lies in putting processes to > sleep and waking them back up again for sleep locks, and in the > currently broken and disabled code to propagate a sleeping > process' priority to the process holding the mutex it is waiting > for. I still claim that recursive mutexes should not be supported by our standard kernel mutex. If you want to add another set of data types and functions for recursive mutexes, OK fine. But with proper coding techniques, I don't see the need to hold a mutex after fiddling with whatever data item is being protected. Take the mutex, set the owner or increase the ref count held in the data item to be protected, and then release the mutex either with mtx_exit() or msleep().
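Eischen's recipe (take the mutex, bump the count held in the data, drop the mutex) looks like this in a hedged userland sketch assuming POSIX threads; `struct obj` and its functions are hypothetical, standing in for the kernel's mtx_enter()/mtx_exit() and msleep(). Because no lock is held between obj_hold() and obj_drop(), a function that already holds a reference can call another that takes one, with no mutex recursion, and teardown sleeps on the same mutex until the count drains.

```c
#include <assert.h>
#include <pthread.h>

struct obj {
    pthread_mutex_t mtx;     /* protects refs, held only briefly */
    pthread_cond_t  drained; /* signalled when refs drops to 0 */
    unsigned        refs;
};

void obj_init(struct obj *o)
{
    pthread_mutex_init(&o->mtx, NULL);
    pthread_cond_init(&o->drained, NULL);
    o->refs = 0;
}

/* Take a reference: lock, increment, unlock. */
void obj_hold(struct obj *o)
{
    pthread_mutex_lock(&o->mtx);
    o->refs++;
    pthread_mutex_unlock(&o->mtx);
}

/* Drop a reference and wake any thread waiting for the last one. */
void obj_drop(struct obj *o)
{
    pthread_mutex_lock(&o->mtx);
    assert(o->refs > 0);
    if (--o->refs == 0)
        pthread_cond_broadcast(&o->drained);
    pthread_mutex_unlock(&o->mtx);
}

/* Teardown: sleep on the same mutex that protects refs, the
 * msleep()/cv_wait() half of the recipe. */
void obj_wait_drained(struct obj *o)
{
    pthread_mutex_lock(&o->mtx);
    while (o->refs != 0)
        pthread_cond_wait(&o->drained, &o->mtx);
    pthread_mutex_unlock(&o->mtx);
}
```

Nested calls such as `obj_hold(o); helper(o); obj_drop(o);`, where the hypothetical `helper()` also holds and drops, work without a recursive mutex.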
-- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 12:31:57 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 485D337B440 for ; Mon, 25 Sep 2000 12:31:30 -0700 (PDT) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id MAA19775; Mon, 25 Sep 2000 12:28:37 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp02.primenet.com, id smtpdAAAimaiAM; Mon Sep 25 12:28:20 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id MAA29117; Mon, 25 Sep 2000 12:31:09 -0700 (MST) From: Terry Lambert Message-Id: <200009251931.MAA29117@usr02.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: arch@freebsd.org Date: Mon, 25 Sep 2000 19:31:09 +0000 (GMT) Cc: tlambert@primenet.com In-Reply-To: <200009251659.JAA02227@vashon.polstra.com> from "John Polstra" at Sep 25, 2000 09:59:37 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > With the suggested mutex recursion (please -- use a counting > > semaphore, not a mutex, if you are going to permit recursion!), > > Please explain why you think a counting semaphore has anything to do > with recursion. To support recursion a mutual exclusion primitive > has to support the concept of ownership. I.e., if you already own > it you can acquire it recursively, but if somebody else owns it then > you cannot. A counting semaphore does not support that concept. The > count is not a recursion count at all. Search google for "counting > semaphore" and you'll find any number of introductory class notes on > semaphores. 
Or cut right to the chase and go to a typical one at > > http://www.erc.msstate.edu/~ioana/POWERPOINT/CS4163/slides/Threads/tsld022.htm Recursion should be such an exceptional condition that it should be implemented with a separate struct and a counting semaphore. Counting semaphores have owners; mutexes do not. Therefore they are a more appropriate primitive. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 12:38:44 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id BF13E37B424; Mon, 25 Sep 2000 12:38:33 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id MAA08146; Mon, 25 Sep 2000 12:38:48 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdAAAbdaa0p; Mon Sep 25 12:38:42 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id MAA29311; Mon, 25 Sep 2000 12:38:23 -0700 (MST) From: Terry Lambert Message-Id: <200009251938.MAA29311@usr02.primenet.com> Subject: Re: Mutexes and semaphores To: jhb@FreeBSD.ORG (John Baldwin) Date: Mon, 25 Sep 2000 19:38:22 +0000 (GMT) Cc: eischen@vigrid.com (Daniel Eischen), arch@FreeBSD.ORG In-Reply-To: from "John Baldwin" at Sep 25, 2000 10:03:53 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > But you can't then use a recursive mutex in conjunction with msleep > > (cv_wait) which forces you to use yet another mutex. This is fine, > > but it adds confusion for the programmer. > > This is a problem.
However, for one thing we currently have a > KASSERT() that panics if you msleep() on a recursed mutex. However, > one could also change msleep() to function like mi_switch() does > with Giant and have it fully release the lock before sleeping, but > this probably would not be a Good Thing. No. It would not be a good thing. Consider that I may be sleeping on the acquisition of the third out of three mutexes. > > If we are going to support recursive mutex, I think it would be > > better to add separate calls/macros/data types to support them, > > so the mtx mutexes can be simplified. Calls to mtx_enter > > with the recursive mutex type wouldn't even compile. > > Err, the recursive nature of the mutexes is very trivial. It > doesn't affect the complexity of the mutexes at all. Yes, it does. Ownership precludes hand-off. Recursion support implies permission and tacit approval. A mutex is not recursive. There are things you simply cannot implement when recursion is permitted for all of your primitives. The most obvious argument is still that a mutex is intended to protect data, not code. Recursion is only required if the mutex is actually protecting reentrancy of code, not access to data. How would you implement vop_lookup() using a recursing mutex, considering the ownership handoff which must occur? You will need a non-recursing mutex to protect your recursing mutex during the process of changing the owner (consider an ihash reclaim during lookup, or ownership of a vnode mutex on a vnode retrieved from the DNLC). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
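The semaphore argument in the replies that follow turns on what the count means. In a counting semaphore the count is the number of available units, that is, how many P operations can proceed without blocking, and no owner is recorded. Any thread may therefore perform the V, which makes hand-off possible and makes recursion undetectable. A hedged sketch built from a POSIX mutex and condition variable; `struct csem` and its functions are hypothetical names, not a kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

struct csem {
    pthread_mutex_t mtx;
    pthread_cond_t  avail;
    unsigned        count; /* available units, not a recursion depth */
};

void csem_init(struct csem *s, unsigned units)
{
    pthread_mutex_init(&s->mtx, NULL);
    pthread_cond_init(&s->avail, NULL);
    s->count = units;
}

/* P: take one unit, blocking while none are available. */
void csem_p(struct csem *s)
{
    pthread_mutex_lock(&s->mtx);
    while (s->count == 0)
        pthread_cond_wait(&s->avail, &s->mtx);
    s->count--;
    pthread_mutex_unlock(&s->mtx);
}

/* Non-blocking P: 1 on success, 0 if no unit is available. */
int csem_try_p(struct csem *s)
{
    int ok;

    pthread_mutex_lock(&s->mtx);
    ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->mtx);
    return ok;
}

/* V: return one unit.  Nothing records who took it, so any thread
 * may call this; that is hand-off, and it is also why the semaphore
 * cannot tell a recursive P from a P by another thread. */
void csem_v(struct csem *s)
{
    pthread_mutex_lock(&s->mtx);
    assert(s->count < UINT_MAX);
    s->count++;
    pthread_cond_signal(&s->avail);
    pthread_mutex_unlock(&s->mtx);
}
```

With `csem_init(&s, 1)` the same thread's second `csem_try_p()` returns 0, so any ownership or recursion tracking has to be kept in a separate structure, as Terry later clarifies he intended.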
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 12:40:31 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id E0D4037B43C for ; Mon, 25 Sep 2000 12:40:24 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id MAA17111; Mon, 25 Sep 2000 12:40:21 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id MAA02445; Mon, 25 Sep 2000 12:40:21 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Mon, 25 Sep 2000 12:40:21 -0700 (PDT) Message-Id: <200009251940.MAA02445@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Cc: tlambert@primenet.com Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To: <200009251931.MAA29117@usr02.primenet.com> References: <200009251931.MAA29117@usr02.primenet.com> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article <200009251931.MAA29117@usr02.primenet.com>, Terry Lambert wrote: > > Search google for "counting > > semaphore" and you'll find any number of introductory class notes on > > semaphores. Or cut right to the chase and go to a typical one at > > > > http://www.erc.msstate.edu/~ioana/POWERPOINT/CS4163/slides/Threads/tsld022.htm > > Recursion should be such an exceptional condition that it should > be implemented with a separate struct and a counting semaphore. > > Counting semaphores have owners; mutexes do not. Therefore they > are a more appropriate primitive. You are wrong. Counting semaphores do not keep track of owners. The count has nothing to do with that at all.
The count holds the number of available "units" of whatever resource the semaphore is controlling access to. It is the number of "P" operations that can be done without blocking. That is completely different from the recursion count of a recursive mutex, which keeps track of the number of times the current owner has acquired the mutex, and therefore the number of releases the owner must do before somebody else can acquire the mutex. Don't take my word for it. Do the Google search as I suggested, or go to the sample URL I gave you, or read any decent book or tutorial on the subject. Or, since you've cited Windows as having done it right, read their documentation on semaphores. John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Sep 25 13: 6:26 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 54FA637B422 for ; Mon, 25 Sep 2000 13:06:22 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id NAA18667; Mon, 25 Sep 2000 13:06:37 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdAAA6QaWzK; Mon Sep 25 13:06:28 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id NAA00200; Mon, 25 Sep 2000 13:06:09 -0700 (MST) From: Terry Lambert Message-Id: <200009252006.NAA00200@usr02.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: arch@freebsd.org Date: Mon, 25 Sep 2000 20:06:09 +0000 (GMT) Cc: tlambert@primenet.com In-Reply-To: <200009251940.MAA02445@vashon.polstra.com> from "John Polstra" at Sep 25, 2000 12:40:21 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: 
text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > Recursion should be such an exceptional condition that it should > > be implemented with a separate struct and a counting semaphore. > > > > Counting semaphores have owners; mutexes do not. Therefore they > > are a more appropriate primitive. > > You are wrong. Counting semaphores do not keep track of owners. OK. Let's be pedantic. Neither do mutexes. Counting semaphores are a more appropriate primitive, as the "resource" which is counted is the ownership capability. As others have pointed out (Archie, etc.), a semaphore with a count of 1 is appropriate. When the count goes 1->0, then we can consider that ownership has been relinquished. > The count has nothing to do with that at all. The count holds the > number of available "units" of whatever resource the semaphore is > controlling access to. It is the number of "P" operations that can > be done without blocking. That is completely different from the > recursion count of a recursive mutex, which keeps track of the number > of times the current owner has acquired the mutex, and therefore the > number of releases the owner must do before somebody else can acquire > the mutex. I never stated that the recursion count would be implemented in the semaphore count of a counting semaphore. Please read the first quoted sentence again. Ownership and recursion are kept in the separate struct. > Don't take my word for it. Do the Google search as I suggested, or go > to the sample URL I gave you, or read any decent book or tutorial on > the subject. Or, since you've cited Windows as having done it right, > read their documentation on semaphores. Windows did semaphores right. Windows did mutexes wrong. Like idiots, they permitted recursion.
Since any user space thread or timer can run on any kernel thread, and the mutex holder is based on the kernel thread ID, not the higher level context ID currently mapped to the thread, you can have situations where you have a resource contended by two user space entities mapped to a single kernel thread backing object (consider that FreeBSD will act similarly with N:M threads). To get around this, you have to implement non-recursing mutexes using a semaphore of count 1. Matt Day, Mark Muhlestein, and I ran into this when we implemented the syncd as a timer outcall when we ported the Heidemann stacking VFS framework to Windows 95, and implemented soft updates in FFS on Windows 95, back in 1995-1996. With mutexes that permit recursion, people will find themselves reinventing this for FreeBSD in order to get non-recursing mutexes for similar situations. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Mon Sep 25 13:25: 0 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id BF33337B422 for ; Mon, 25 Sep 2000 13:24:57 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id NAA17416; Mon, 25 Sep 2000 13:24:54 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id NAA02690; Mon, 25 Sep 2000 13:24:54 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Mon, 25 Sep 2000 13:24:54 -0700 (PDT) Message-Id: <200009252024.NAA02690@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Cc: tlambert@primenet.com Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To:
<200009252006.NAA00200@usr02.primenet.com> References: <200009252006.NAA00200@usr02.primenet.com> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article <200009252006.NAA00200@usr02.primenet.com>, Terry Lambert wrote: > > You are wrong. Counting semaphores do not keep track of owners. > > OK. Let's be pedantic. Neither do mutexes. I didn't say "mutex", I said "recursive mutex". Recursive mutexes do indeed keep track of their owners. > Counting semaphores are a more appropriate primitive, as the > "resource" which is counted is the ownership capability. As > others have pointed out (Archie, etc.), a semaphore with a > count of 1 is appropriate. When the count goes 1->0, then > we can consider that ownership has been relinquished. Actually, when the count goes 1->0, ownership has been acquired, not relinquished. The count represents the number of available units, and that is the case in every definition and every implementation of semaphores I have ever seen (which is quite a few, beginning in the early 70's). It's even true in the rather baroque implementation of semop(3). > I never stated that the recursion count would be implemented in > the semaphore count of a counting semaphore. Please read the > first quoted sentence again. Ownership and recursion are kept > in the separate struct. Fine, then you don't need a counting semaphore at all, as a simple non-recursive mutex will do the same job just as well and more efficiently. > To get around this, you have to implement non-recursing mutexes > using a semaphore of count 1. A semaphore with a count of 1, when used for mutual exclusion, behaves exactly the same as a simple mutex. I don't understand why you brought up counting semaphores at all. John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence."
-- Chögyam Trungpa From owner-freebsd-arch Mon Sep 25 14:23:26 2000 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (placeholder-dcat-1076843290.broadbandoffice.net [64.47.83.26]) by hub.freebsd.org (Postfix) with ESMTP id 43DAE37B42C for ; Mon, 25 Sep 2000 14:23:22 -0700 (PDT) Received: (from dillon@localhost) by earth.backplane.com (8.11.0/8.9.3) id e8PLN5F84806; Mon, 25 Sep 2000 14:23:05 -0700 (PDT) (envelope-from dillon) Date: Mon, 25 Sep 2000 14:23:05 -0700 (PDT) From: Matt Dillon Message-Id: <200009252123.e8PLN5F84806@earth.backplane.com> To: Daniel Eischen Cc: John Polstra , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores References: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :Mutexes should protect data. If you want to allow recursive ownership of :data, then keep your own owner and ref count field in the protected data :and use the mutex properly (release it after setting the owner or :incrementing the ref count). You don't need to hold the mutex, and :now you can use the same mutex for msleep/cv_wait. : :-- :Dan Eischen Mutexes protect data *CONSISTENCY*, not data. There is a big difference. Probably 95% of the kernel assumes data consistency throughout any given routine. If that routine must call other routines (and most do), then you have a major issue to contend with in regards to how to maintain consistency across the call. There are several ways to deal with it: * The subroutine calls are not allowed to block - lots of examples of this in the VM and other subsystems. * You use a heavy-weight lock instead of a mutex - an example of this would be the VFS subsystem (vnode locks). * You engineer the code to allow data to change out from under it at certain points (such as when something blocks) - probably the best example is vm_fault in the VM subsystem.
Unfortunately, all but the first can lead to serious bugs. Consider how many bugs have been fixed in the VFS and VM subsystems just in the last year that have been related to data consistency issues and you'll understand. The first issue - not allowing a subroutine call to block, when such a case exists, is the perfect place to put a recursive mutex. If you don't use a recursive mutex at that point then you wind up having to reengineer and rewrite big pieces of the code, or you wind up writing lots of little tag routines to do end-runs around the mutexes or to pass a flag that indicates that the mutex is already held and should not be obtained again, and so forth. Remember, I'm not talking about subsystem A calling subsystem B here, I'm talking about subsystem A calling itself. That is, a situation where you are not obtaining several different mutexes but are instead obtaining the same mutex several times. Frankly, fewer bugs will be introduced into the code by avoiding the reengineering and using recursive mutexes at appropriate points. 
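The "subsystem A calling itself" case can be illustrated outside the kernel with a POSIX recursive mutex. This is a hedged userland sketch using pthreads, not the kernel mtx(9) interface being debated, and the `subsys_*` names are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_RECURSIVE on strict compilers */
#include <assert.h>
#include <pthread.h>

/* Hypothetical subsystem state, all protected by one lock. */
static pthread_mutex_t subsys_lock;
static int subsys_entries;

/* Initialize the lock as recursive: its owner may re-acquire it. */
static void subsys_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&subsys_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Public entry point: takes the subsystem lock itself. */
static void subsys_insert(int n)
{
    pthread_mutex_lock(&subsys_lock);
    subsys_entries += n;
    pthread_mutex_unlock(&subsys_lock);
}

/*
 * Holds the lock for a consistent view, then calls back into the same
 * subsystem. subsys_insert() re-acquires subsys_lock while this thread
 * already owns it; that only works because the mutex is recursive.
 */
static void subsys_refill(void)
{
    pthread_mutex_lock(&subsys_lock);
    if (subsys_entries == 0)
        subsys_insert(8);
    pthread_mutex_unlock(&subsys_lock);
}
```

A caller runs `subsys_init()` once and then `subsys_refill()`. With a `PTHREAD_MUTEX_NORMAL` mutex instead, the nested acquisition in `subsys_insert()` would deadlock, which is the restructuring pressure (split `_locked` variants, "already held" flags) described above.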
-Matt From owner-freebsd-arch Mon Sep 25 14:39:24 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 7AD7537B424 for ; Mon, 25 Sep 2000 14:39:08 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8PLcsH08288; Mon, 25 Sep 2000 14:38:54 -0700 (PDT) Date: Mon, 25 Sep 2000 14:38:54 -0700 From: Alfred Perlstein To: Matt Dillon Cc: Daniel Eischen , John Polstra , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores Message-ID: <20000925143853.J9141@fw.wintelcom.net> References: <200009252123.e8PLN5F84806@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <200009252123.e8PLN5F84806@earth.backplane.com>; from dillon@earth.backplane.com on Mon, Sep 25, 2000 at 02:23:05PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Matt Dillon [000925 14:23] wrote: > : > :Mutexes should protect data. If you want to allow recursive ownership of > :data, then keep your own owner and ref count field in the protected data > :and use the mutex properly (release it after setting the owner or > :incrementing the ref count). You don't need to hold the mutex, and > :now you can use the same mutex for msleep/cv_wait. > : > :-- > :Dan Eischen > > Mutexes protect data *CONSISTENCY*, not data. There is a big difference. > Probably 95% of the kernel assumes data consistency throughout any given > routine. If that routine must call other routines (and most do), then > you have a major issue to contend with in regards to how to maintain > consistency across the call. > > There are several ways to deal with it: > > * The subroutine calls are not allowed to block - lots of examples of > this in the VM and other subsystems.
> > * You use a heavy-weight lock instead of a mutex - an example > of this would be the VFS subsystem (vnode locks). > > * You engineer the code to allow data to change out from under > it at certain points (such as when something blocks) - probably > the best example is vm_fault in the VM subsystem. > > Unfortunately, all but the first can lead to serious bugs. Consider > how many bugs have been fixed in the VFS and VM subsystems just in the > last year that have been related to data consistency issues and you'll > understand. > > The first issue - not allowing a subroutine call to block, when such a > case exists, is the perfect place to put a recursive mutex. If you don't > use a recursive mutex at that point then you wind up having to > reengineer and rewrite big pieces of the code, or you wind up writing > lots of little tag routines to do end-runs around the mutexes or to > pass a flag that indicates that the mutex is already held and should > not be obtained again, and so forth. > > Remember, I'm not talking about subsystem A calling subsystem B here, > I'm talking about subsystem A calling itself. That is, a situation > where you are not obtaining several different mutexes but are instead > obtaining the same mutex several times. > > Frankly, fewer bugs will be introduced into the code by avoiding the > reengineering and using recursive mutexes at appropriate points. What's pissing me off here (not to pick on you Matt) is that there's honestly a lot of code to be worked on where the locking issues are pretty simple (especially when you look at how BSD/OS implemented it). We should be coding and discussing existing problems with making the kernel MPsafe instead of what we *might* come across along the road. Whatever we bump into we can always beat to a pulp using lockmgr.
:) And honestly, I don't like the idea of recursive mutexes; I'd rather have a super function that locks a pgrp like pg_signal_locked/_unlocked which expects the locks to be held rather than a recursive lock. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." From owner-freebsd-arch Mon Sep 25 15:35:55 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 6D2AC37B424 for ; Mon, 25 Sep 2000 15:35:52 -0700 (PDT) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id PAA22718; Mon, 25 Sep 2000 15:32:57 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp02.primenet.com, id smtpdAAAc1aqhS; Mon Sep 25 15:32:38 2000 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id PAA07367; Mon, 25 Sep 2000 15:35:28 -0700 (MST) From: Terry Lambert Message-Id: <200009252235.PAA07367@usr07.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: arch@freebsd.org Date: Mon, 25 Sep 2000 22:35:28 +0000 (GMT) Cc: tlambert@primenet.com In-Reply-To: <200009252024.NAA02690@vashon.polstra.com> from "John Polstra" at Sep 25, 2000 01:24:54 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > Counting semaphores are a more appropriate primitive, as the > > "resource" which is counted is the ownership capability. As > > others have pointed out (Archie, etc.), a semaphore with a > > count of 1 is appropriate. When the count goes 1->0, then > > we can consider that ownership has been relinquished.
> > Actually, when the count goes 1->0, ownership has been acquired, not > relinquished. The count represents the number of available units, > and that is the case in every definition and every implementation of > semaphores I have ever seen (which is quite a few, beginning in the > early 70's). It's even true in the rather baroque implementation of > semop(3). Remaining resources vs. acquired resources. Same difference; you knew what I meant, which is what mattered. > Fine, then you don't need a counting semaphore at all, as a simple > non-recursive mutex will do the same job just as well and more > efficiently. Fine. Then we're agreed: non-recursive mutexes are the base unit, and recursion will be implemented on a case by case basis using an additional structure, which contains a non-recursive mutex, a recursion counter, and an owner field. Glad that's settled, until the first time a thread migrates between processors, and we decide we need a semaphore instead of a mutex as a primitive in order to handle sleeps and wakeups that occur with a mutex with a recursion count greater than 0, for some ungodly reason. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Mon Sep 25 17:46:22 2000 Delivered-To: freebsd-arch@freebsd.org Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87]) by hub.freebsd.org (Postfix) with ESMTP id E87F237B42C for ; Mon, 25 Sep 2000 17:46:17 -0700 (PDT) Received: from newsguy.com (p27-dn03kiryunisiki.gunma.ocn.ne.jp [210.232.224.156]) by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id JAA23910; Tue, 26 Sep 2000 09:46:14 +0900 (JST) Message-ID: <39CFF19E.CD689985@newsguy.com> Date: Tue, 26 Sep 2000 09:45:18 +0900 From: "Daniel C.
Sobral" X-Mailer: Mozilla 4.7 [en] (Win98; I) X-Accept-Language: en,pt-BR MIME-Version: 1.0 To: Terry Lambert Cc: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files References: <200009252006.NAA00200@usr02.primenet.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry Lambert wrote: > > With mutexes that permit recursion, people will find themselves > reinventing this for FreeBSD in order to get non-recursing > mutexes for similar situations. Err... recursion is an option. With the present code, unless I understood everything I heard so far completely wrong, you can have a mutex act in either recursive or non-recursive ways. -- Daniel C. Sobral (8-DCS) dcs@newsguy.com dcs@freebsd.org capo@the.secret.bsdconspiracy.net "I demand that my picture show a handsome face, even if it doesn't look like me." From owner-freebsd-arch Mon Sep 25 19: 9:42 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id B6FB937B42C for ; Mon, 25 Sep 2000 19:09:34 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id TAA19160; Mon, 25 Sep 2000 19:09:29 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id TAA03815; Mon, 25 Sep 2000 19:09:28 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Mon, 25 Sep 2000 19:09:28 -0700 (PDT) Message-Id: <200009260209.TAA03815@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Cc: tlambert@primenet.com Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To: <200009252235.PAA07367@usr07.primenet.com>
References: <200009252235.PAA07367@usr07.primenet.com> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article <200009252235.PAA07367@usr07.primenet.com>, Terry Lambert wrote: > Fine. Then we're agreed: non-recursive mutexes are the base unit, > and recursion will be implemented on a case by case basis using > an additional structure, which contains a non-recursive mutex, a > recursion counter, and an owner field. That's simply a less efficient implementation of a recursive mutex. Why not use the real thing? > Glad that's settled, until the first time a thread migrates between > processors, and we decide we need a semaphore instead of a mutex as > a primitive in order to handle sleeps and wakeups that occur with > a mutex with a recursion count greater than 0, for some ungodly > reason. Now we're back practically to my original question. Explain how a semaphore is going to solve anything here. I don't think it will help one bit. In virtually all cases which require sleeping and being woken up (whether via a condition variable or a semaphore), the basic scenario is the same. Thread A is examining and/or modifying a shared data structure. Now he wants to wait until thread B modifies the data structure and puts it into some desired state. While A was examining/modifying the data structure, he necessarily held a mutex on it in order to get a consistent view. Before he waits, he must release that mutex -- otherwise B won't be able to make the desired modifications. This is true whether the waiting is done with a condition variable or with a semaphore. It really doesn't make much difference which one you use. The only difference is that when using a condition variable the "release mutex and wait" sequence must be atomic, because a condition variable doesn't "remember" a wakeup that happened when nobody was waiting yet. 
A semaphore does remember it, so there is no need for atomicity with respect to releasing the mutex. That's a pretty minor difference, and it doesn't have anything to do with whether the mutexes are recursive or not. If the mutex is recursively held, there is a problem in that some other code grabbed the mutex and expected it to protect the data structure from being changed underfoot. Using a semaphore to do the waiting doesn't solve that problem, or even address it. John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa From owner-freebsd-arch Mon Sep 25 19:50:21 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 48E2C37B424 for ; Mon, 25 Sep 2000 19:50:12 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id TAA15991; Mon, 25 Sep 2000 19:47:40 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp04.primenet.com, id smtpdAAA3daybF; Mon Sep 25 19:47:29 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id TAA08391; Mon, 25 Sep 2000 19:49:58 -0700 (MST) From: Terry Lambert Message-Id: <200009260249.TAA08391@usr05.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: arch@freebsd.org Date: Tue, 26 Sep 2000 02:49:58 +0000 (GMT) Cc: tlambert@primenet.com In-Reply-To: <200009260209.TAA03815@vashon.polstra.com> from "John Polstra" at Sep 25, 2000 07:09:28 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > Fine.
Then we're agreed: non-recursive mutexes are the base unit, > > and recursion will be implemented on a case by case basis using > > an additional structure, which contains a non-recursive mutex, a > > recursion counter, and an owner field. > > That's simply a less efficient implementation of a recursive mutex. > Why not use the real thing? Now we're back to my original question: where in the code is there a perceived need for mutex recursion, or is this just a case of the mutex code being bloated for no good reason? > > Glad that's settled, until the first time a thread migrates between > > processors, and we decide we need a semaphore instead of a mutex as > > a primitive in order to handle sleeps and wakeups that occur with > > a mutex with a recursion count greater than 0, for some ungodly > > reason. > > Now we're back practically to my original question. Explain how a > semaphore is going to solve anything here. I don't think it will > help one bit. In virtually all cases which require sleeping and > being woken up (whether via a condition variable or a semaphore), the > basic scenario is the same. Thread A is examining and/or modifying a > shared data structure. Now he wants to wait until thread B modifies > the data structure and puts it into some desired state. While A > was examining/modifying the data structure, he necessarily held a > mutex on it in order to get a consistent view. Before he waits, he > must release that mutex -- otherwise B won't be able to make the > desired modifications. This is true whether the waiting is done with > a condition variable or with a semaphore. It really doesn't make much > difference which one you use. The only difference is that when using > a condition variable the "release mutex and wait" sequence must be > atomic, because a condition variable doesn't "remember" a wakeup that > happened when nobody was waiting yet. A semaphore does remember it, > so there is no need for atomicity with respect to releasing the mutex.
> That's a pretty minor difference, and it doesn't have anything to do > with whether the mutexes are recursive or not. No, it has to do with how long they are held. If they are never permitted to be held across recursive function calls -- or better, across _ANY_ function calls -- then you can spin on the mutex, instead of going to sleep. So a mutex operation becomes: 1) Acquire mutex; 2) Frob data protected by mutex; 3) Release mutex. If someone else needs the same data, they do the same thing. If you want to wait until a condition is true, then use a condition variable, a semaphore, or something else you can wait on in order to be signalled. The idea that you should ever go to sleep waiting for a mutex is antithetical to the very idea of a mutex. There are, likewise, few real situations in which you need to hold two mutexes at once; these are degenerate cases, which are badly coded. Consider that I may have a vnode freelist protected by a mutex, and a vnode protected by a mutex. The perceived need to hold both of these simultaneously to put something on the freelist is an artifact of wrong-thinking: the pointers used to place a vnode on a freelist are the property of the freelist mutex, not the vnode mutex. Even if you can make a case for this not being true (e.g. moving a vnode from one list to another, using the same pointers in the vnode to track state on both lists, which is really just an acquire/remove/release/acquire/insert/release operation, where you have a window between the removal and the reinsertion), it can be handled by strictly controlling the order of mutex acquisition, inverting the release order, and backing off in case of conflict. > If the mutex is recursively held, there is a problem in that some > other code grabbed the mutex and expected it to protect the data > structure from being changed underfoot. Worst case, set an "IN_USE" flag on the data in a flags field to bar reentry on a given data item. Best case, fix the broken code.
The vnode locking code does this today (I'd argue that it's broken code). > Using a semaphore to do the > waiting doesn't solve that problem, or even address it. It does. Semaphores can be held across a sleep (a wait). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Tue Sep 26 4:30:32 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id AE08B37B422 for ; Tue, 26 Sep 2000 04:30:25 -0700 (PDT) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id HAA00572; Tue, 26 Sep 2000 07:30:05 -0400 (EDT) Date: Tue, 26 Sep 2000 07:29:55 -0400 (EDT) From: Daniel Eischen To: Matt Dillon Cc: John Polstra , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-Reply-To: <200009252123.e8PLN5F84806@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 25 Sep 2000, Matt Dillon wrote: > : > :Mutexes should protect data. If you want to allow recursive ownership of > :data, then keep your own owner and ref count field in the protected data > :and use the mutex properly (release it after setting the owner or > :incrementing the ref count). You don't need to hold the mutex, and > :now you can use the same mutex for msleep/cv_wait. > : > :-- > :Dan Eischen > > Mutexes protect data *CONSISTENCY*, not data. There is a big difference. > Probably 95% of the kernel assumes data consistency throughout any given > routine. If that routine must call other routines (and most do), then > you have a major issue to contend with in regards to how to maintain > consistency across the call.
> > There are several ways to deal with it: > > * The subroutine calls are not allowed to block - lots of examples of > this in the VM and other subsystems. > > * You use a heavy-weight lock instead of a mutex - an example > of this would be the VFS subsystem (vnode locks). > > * You engineer the code to allow data to change out from under > it at certain points (such as when something blocks) - probably > the best example is vm_fault in the VM subsystem. > > Unfortunately, all but the first can lead to serious bugs. Consider > how many bugs have been fixed in the VFS and VM subsystems just in the > last year that have been related to data consistency issues and you'll > understand. > > The first issue - not allowing a subroutine call to block, when such a > case exists, is the perfect place to put a recursive mutex. If you don't > use a recursive mutex at that point then you wind up having to > reengineer and rewrite big pieces of the code, or you wind up writing > lots of little tag routines to do end-runs around the mutexes or to > pass a flag that indicates that the mutex is already held and should > not be obtained again, and so forth. > > Remember, I'm not talking about subsystem A calling subsystem B here, > I'm talking about subsystem A calling itself. That is, a situation > where you are not obtaining several different mutexes but are instead > obtaining the same mutex several times. If you absolutely need recursive mutexes, then roll your own and keep the base mutex simple. This is trivial to do and makes the base mutex more efficient without the need to check for recursive ownership. Mutexes should be held for very short amounts of time, and it should be apparent in the encompassing code where the mutex is taken and where it is released. In your example, what do you do in the case of abnormal exits from recursively called code?
It is far easier to handle this situation if you roll your own mutex and keep track of the ref count and owner yourself. If you don't, you'll end up adding mtx_exit_and_clear_refcount(). My main concern is not to eliminate recursive mutexes, though I still think they should go. I would like to see all barriers to eliminating the flags/options to mtx_enter() and mtx_exit() removed. The current form of the mutex routines is not an API/ABI we should be using. -- Dan Eischen From owner-freebsd-arch Tue Sep 26 18:10:17 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id EDC6537B43C; Tue, 26 Sep 2000 18:10:04 -0700 (PDT) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id VAA81368; Tue, 26 Sep 2000 21:09:58 -0400 (EDT) (envelope-from rwatson@FreeBSD.org) Date: Tue, 26 Sep 2000 21:09:58 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: freebsd-fs@FreeBSD.org Cc: freebsd-arch@FreeBSD.org, trustedbsd-discuss@TrustedBSD.org Subject: VOP_ACCESS() and new VADMIN/VATTRIB? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG (sorry about flagrant cross-posting -- wanted to make sure that those with interest would have the opportunity to comment) In general, access control for operations within a file system is determined via a recursive VOP_ACCESS() call on the vnode, viz.: VOP_OPEN(vp, ...) -> ufs_open(vp, ...) -> VOP_ACCESS(vp, ...) -> ufs_access(vp, ...). Flags are passed to VOP_ACCESS() indicating the specific requests being made on the object, allowing VOP_ACCESS() to implement a variety of discretionary and mandatory policies.
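The dispatch path described above, where a file system's open routine re-enters the vnode interface to obtain an access decision, can be sketched with a small operations vector. Everything here is a simplified, hypothetical stand-in (`struct vnodeops`, `vop_open()`, `vop_access()`, the `toyfs_*` routines, and the flag values); FreeBSD's real vnode interface uses generated argument structures and its own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values (assumptions, not FreeBSD's definitions). */
#define VREAD  0x1
#define VWRITE 0x2
#define VEXEC  0x4

struct vnode;

/* Hypothetical per-file-system operations vector. */
struct vnodeops {
    int (*op_open)(struct vnode *vp, int mode);
    int (*op_access)(struct vnode *vp, int mode);
};

/* Hypothetical vnode: an ops vector plus the rights it grants. */
struct vnode {
    const struct vnodeops *v_op;
    int v_granted;
};

/* Generic layer: route each request through the vnode's ops vector. */
static int vop_open(struct vnode *vp, int mode)
{
    assert(vp != NULL && vp->v_op != NULL);
    return vp->v_op->op_open(vp, mode);
}

static int vop_access(struct vnode *vp, int mode)
{
    assert(vp != NULL && vp->v_op != NULL);
    return vp->v_op->op_access(vp, mode);
}

/* Stand-in for ufs_access(): every requested flag must be granted. */
static int toyfs_access(struct vnode *vp, int mode)
{
    return (mode & ~vp->v_granted) != 0 ? EACCES : 0;
}

/* Stand-in for ufs_open(): re-enters the vnode interface, so the
 * decision is made by whatever op_access routine is installed. */
static int toyfs_open(struct vnode *vp, int mode)
{
    return vop_access(vp, mode);
}

static const struct vnodeops toyfs_vnodeops = {
    .op_open = toyfs_open,
    .op_access = toyfs_access,
};
```

Installing a different `op_access` routine (for example, one that also consults a mandatory policy) changes the decision for every caller without touching `toyfs_open()`, which is the property the centralized approach relies on.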
VOP_ACCESS(9) documents these flags as VREAD, VWRITE, and VEXEC, reflecting respectively read, write, and execute rights. In recent changes to improve modularity and consistency, Poul-Henning moved most of the mode/ownership-related components of ufs_access() (and of the equivalent routines in other file systems) into vaccess(). File-system specific components, such as the readonly status of the file system, and UFS file flags, remain in ufs_access(). In the UFS code, VOP_ACCESS() is used fairly routinely to guard access to the data associated with a file or directory. However, there is an additional class of requests relating to file operations wherein checks of inode attributes and characteristics are performed directly, rather than falling back on the central VOP_ACCESS() implementation for the file system. In general these requests relate to administrative actions for the file: the ability to set protection rights for the file (ufs_chmod(), ufs_chown(), and in the ACL implementation, also ufs_setacl()). As a result, these access checks are scattered through the file system implementation, and do not lend themselves to further generalization. I ran into this problem while implementing mandatory access control for FreeBSD: mandatory policies override rights that may be granted by discretionary mechanisms (such as permissions and ACLs), allowing effective partitioning and segregation of the system based on other properties, such as sensitivity and integrity labels. One example of this is a Biba integrity policy, in which the permissions of a file might allow write access to all users, but the MAC policy forbids this access as it might violate system integrity (for example, incorrectly set permissions on /kernel). Without generalized and centralized access control for all access decisions, it is difficult to cleanly insert more flexible access control policies.
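To make the Biba example concrete: under strict Biba integrity, a subject may not write an object of higher integrity ("no write up") or read one of lower integrity ("no read down"). The sketch below uses hypothetical integer labels and made-up request bits rather than any FreeBSD interface. It shows a mandatory check removing a write right that world-writable permissions would otherwise grant, as in the /kernel case.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative request bits for this sketch only. */
#define REQ_READ  0x1
#define REQ_WRITE 0x2

/* Hypothetical labels: a larger integrity value means "more trusted". */
struct subject { int integrity; };
struct object  { int integrity; bool world_writable; };

/* Discretionary layer: reads are open, writes follow the permission bits. */
static bool
dac_permits(const struct object *o, int req)
{
	if (req & REQ_WRITE)
		return o->world_writable;
	return true;
}

/* Strict Biba: no write up, no read down. */
static bool
biba_permits(const struct subject *s, const struct object *o, int req)
{
	if ((req & REQ_WRITE) && s->integrity < o->integrity)
		return false;
	if ((req & REQ_READ) && s->integrity > o->integrity)
		return false;
	return true;
}

/* The mandatory policy can only remove rights DAC granted, never add them. */
static bool
access_permitted(const struct subject *s, const struct object *o, int req)
{
	return dac_permits(o, req) && biba_permits(s, o, req);
}
```

Both checks must pass, which is why the mandatory layer needs to see every access decision, including administrative ones.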
I'd like to propose that a new VADMIN flag be added, determining whether or not the passed credentials are permitted to administer the file. Here is a brief itemization of locations in the code where ip->i_uid checks would be replaced with VOP_ACCESS(vp, ... VADMIN ...) calls, with some possible omissions:

File          Use
ufs_lookup.c  Allow owner of a sticky directory to delete any file in it
ufs_lookup.c  Allow owner of a file to delete it from a sticky directory
ufs_vnops.c   Allow owner of a file to set non-system file flags
ufs_vnops.c   Allow owner to modify times on file
ufs_vnops.c   Allow owner to modify permissions on file
ufs_vnops.c   Allow owner to modify group of file
ufs_vnops.c   Allow owner of a file or its parent directory to overwrite
              that file if its parent directory is sticky

There are some other references to i_uid in ufs_vnops.c relating to the QUOTA and SUIDDIR code. It is my belief, although I'd be glad to take comments, that the QUOTA code should remain as is, as it's not for access control but rather accounting. Similarly, the SUIDDIR code should remain as is, as it has to do with whether or not the ownership on a newly created file should be set to reflect the parent directory's ownership instead of the calling credential. The effect of this change would be to move any rights granted via ownership of a file, but not covered by the VREAD, VWRITE, and VEXEC categories, into the new VADMIN category. As a result, changes to the file system's VOP_ACCESS() code could then grant or deny these requests based on other factors in the credential, including mandatory policies. I selected the name VADMIN based on a similar right in the Andrew File System (AFS), "admin", which permits users or groups with admin rights for a directory to manipulate its access control list. You could imagine adding a new right such as this to the ACL implementation, although I have no plans to do so at this point.
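Here is a sketch of how the proposed VADMIN request could replace a direct owner comparison. It is user-space C built on stated assumptions: struct toy_inode, struct toy_cred, toy_access(), toy_chmod() and the MAC hook are hypothetical, the VADMIN value is invented for illustration, and the root override is simplified. The point is that once ownership maps to VADMIN inside the access routine, a mandatory policy plugged into that one routine can also refuse administrative operations such as chmod.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Request bits. VREAD/VWRITE/VEXEC line up with the owner mode bits;
 * the VADMIN value is an assumption made for this sketch.
 */
#define VEXEC   0000100
#define VWRITE  0000200
#define VREAD   0000400
#define VADMIN  0010000

/* Hypothetical stand-ins for an inode and a credential. */
struct toy_inode { uid_t uid; gid_t gid; mode_t mode; };
struct toy_cred  { uid_t uid; gid_t gid; };

/* Hypothetical mandatory-policy hook; NULL means no MAC policy. */
typedef bool (*mac_hook_t)(const struct toy_cred *, const struct toy_inode *, int);

/* Example MAC policy: refuse any write or administrative request. */
static bool
mac_deny_modify(const struct toy_cred *cred, const struct toy_inode *ip, int req)
{
	(void)cred;
	(void)ip;
	return (req & (VWRITE | VADMIN)) == 0;
}

static int
toy_access(const struct toy_inode *ip, const struct toy_cred *cred,
    int req, mac_hook_t mac)
{
	mode_t granted;

	/* Mandatory policy first: it may veto even the owner or root. */
	if (mac != NULL && !mac(cred, ip, req))
		return (EACCES);

	/* Simplified privilege override (real kernels treat VEXEC specially). */
	if (cred->uid == 0)
		return (0);

	if (cred->uid == ip->uid)
		granted = VADMIN | (ip->mode & 0700);   /* ownership => VADMIN */
	else if (cred->gid == ip->gid)
		granted = (ip->mode & 0070) << 3;       /* group bits as VREAD.. */
	else
		granted = (ip->mode & 0007) << 6;       /* other bits as VREAD.. */

	return (((mode_t)req & granted) == (mode_t)req ? 0 : EACCES);
}

/* chmod-style operation: the old "cred->uid != ip->uid" test becomes VADMIN. */
static int
toy_chmod(struct toy_inode *ip, const struct toy_cred *cred, mode_t mode,
    mac_hook_t mac)
{
	int error;

	if ((error = toy_access(ip, cred, VADMIN, mac)) != 0)
		return (error);
	ip->mode = (ip->mode & ~(mode_t)07777) | (mode & 07777);
	return (0);
}
```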
Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services From owner-freebsd-arch Tue Sep 26 20:37:33 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E96F237B424 for ; Tue, 26 Sep 2000 20:37:23 -0700 (PDT) Received: from sydney.worldwide.lemis.com (asbestos.linuxcare.com.au [203.17.0.30]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7EA9D6E2BE9 for ; Tue, 26 Sep 2000 20:37:12 -0700 (PDT) Received: (from grog@localhost) by sydney.worldwide.lemis.com (8.9.3/8.9.3) id OAA08246; Wed, 27 Sep 2000 14:33:18 +1100 (EST) (envelope-from grog) Date: Wed, 27 Sep 2000 14:33:18 +1100 From: Greg Lehey To: Alfred Perlstein Cc: Matt Dillon , Daniel Eischen , John Polstra , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores Message-ID: <20000927143318.H7583@sydney.worldwide.lemis.com> References: <200009252123.e8PLN5F84806@earth.backplane.com> <20000925143853.J9141@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <20000925143853.J9141@fw.wintelcom.net>; from bright@wintelcom.net on Mon, Sep 25, 2000 at 02:38:54PM -0700 Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.lemis.com/~grog X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Monday, 25 September 2000 at 14:38:54 -0700, Alfred Perlstein wrote: > * Matt Dillon [000925 14:23] wrote: >>> >>> Mutexes should protect data.
If you want to allow recursive ownership of >>> data, then keep your own owner and ref count field in the protected data >>> and use the mutex properly (release it after setting the owner or >>> incrementing the ref count). You don't need to hold the mutex, and >>> now you can use the same mutex for msleep/cv_wait. >>> >>> -- >>> Dan Eischen >> >> Mutexes protect data *CONSISTENCY*, not data. There is a big difference. >> Probably 95% of the kernel assumes data consistency throughout any given >> routine. If that routine must call other routines (and most do), then >> you have a major issue to contend with in regards to how to maintain >> consistency across the call. >> >> There are several ways to deal with it: >> >> * The subroutine calls are not allowed to block - lots of examples of >> this in the VM and other subsystems. >> >> * You use a heavy-weight lock instead of a mutex - an example >> of this would be the VFS subsystem (vnode locks). >> >> * You engineer the code to allow data to change out from under >> it at certain points (such as when something blocks) - probably >> the best example is vm_fault in the VM subsystem. >> >> Unfortunately, all but the first can lead to serious bugs. Consider >> how many bugs have been fixed in the VFS and VM subsystems just in the >> last year that have been related to data consistency issues and you'll >> understand. >> >> The first issue - not allowing a subroutine call to block, when such a >> case exists, is the perfect place to put a recursive mutex. If you don't >> use a recursive mutex at that point then you wind up having to >> reengineer and rewrite big pieces of the code, or you wind up writing >> lots of little tag routines to do end-runs around the mutexes or to >> pass a flag that indicates that the mutex is already held and should >> not be obtained again, and so forth. >> >> Remember, I'm not talking about subsystem A calling subsystem B here, >> I'm talking about subsystem A calling itself. 
That is, a situation >> where you are not obtaining several different mutexes but are instead >> obtaining the same mutex several times. >> >> Frankly, fewer bugs will be introduced into the code by avoiding the >> reengineering and using recursive mutexes at appropriate points. > > What's pissing me off here (not to pick on you Matt) is that there's > honestly a lot of code to be worked on where the locking issues are > pretty simple (especially when you look at how BSD/os implemented > it). Hmm. I was firmly in the "recursion is sloppiness" camp, but after reading this thread I'm no longer so convinced. I need to think about it. But showing examples where it makes sense doesn't mean it makes sense everywhere, and I would at least say "unnecessary recursion is sloppiness". I think you're looking at the unnecessary cases. > We should be coding and discussing existing problems with making the > kernel MPsafe instead of what we *might* come across along the road. I certainly think that at the moment we should be thinking about structure rather than details. > Whatever we bump into we can always beat to a pulp using lockmgr. :) Well, can anybody put up good arguments for keeping lockmgr in the long term? I'm not saying there aren't any, but I haven't analysed it enough yet. > And honestly, I don't like the idea of recursive mutexes, I'd rather > have a super function that locks a pgrp like > pg_signal_locked/_unlocked which expects the locks to be held rather > than a recursive lock. I think that eliminating recursion requires you to understand the system much better, which brings both advantages and disadvantages.
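Alfred's alternative to recursion, a _locked variant that expects the caller to already hold the lock plus a wrapper that takes it, might look like the following. struct toy_pgrp and the pg_*() functions are hypothetical; the ownership flag exists only so that the _locked variant can assert its precondition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process group: one lock plus the state it protects. */
struct toy_pgrp {
	pthread_mutex_t lock;
	pthread_t       owner;    /* valid only while held is true */
	bool            held;     /* debugging aid for ownership assertions */
	int             nsignals;
};

static void
pg_init(struct toy_pgrp *pg)
{
	pthread_mutex_init(&pg->lock, NULL);
	pg->held = false;
	pg->nsignals = 0;
}

static void
pg_lock(struct toy_pgrp *pg)
{
	pthread_mutex_lock(&pg->lock);
	pg->owner = pthread_self();
	pg->held = true;
}

static void
pg_unlock(struct toy_pgrp *pg)
{
	pg->held = false;
	pthread_mutex_unlock(&pg->lock);
}

/* Meaningful only when asked by the would-be owner, as in an assertion. */
static bool
pg_owned(const struct toy_pgrp *pg)
{
	return pg->held && pthread_equal(pg->owner, pthread_self());
}

/* Caller must already hold pg->lock; this variant never takes it. */
static void
pg_signal_locked(struct toy_pgrp *pg, int sig)
{
	assert(pg_owned(pg));
	(void)sig;              /* delivery to members elided in this sketch */
	pg->nsignals++;
}

/* Convenience wrapper for callers that do not hold the lock. */
static void
pg_signal(struct toy_pgrp *pg, int sig)
{
	pg_lock(pg);
	pg_signal_locked(pg, sig);
	pg_unlock(pg);
}
```

Code that already holds the lock calls pg_signal_locked() directly instead of re-entering pg_signal(), so the mutex never needs to recurse. The cost is the extra knowledge of lock state that Greg mentions.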
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers From owner-freebsd-arch Tue Sep 26 23: 4:51 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 714F337B423 for ; Tue, 26 Sep 2000 23:04:47 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8R64bU05419; Tue, 26 Sep 2000 23:04:37 -0700 (PDT) Date: Tue, 26 Sep 2000 23:04:37 -0700 From: Alfred Perlstein To: Greg Lehey Cc: Matt Dillon , Daniel Eischen , John Polstra , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores Message-ID: <20000926230436.J9141@fw.wintelcom.net> References: <200009252123.e8PLN5F84806@earth.backplane.com> <20000925143853.J9141@fw.wintelcom.net> <20000927143318.H7583@sydney.worldwide.lemis.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <20000927143318.H7583@sydney.worldwide.lemis.com>; from grog@lemis.com on Wed, Sep 27, 2000 at 02:33:18PM +1100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Greg Lehey [000926 20:34] wrote: > On Monday, 25 September 2000 at 14:38:54 -0700, Alfred Perlstein wrote: > > > We should be coding and discussing existing problems with making the > > kernel MPsafe instead of what we *might* come across along the road. > > I certainly think that at the moment we should be thinking about > structure rather than details. I think we've been doing that for two years already and it hasn't bought us squat. > > Whatever we bump into we can always beat to a pulp using lockmgr. :) > > Well, can anybody put up good arguments for keeping lockmgr in the > long term? I'm not saying there aren't any, but I haven't analysed it > enough yet.
lockmgr offers many styles of locking over a common lock interface; it allows one to upgrade and downgrade a lock's read/write status without losing it, and I'm pretty sure it also allows for recursion, although that may be broken ATM. > > And honestly, I don't like the idea of recursive mutexes, I'd rather > > have a super function that locks a pgrp like > > pg_signal_locked/_unlocked which expects the locks to be held rather > > than a recursive lock. > > I think that eliminating recursion requires you to understand the > system much better, which brings both advantages and disadvantages. Greg, after looking over this stuff for what seems like centuries I can honestly say that with the exception of VFS and VM the system is pretty straightforward (well, proc/pgrp isn't much fun, but it's not deadly). If anyone has any doubts about what type of locking they'll need, then they need to look at the BSD/os code, because they've already done it! What we need to have are discussions about the way people plan to push the locks in deeper. Whether it involves recursive mutexes, condition variables or green moon cheese, I really don't care, so long as it's backed up by a real application for the primitives that they want in our codebase. Right now I can't even do getpid() properly because we don't have read/write barriers. So far I like what I see in BSD/os; are we going to continue taking advantage of the reference implementation we've been given or wander off into nothingness? -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
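The upgrade/downgrade behaviour Alfred attributes to lockmgr can be illustrated with a small shared/exclusive lock. This is a hypothetical user-space sketch built from a mutex and a condition variable, not lockmgr's interface (lockmgr(9) is driven by request flags such as LK_SHARED, LK_EXCLUSIVE, LK_UPGRADE and LK_DOWNGRADE). Downgrading never leaves the lock unheld, and the upgrade here succeeds only for a sole reader.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared/exclusive lock; not the lockmgr(9) interface. */
struct selock {
	pthread_mutex_t mtx;      /* guards the two fields below */
	pthread_cond_t  cv;       /* waiters for a mode change */
	int             readers;  /* number of shared holders */
	bool            writer;   /* true while held exclusively */
};

static void
se_init(struct selock *l)
{
	pthread_mutex_init(&l->mtx, NULL);
	pthread_cond_init(&l->cv, NULL);
	l->readers = 0;
	l->writer = false;
}

static void
se_shared(struct selock *l)
{
	pthread_mutex_lock(&l->mtx);
	while (l->writer)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->readers++;
	pthread_mutex_unlock(&l->mtx);
}

static void
se_exclusive(struct selock *l)
{
	pthread_mutex_lock(&l->mtx);
	while (l->writer || l->readers > 0)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->writer = true;
	pthread_mutex_unlock(&l->mtx);
}

/* Release whichever mode the caller holds. */
static void
se_release(struct selock *l)
{
	pthread_mutex_lock(&l->mtx);
	if (l->writer) {
		l->writer = false;
	} else {
		assert(l->readers > 0);
		l->readers--;
	}
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mtx);
}

/* Exclusive -> shared without a window where the lock is unheld. */
static void
se_downgrade(struct selock *l)
{
	pthread_mutex_lock(&l->mtx);
	assert(l->writer && l->readers == 0);
	l->writer = false;
	l->readers = 1;
	pthread_cond_broadcast(&l->cv);   /* waiting readers may now enter */
	pthread_mutex_unlock(&l->mtx);
}

/*
 * Shared -> exclusive, but only if we are the sole reader. Waiting here
 * instead could deadlock two readers that both try to upgrade.
 */
static bool
se_try_upgrade(struct selock *l)
{
	bool ok;

	pthread_mutex_lock(&l->mtx);
	assert(!l->writer && l->readers > 0);
	ok = (l->readers == 1);
	if (ok) {
		l->readers = 0;
		l->writer = true;
	}
	pthread_mutex_unlock(&l->mtx);
	return ok;
}
```

This sketch does not stop new readers from entering while a writer waits, so writers can starve. A production lock has to choose a fairness policy.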
From owner-freebsd-arch Tue Sep 26 23:13:14 2000 Delivered-To: freebsd-arch@freebsd.org Received: from InterJet.elischer.org (c421509-a.pinol1.sfba.home.com [24.7.86.9]) by hub.freebsd.org (Postfix) with ESMTP id 703BF37B424; Tue, 26 Sep 2000 23:13:10 -0700 (PDT) Received: from InterJet.elischer.org (InterJet.elischer.org [192.168.1.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA13214; Tue, 26 Sep 2000 23:12:38 -0700 (PDT) Date: Tue, 26 Sep 2000 23:12:37 -0700 (PDT) From: Julian Elischer To: Robert Watson Cc: freebsd-fs@FreeBSD.org, freebsd-arch@FreeBSD.org, trustedbsd-discuss@TrustedBSD.org Subject: Re: VOP_ACCESS() and new VADMIN/VATTRIB? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I agree with all you have said here. On Tue, 26 Sep 2000, Robert Watson wrote: > > > In general, access control for operations within a file system is > determined via a recursive VOP_ACCESS() call on the vnode, viz. > > VOP_OPEN(vp, ...) -> ufs_open(vp, ...) -> VOP_ACCESS(vp, ...) -> > ufs_access(vp, ...) [...]
From owner-freebsd-arch Wed Sep 27 0:16:36 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id AC30437B422 for ; Wed, 27 Sep 2000 00:16:34 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id AAA10796; Wed, 27 Sep 2000 00:15:07 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp03.primenet.com, id smtpdAAAX7aq6u; Wed Sep 27 00:14:58 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id AAA20144; Wed, 27 Sep 2000 00:16:15 -0700 (MST) From: Terry Lambert Message-Id: <200009270716.AAA20144@usr05.primenet.com> Subject: Re: Mutexes and semaphores To: bright@wintelcom.net (Alfred Perlstein) Date: Wed, 27 Sep 2000 07:16:15 +0000 (GMT) Cc: grog@lemis.com (Greg Lehey), dillon@earth.backplane.com (Matt Dillon), eischen@vigrid.com (Daniel Eischen), jdp@polstra.com (John Polstra), arch@FreeBSD.ORG In-Reply-To: <20000926230436.J9141@fw.wintelcom.net> from "Alfred Perlstein" at Sep 26, 2000 11:04:37 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On behalf of Greg Lehey, whose server hates primenet, Alfred P is attributed to have written: > * Greg Lehey [000926 20:34] wrote: > > On Monday, 25 September 2000 at 14:38:54 -0700, Alfred Perlstein wrote: > > > > > We should be coding and discussing existing problems with making the > > > kernel MPsafe instead of what we *might* come across along the road. > > > > I certainly think that at the moment we should be thinking about > > structure rather than details. > > I think we've been doing that for two years already and it hasn't > bought us squat. Don't fabricate, Alfred.
The SMP code first existed as patches by Jack Vogel, then of Sun Microsystems, against the October 27 1995 source tree. The current SMP code is derived from patches (very minor ones) I did to bring Jack's work up to date in 1996, and a lot of work by a lot of other people, starting with Peter. So don't say it's been two years when it's really been five. Thanks, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Wed Sep 27 0:21:36 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mass.osd.bsdi.com (adsl-63-206-90-224.dsl.snfc21.pacbell.net [63.206.90.224]) by hub.freebsd.org (Postfix) with ESMTP id 9780437B424 for ; Wed, 27 Sep 2000 00:21:31 -0700 (PDT) Received: from mass.osd.bsdi.com (localhost [127.0.0.1]) by mass.osd.bsdi.com (8.11.0/8.9.3) with ESMTP id e8R7MkA03362; Wed, 27 Sep 2000 00:22:47 -0700 (PDT) (envelope-from msmith@mass.osd.bsdi.com) Message-Id: <200009270722.e8R7MkA03362@mass.osd.bsdi.com> X-Mailer: exmh version 2.1.1 10/15/1999 To: Terry Lambert Cc: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-reply-to: Your message of "Wed, 27 Sep 2000 07:16:15 -0000." <200009270716.AAA20144@usr05.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 27 Sep 2000 00:22:46 -0700 From: Mike Smith Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > > > We should be coding and discussing existing problems with making the > > > > kernel MPsafe instead of what we *might* come across along the road. > > > > > > I certainly think that at the moment we should be thinking about > > > structure rather than details. > > > > I think we've been doing that for two years already and it hasn't > > bought us squat. > > Don't fabricate, Alfred.
The SMP code first existed as patches > by Jack Vogel, then of Sun Microsystems, against the October 27 > 1995 source tree. The current SMP code is derived from patches > (very minor ones) I did to bring Jack's work up to date in 1996, > and a lot of work by a lot of other people, starting with Peter. > > So don't say it's been two years when it's really been five. Actually, it's been about a year, plus about four and a half of hot air. I see and hear a lot of talk. Who's doing the real work? Do we see Peter, John or Tor, for example, in this windage competition? 8) Come on folks. Stick to the topic. -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E From owner-freebsd-arch Wed Sep 27 0:23:59 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 6CE9F37B424; Wed, 27 Sep 2000 00:23:51 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id AAA02627; Wed, 27 Sep 2000 00:21:16 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp04.primenet.com, id smtpdAAA2Xa4bf; Wed Sep 27 00:21:10 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id AAA20257; Wed, 27 Sep 2000 00:23:38 -0700 (MST) From: Terry Lambert Message-Id: <200009270723.AAA20257@usr05.primenet.com> Subject: Re: VOP_ACCESS() and new VADMIN/VATTRIB?
To: julian@elischer.org (Julian Elischer) Date: Wed, 27 Sep 2000 07:23:38 +0000 (GMT) Cc: rwatson@FreeBSD.ORG (Robert Watson), freebsd-fs@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG, trustedbsd-discuss@TrustedBSD.org In-Reply-To: from "Julian Elischer" at Sep 26, 2000 11:12:37 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Julian Elischer wrote: > I agree with all you have said here. > > On Tue, 26 Sep 2000, Robert Watson wrote: > > In general, access control for operations within a file system is > > determined via a recursive VOP_ACCESS() call on the vnode, viz. > > > > VOP_OPEN(vp, ...) -> ufs_open(vp, ...) -> VOP_ACCESS(vp, ...) -> > > ufs_access(vp, ...) > [...] Perhaps a better question would be "assuming you generalize the references cited using the proposed VADMIN, how many references not using VOP_ACCESS() will remain?". I think the generalization and centralization which took place are really bad things, since I think administrative policy is something that I may very well want to set on _both_ a system basis _and_ on a per-FS basis. I also think that read-only-ness of an FS is a mount option having nothing to do with the underlying FS itself. It seems to me that some of the centralization should, in fact, be backed out, since it seems that it would preclude layer recursion in some useful stacking arrangements, much in the same way a non-NULL VOP did when the "default" layer was introduced (with no mechanism to provide default semantics for newly defined VOPs, without a kernel recompile). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
From owner-freebsd-arch Wed Sep 27 0:43: 9 2000 Delivered-To: freebsd-arch@freebsd.org Received: from InterJet.elischer.org (c421509-a.pinol1.sfba.home.com [24.7.86.9]) by hub.freebsd.org (Postfix) with ESMTP id E8BE437B43F; Wed, 27 Sep 2000 00:43:07 -0700 (PDT) Received: from InterJet.elischer.org (InterJet.elischer.org [192.168.1.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id AAA13542; Wed, 27 Sep 2000 00:43:06 -0700 (PDT) Date: Wed, 27 Sep 2000 00:43:05 -0700 (PDT) From: Julian Elischer To: Mike Smith Cc: Terry Lambert , arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-Reply-To: <200009270722.e8R7MkA03362@mass.osd.bsdi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > Actually, it's been about a year, plus about four and a half of hot air. > I see and hear a lot of talk. Who's doing the real work? Do we see > Peter, John or Tor, for example, in this windage competition? 8) > > Come on folks. Stick to the topic. > The point that was brought up a little while ago is more germane to the discussion: Is there any documentation regarding the interaction between "lock manager" and the mutexes? There now appear to be several different sets of locking code in the kernel and I'm getting thoroughly confused in my efforts to try to 'catch up' with what's going on in the SMP world..
From owner-freebsd-arch Wed Sep 27 0:46:48 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mass.osd.bsdi.com (adsl-63-206-90-224.dsl.snfc21.pacbell.net [63.206.90.224]) by hub.freebsd.org (Postfix) with ESMTP id EF9E937B424 for ; Wed, 27 Sep 2000 00:46:45 -0700 (PDT) Received: from mass.osd.bsdi.com (localhost [127.0.0.1]) by mass.osd.bsdi.com (8.11.0/8.9.3) with ESMTP id e8R7luA03450; Wed, 27 Sep 2000 00:47:56 -0700 (PDT) (envelope-from msmith@mass.osd.bsdi.com) Message-Id: <200009270747.e8R7luA03450@mass.osd.bsdi.com> X-Mailer: exmh version 2.1.1 10/15/1999 To: Julian Elischer Cc: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores In-reply-to: Your message of "Wed, 27 Sep 2000 00:43:05 PDT." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 27 Sep 2000 00:47:56 -0700 From: Mike Smith Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > The point that was brought up a little while ago is more germane to the > discussion: > > Is there any documentation regarding the interaction between "lock > manager" and the mutexes? There now appear to be several different > sets of locking code in the kernel and I'm getting thoroughly confused in > my efforts to try to 'catch up' with what's going on in the SMP > world.. You're correct, it is germane. The simple answer is that right now, there is essentially no documentation. There needs to be focussed discussion on the topic you raise, and many others. Note that the lock manager is responsible for brokering relatively long-term locks on filesystem objects, whilst mutexes are used for protecting critical paths or data structures. They're largely (but not entirely) orthogonal. It would be to your advantage to look at how the BSD/OS code has been altered, to get a feel for one possible way of doing it. -- ...
every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E From owner-freebsd-arch Wed Sep 27 5:54: 1 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id 0A77437B42C; Wed, 27 Sep 2000 05:53:39 -0700 (PDT) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id IAA88998; Wed, 27 Sep 2000 08:53:06 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Wed, 27 Sep 2000 08:53:06 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Terry Lambert Cc: Julian Elischer , freebsd-fs@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG, trustedbsd-discuss@TrustedBSD.org Subject: Re: VOP_ACCESS() and new VADMIN/VATTRIB? In-Reply-To: <200009270723.AAA20257@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 27 Sep 2000, Terry Lambert wrote: > Perhaps a better question would be "assuming you generalize > the references cited using the proposed VADMIN, how many > references not using VOP_ACCESS() will remain?". My goal was to identify the application of ownership rights on files and directories (i.e., rights not granted by the discretionary permission mask or ACL). As it turns out, this class of checks maps extremely well onto the current use of ip->i_uid in the src/sys/ufs/ufs tree, resulting in very few remaining references.
As I referred to, the remaining references generally fall into two categories: first, the quota code, which uses the file uid to determine how to account for use (index into dqget()) and to determine when it should or should not report quota limit problems (uprintf() to warn of quota conditions); and second, the SUIDDIR code, which uses it to determine whether the current credential cr_uid matches the owner of the parent directory of a newly created file when SUIDDIR is enabled. In general, these are not access control decisions, but rather strict use of the cr_uid as an identifier, meaning that abstraction of VADMIN as a category successfully removes all remaining uid-based authorization code in UFS. > I think the generalization and centralization which took > place are really bad things, since I think administrative > policy is something that I may very well want to set on > _both_ a system basis _and_ on a per-FS basis. I think there are reasonable arguments both for and against the generalization in vaccess(). One important advantage of the generalization is that it reduces the number of instances of permission-based authorization checks, allowing easier auditing and modification of the policy. For example, when I introduced support for POSIX.1e capabilities in my source tree, I needed only to replace one instance of suser() rather than dozens scattered through the source tree. It also makes it easier to audit the use of privilege for correctness and logging purposes if it can be centrally identified. There is probably a decent argument that vaccess(), while a good idea, does not have an API lending itself to future expansion and flexibility: it directly accepts file uid, gid, and mode fields, and does not have a policy-related argument that could be used by the caller to specify how centralized checking should apply in the context of the current file system. > I also think that read-only-ness of an FS is a mount > option having nothing to do with the underlying FS itself.
However, I think it is also arguable that the read-only-ness of a file system is not a security property, but in some cases a media property. That is to say, some file systems should be read-only by virtue of the underlying storage medium or file system type. Often, file systems are mounted read-only for security reasons, which is "different". vaccess() abstracts only the generalized security decision, not determination of per-file system or per-mount options. I think it would be reasonable to argue that we should attempt to distinguish security and non-security mount options, and provide the file system an opportunity to pass the security mount options to generalized security checking code, and that the current single read-only flag does not distinguish the security and file system properties that might be desirable. That said, I think there's also an argument that you would only process the read-only property centrally if you were willing to allow super-user privilege to override that protection. I.e., vaccess() performs discretionary and mandatory access checks, with privilege allowing the overriding of those protections. If the protections should not be overridden by appropriate privilege, they should not be processed as security protections in vaccess(), which would further distinguish read-only mounting and a read-only security status. > It seems to me that some of the centralization should, in > fact, be backed out, since it seems that it would preclude > layer recursion in some useful stacking arrangements, much > in the same way a non-NULL VOP did when the "default" layer > was introduced (with no mechanism to provide default > semantics for newly defined VOPs, without a kernel recompile). I'm not sure I follow this argument. Each file system's VOP_ACCESS() implementation invokes vaccess() based on arguments it provides, and only if it chooses. For example, only file systems making use of a per-file uid/gid/mode currently invoke vaccess().
Coda does not invoke it, and in my ACLs tree UFS does not invoke it either, calling vaccess_acl() in kern_acl.c instead. vaccess() is not a default VOP, but rather a helper function for VOP_ACCESS() implementations with common security properties. VOP_ACCESS() -> ufs_access() -> vaccess() Given this description, do you believe there would be limits imposed on stacked file system support? Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services From owner-freebsd-arch Wed Sep 27 8:29:39 2000 Delivered-To: freebsd-arch@freebsd.org Received: from sandman.sandgate.com (sandman.sandgate.com [38.161.139.2]) by hub.freebsd.org (Postfix) with ESMTP id ACE8637B423 for ; Wed, 27 Sep 2000 08:29:24 -0700 (PDT) Received: from vectra (a157.COMCAT.COM [207.86.230.157]) by sandman.sandgate.com (8.10.0/8.10.0) with SMTP id e8RFTRx30267 for ; Wed, 27 Sep 2000 11:29:29 -0400 (EDT) From: "Sue Wainer" To: Subject: Kernel configuration with new device drivers Date: Wed, 27 Sep 2000 11:29:19 -0400 Message-ID: MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0012_01C02876.2D0750C0" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG What is the proper way to specify new driver source modules for a kernel configuration? E.g., the config manual page mentions /sys/i386/conf/files.ERNIE. How does this file get picked up when running "config ERNIE"?
Should files.i386 be modified to include it?
------=_NextPart_000_0012_01C02876.2D0750C0-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 9:27:11 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id B137537B424 for ; Wed, 27 Sep 2000 09:27:07 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8RGR0H19790; Wed, 27 Sep 2000 09:27:00 -0700 (PDT) Date: Wed, 27 Sep 2000 09:27:00 -0700 From: Alfred Perlstein To: Sue Wainer Cc: freebsd-arch@FreeBSD.ORG Subject: Re: Kernel configuration with new device drivers Message-ID: <20000927092700.X9141@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from wainer@sandgate.com on Wed, Sep 27, 2000 at 11:29:19AM -0400 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Sue Wainer [000927 08:29] wrote: > What is the proper way to specify new driver source modules for a kernel > configuration? > E.g., the config manual page mentions /sys/i386/conf/files.ERNIE. How does > this file > get picked up when running "config ERNIE"? Should files.i386 be modified to > include it? If it's architecture-neutral, you want to use: /usr/src/sys/conf/files If it's i386-specific, you want to use: /usr/src/sys/conf/files.i386 The reason for the /sys/i386/conf/files.ERNIE file is so you can have a local modification without it being wiped out by cvsup. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 11:54:38 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id 5A7BA37B422 for ; Wed, 27 Sep 2000 11:53:56 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id LAA27778; Wed, 27 Sep 2000 11:51:54 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id LAA07258; Wed, 27 Sep 2000 11:51:54 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Wed, 27 Sep 2000 11:51:54 -0700 (PDT) Message-Id: <200009271851.LAA07258@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Cc: tlambert@primenet.com Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files In-Reply-To: <200009260249.TAA08391@usr05.primenet.com> References: <200009260249.TAA08391@usr05.primenet.com> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article <200009260249.TAA08391@usr05.primenet.com>, Terry Lambert wrote: > > > > That's simply a less efficient implementation of a recursive mutex. > > Why not use the real thing? > > No we're back to my original question: > where in the code is there a perceived need for mutex recursion, You'll have to ask somebody else. I never said I knew of any. A couple of folks said recursive mutexes are evil, and I said I disagreed, and explained why. > or is this just a case of the mutex code being bloated for no good > reason? No need to bloat it. It could be a different data type for all I care. > > That's a pretty minor difference, and it doesn't have anything to > > do with whether the mutexes are recursive or not. > > No, it has to do with how long they are held. 
If they are never > permitted to be held across recursive function calls -- or better, > across _ANY_ function calls -- then you can spin on the mutex, > instead of going to sleep. With a mutex or a semaphore, you can spin or you can go to sleep or you can spin for a while and then go to sleep. That's just an implementation detail. Your desire that a mutex not be held across any function calls seems arbitrary and would lead to unstructured code. It's perfectly legitimate for frobbing the data to involve one or more function calls. It isn't always easy to frob. :-) > So a mutex operation becomes: > > 1) Acquire mutex > 2) Frob data protected by mutex > 3) Release mutex > > If someone else needs the same data, they do the same thing. > > If you want to wait until a condition is true, then use a > condition variable, a semaphore, or something else you can wait > on in order to be signalled. I have no argument with any of that. > The idea that you should ever go to sleep waiting for a mutex is > antithetical to the very idea. Are you assuming that mutexes spin but semaphores sleep? You haven't actually said so explicitly, but I'm starting to think that's your assumption. > There are likewise few real situations in which you need to be > able to hold two mutexes; these are degenerate cases, which are > badly coded. Well, I wouldn't put it that strongly. Sometimes you have to maintain two independent data structures in a consistent manner, and that involves locking both of them at once. > Consider that I may have a vnode freelist protected by a mutex, and > a vnode protected by a mutex. The perceived need to hold both of > these simultaneously to put something on the freelist is an artifact > of wrong-thinking: the pointers used to place a vnode on a freelist > are the property of the freelist mutex, not the vnode mutex. Agreed. > Even if you can make a case for this not being true (e.g.
moving > a vnode from one list to another, using the same pointers in > the vnode to track state on both lists, which is really just an > acquire/remove/release/aquire/insert/release operation, where you > have a window between the removal and the reinsertion), it can be > handled by strictly controlling the order of operation on mutex > acquisition, and inverting the release order, and backing off in > case of conflict. Yes, that's the standard way of avoiding deadlock. Though as far as I can see, the release order doesn't actually matter, since releasing never blocks anybody. > > If the mutex is recursively held, there is a problem in that some > > other code grabbed the mutex and expected it to protect the data > > structure from being changed underfoot. > > Worst case, set an "IN_USE" flag on the data in a flags field to > bar reentry on a given data item. Best case, fix the broken code. > The vnode locking code does this today (I'd argue that it's broken > code). Well, we are dealing with a lot of legacy code that was never designed with threads in mind. I personally believe that the recursive mutex is a reasonable primitive to deal with it, particularly during the transition phase. > > Using a semaphore to do the waiting doesn't solve that problem, or > > even address it. > > It does. Semaphores can be held across a sleep (a wait). You must be assuming that mutexes always spin and semaphores don't. I don't agree with that assumption, but at least it would explain why we can't seem to communicate effectively on this topic. John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." 
-- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 12: 9:12 2000 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id 98EBA37B424 for ; Wed, 27 Sep 2000 12:09:09 -0700 (PDT) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.9.3/8.9.3) with ESMTP id MAA27884; Wed, 27 Sep 2000 12:09:01 -0700 (PDT) (envelope-from jdp@polstra.com) From: John Polstra Received: (from jdp@localhost) by vashon.polstra.com (8.9.3/8.9.1) id MAA07294; Wed, 27 Sep 2000 12:09:01 -0700 (PDT) (envelope-from jdp@polstra.com) Date: Wed, 27 Sep 2000 12:09:01 -0700 (PDT) Message-Id: <200009271909.MAA07294@vashon.polstra.com> To: arch@freebsd.org Reply-To: arch@freebsd.org Cc: eischen@vigrid.com Subject: Re: Mutexes and semaphores In-Reply-To: References: Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article , Daniel Eischen wrote: > If you absolutley need recursive mutexes, then roll your own and > keep the base mutex simple. This is trivial to do and makes the > base mutex more efficient without the need to check for recursive > ownership. I think it would make sense to make recursive mutexes a separate type, so they don't complicate the non-recursive ones. But the "roll your own" idea would work against eventually getting rid of recursive mutexes entirely. If they are implemented ad hoc in various places, it will be hard to find them all later. Better to have a standard implementation that's easy to search for. > Mutexes should be held for very short amounts of time, and it should > be apparent in the encompassing code where the mutex is taken and > where it is released. In your example, what do you do in the case > of abnormal exits from recursively called code? 
It makes it far > more easier to handle this situation if you roll your own mutex > and keep track of the ref count and owner yourself. If you don't, > you'll end up adding mtx_exit_and_clear_refcount(). Yes, that's a good point. > My main concern is not to eliminate recursive mutexes, though I > still think they should go. I would like to see all barriers to > eliminating the flags/options to mtx_enter() and mtx_exit() removed. > The current form of the mutex routines is not an API/ABI we should > be using. I'm not too thrilled with the API myself. John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 12:31:23 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id F051D37B423 for ; Wed, 27 Sep 2000 12:31:07 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8RJV7j25841 for arch@FreeBSD.ORG; Wed, 27 Sep 2000 12:31:07 -0700 (PDT) Date: Wed, 27 Sep 2000 12:31:07 -0700 From: Alfred Perlstein To: arch@FreeBSD.ORG Subject: Re: Mutexes and semaphores Message-ID: <20000927123107.A9141@fw.wintelcom.net> References: <200009271909.MAA07294@vashon.polstra.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <200009271909.MAA07294@vashon.polstra.com>; from jdp@polstra.com on Wed, Sep 27, 2000 at 12:09:01PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * John Polstra [000927 12:09] wrote: > In article > , Daniel > Eischen wrote: > > > If you absolutley need recursive mutexes, then roll your own and > > keep the base mutex simple. 
This is trivial to do and makes the > > base mutex more efficient without the need to check for recursive > > ownership. > > I think it would make sense to make recursive mutexes a separate > type, so they don't complicate the non-recursive ones. But the "roll > your own" idea would work against eventually getting rid of recursive > mutexes entirely. If they are implemented ad hoc in various places, > it will be hard to find them all later. Better to have a standard > implementation that's easy to search for. As I said earlier, when you find some code that really needs one in order to make a subsystem you're working on mpsafe we'll have a short discussion to make sure it's really needed and if it is, then we'll do it. Right now there's no point in this discussion. > > I'm not too thrilled with the API myself. I think you should begin to use it before hating it, I didn't like it at first, but it's certainly usable. -Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 14: 0:42 2000 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id A441237B43E for ; Wed, 27 Sep 2000 14:00:38 -0700 (PDT) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id RAA00068; Wed, 27 Sep 2000 17:00:21 -0400 (EDT) Date: Wed, 27 Sep 2000 17:00:20 -0400 (EDT) From: Daniel Eischen To: arch@freebsd.org Cc: arch@freebsd.org Subject: Re: Mutexes and semaphores In-Reply-To: <200009271909.MAA07294@vashon.polstra.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 27 Sep 2000, John Polstra wrote: > In article > , Daniel > Eischen wrote: > > > If you absolutley need recursive mutexes, then roll your own and > > keep the base mutex simple. 
This is trivial to do and makes the > > base mutex more efficient without the need to check for recursive > > ownership. > > I think it would make sense to make recursive mutexes a separate > type, so they don't complicate the non-recursive ones. But the "roll > your own" idea would work against eventually getting rid of recursive > mutexes entirely. If they are implemented ad hoc in various places, > it will be hard to find them all later. Better to have a standard > implementation that's easy to search for. I'll agree to this; I've suggested it before. But I'd like to go one step further and not make them part of our official API. State that they are subject to change/removal, perhaps complain loudly when compiled with -DKLD_API (-DKLD_MODULE ?) or something. -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 14:52:45 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 8C15137B422 for ; Wed, 27 Sep 2000 14:52:24 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id OAA03115; Wed, 27 Sep 2000 14:52:39 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdAAA3AaG3f; Wed Sep 27 14:52:26 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA27175; Wed, 27 Sep 2000 14:52:07 -0700 (MST) From: Terry Lambert Message-Id: <200009272152.OAA27175@usr02.primenet.com> Subject: Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files To: arch@freebsd.org Date: Wed, 27 Sep 2000 21:52:06 +0000 (GMT) Cc: tlambert@primenet.com In-Reply-To: <200009271851.LAA07258@vashon.polstra.com> from "John Polstra" at Sep 27, 2000 11:51:54 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit 
Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG John Polstra writes: > Terry Lambert wrote: > > The idea that you should ever go to sleep waiting for a mutex is > > antithetical to the very idea. > > Are you assuming that mutexes spin but semaphores sleep? You haven't > actually said so explicitly, but I'm starting to think that's your > assumption. [ ... ] > > It does. Semaphores can be held across a sleep (a wait). > > You must be assuming that mutexes always spin and semaphores don't. I > don't agree with that assumption, but at least it would explain why we > can't seem to communicate effectively on this topic. Yes, this is exactly my assumption. Here's why: You have to ask "why do I use a mutex?"; the answer is "to protect data". Then you have to ask "why do I need to protect data?"; the answer is more complicated, but boils down to "to prevent concurrent access in various and sundry situations which may arise". To my mind, there are only four cases which may result in an attempt at concurrent access: 1) SMP; concurrent access is attempted by another processor 2) Kernel reentrancy as a result of an exception (e.g. a page fault) 3) Kernel reentrancy as a result of an interrupt 4) Kernel preemption, as part of a Real Time subsystem In #1, it is not only acceptable to spin, it is preferred, since we know that (A) a mutex is only held for a very short period of time, and (B) we don't want to pay the penalty for going to sleep, since we are talking about stalling a single processor for a very short period of time, and we may have 8 processors, 7 of which would have to pay the penalty for a signal delivery, if a wakeup occurred. In #2 and #3, the degenerate cases are identical. One could conceive of a shared PCI interrupt being handled by different processors, given "bottom-end single threading" and "top-end multithreading".
But devices own their own resources; only if we were able to multithread our "bottom-end", the actual hardware event handler that does nothing other than handle the hardware event to the point that it is no longer needed (this implies reenabling the interrupt outside of interrupt context), would there be a contended resource issue to resolve. Further, when operating in interrupt or exception context, we are running in the context that was active at the time of the event. This bodes ill for recursive mutex acquisition, since it means that we could change data out from under the active context, despite it holding the mutex. Thus we cannot use a mutex to contend resources between interrupt, exception, and normal contexts, if by ownership, we mean one of the three. NT resolves this by turning interrupt and exception contexts into heavy-weight contexts, on a par with a process (different page mapping, etc.). If FreeBSD does not do the same, then there MUST be no resource contention between these domains. If it does the same, then a modification is required: you still spin, but you do an explicit yield to the mutex holder during each cycle through the spin; so we are left needing an owner that is unique between all contexts, not simply between kernel threads. This would only end up being useful on systems where the I/O bus was never contended between drivers in a non-transparent way (e.g. all data movement is based on bus mastered DMA, with hardware contention, and interrupt signalling by the host when a DMA should be started, and by the device, when host processing should be started). In #4, we have the need to sleep, since it is possible that you will be put to sleep involuntarily while holding a mutex. The easiest way to handle this is to merely delay the sleep until all mutexes held by the unique owner have been relinquished.
This implies a condition variable in the process structure, an overall mutex hold count, and another mutex to protect the hold count and the condition variable, associated with the context structure unique to the identity "owner". This will permit priority lending that lasts only for the duration of the held contended resource(s). Use of this complicates matters, but the benefit of RT support that would come with it is high, if RT is what floats your boat. A system utilizing this approach could be conditionally compiled as "RT" or "non-RT", using macro substitution, so the hit need not be taken in a "GENERIC" kernel. - So for the most part, unless we are implementing RT and a separate context for each potential concurrent interrupt to the host, and each concurrent exceptional condition (I see a need for a minimum of 2, for the F00F bug, and for 386 kernel page write fault processing), mutexes should spin. > > Even if you can make a case for this not being true (e.g. moving > > a vnode from one list to another, using the same pointers in > > the vnode to track state on both lists, which is really just an > > acquire/remove/release/acquire/insert/release operation, where you > > have a window between the removal and the reinsertion), it can be > > handled by strictly controlling the order of operation on mutex > > acquisition, and inverting the release order, and backing off in > > case of conflict. > > Yes, that's the standard way of avoiding deadlock. Though as far as > I can see, the release order doesn't actually matter, since releasing > never blocks anybody. The inversion, "acquire A/acquire B/process/release A/release B", ensures that the acquisition can occur concurrently in a forward path, in the face of interrupt/exception/kernel preemption. It prevents a starvation deadlock.
In the RT priority lending case (to support kernel preemption), it's possible to have a deadly embrace deadlock without this, as well, based on a low-priority task "hogging" a contended resource using a two-mutex strategy, and therefore raising its own priority (this isn't a security issue, since user space mutex code is not an externalization of kernel space mutex code). By permitting the lending context to acquire A and go back into the spin/yield loop for resource B, you preclude this. For constructs like:

  acquire A/diddle A/release A
  acquire B/diddle B/release B
  acquire A/diddle A/release A

You would have to recode them as:

  acquire A/diddle A
  acquire B/diddle B
  diddle A/release A
  release B

As previously pointed out, this is most likely with list manipulation involving variables shared between lists that are protected by different mutexes. > > > If the mutex is recursively held, there is a problem in that some > > > other code grabbed the mutex and expected it to protect the data > > > structure from being changed underfoot. > > > > Worst case, set an "IN_USE" flag on the data in a flags field to > > bar reentry on a given data item. Best case, fix the broken code. > > The vnode locking code does this today (I'd argue that it's broken > > code). > > Well, we are dealing with a lot of legacy code that was never designed > with threads in mind. I personally believe that the recursive mutex > is a reasonable primitive to deal with it, particularly during the > transition phase. Well, we have this flag _now_ in the vnode code, so it's not like anything is being saved by not using it. I really don't see much code that has this problem. There's the scheduler, the VM system, and some process structure stuff having to do with fork/exec/_exit, but that seems to be it, after the vnode cruft. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
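The fixed-order discipline discussed in this exchange, applied to moving an item between two lists protected by different mutexes, might look like the following user-space sketch. It assumes POSIX threads; `struct list`, `move_node`, and ordering the two locks by address are illustrative choices, not FreeBSD code. Both locks are held across the whole move, so there is no window in which the item is on neither list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    struct node *next;
    int id;
};

struct list {
    pthread_mutex_t lock;   /* protects head and every node's next link */
    struct node *head;
};

/* Take both locks in one global order (lower address first) so two
   threads moving items in opposite directions cannot deadlock. */
static void lock_pair(struct list *a, struct list *b)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(struct list *a, struct list *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Move the node with the given id from src to the head of dst.
   Returns 1 if it was moved, 0 if it was not on src. */
static int move_node(struct list *src, struct list *dst, int id)
{
    struct node **pp, *n = NULL;

    assert(src != dst);
    lock_pair(src, dst);
    for (pp = &src->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            n = *pp;
            *pp = n->next;          /* unlink from src */
            n->next = dst->head;    /* push onto dst */
            dst->head = n;
            break;
        }
    }
    unlock_pair(src, dst);
    return n != NULL;
}
```

Any fixed total order works; address order is simply a convenient one that needs no extra bookkeeping.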
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 15: 0:50 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 285AF37B42C for ; Wed, 27 Sep 2000 15:00:46 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id PAA07262; Wed, 27 Sep 2000 15:01:03 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdAAARCaaZn; Wed Sep 27 15:00:42 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id PAA27464; Wed, 27 Sep 2000 15:00:22 -0700 (MST) From: Terry Lambert Message-Id: <200009272200.PAA27464@usr02.primenet.com> Subject: Re: Mutexes and semaphores To: eischen@vigrid.com (Daniel Eischen) Date: Wed, 27 Sep 2000 22:00:22 +0000 (GMT) Cc: arch@FreeBSD.ORG In-Reply-To: from "Daniel Eischen" at Sep 27, 2000 05:00:20 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Dan Eischen wrote: > On Wed, 27 Sep 2000, John Polstra wrote: > > > If you absolutley need recursive mutexes, then roll your own and > > > keep the base mutex simple. This is trivial to do and makes the > > > base mutex more efficient without the need to check for recursive > > > ownership. > > > > I think it would make sense to make recursive mutexes a separate > > type, so they don't complicate the non-recursive ones. But the "roll > > your own" idea would work against eventually getting rid of recursive > > mutexes entirely. If they are implemented ad hoc in various places, > > it will be hard to find them all later. Better to have a standard > > implementation that's easy to search for. > > I'll agree to this; I've suggested it before. 
But I'd like to go > one step further and not make them part of our official API. State > that they are subject to change/removal, perhaps complain loudly > when compiled with -DKLD_API (-DKLD_MODULE ?) or something. I'll third this approach. I personally don't see where they would be useful, but de-grunging the API argument list and putting them in a separate API (mutex_legacy()?) would be vastly preferable. I like the idea of being able to grep for "mutex_legacy" to be able to distinguish between code that has been hacked for legacy reasons vs. code that has been intentionally made SMP-safe. I'd like to see an option in the config file for "LEGACY_MUTEX" support, and that it be left out until someone actually uses a recursive mutex. Both of these steps would ensure that the code is not dropped on the floor, making it known in no uncertain terms that mutexes of this type are strongly discouraged. This fits well with Alfred P's arguments on planning push-down of locks, instead of merely hacking it. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
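A separate, greppable recursive type along the lines proposed above might look like this user-space sketch. `mutex_legacy` is the name floated in the message; the struct layout and the `mutex_legacy_*` functions are hypothetical and use POSIX threads rather than the kernel's mtx_enter()/mtx_exit(). Because the type records its owner, it can also answer "do I hold this?", which is handy for assertions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct mutex_legacy {
    pthread_mutex_t mtx;     /* protects owner and count */
    pthread_cond_t  cv;      /* signalled when count drops to 0 */
    pthread_t       owner;   /* meaningful only while count > 0 */
    unsigned        count;   /* recursion depth; 0 means unowned */
};

static void mutex_legacy_init(struct mutex_legacy *l)
{
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->cv, NULL);
    l->count = 0;
}

static void mutex_legacy_enter(struct mutex_legacy *l)
{
    pthread_t self = pthread_self();

    pthread_mutex_lock(&l->mtx);
    if (l->count > 0 && pthread_equal(l->owner, self)) {
        l->count++;                      /* recursive acquisition */
    } else {
        while (l->count > 0)             /* held by another thread */
            pthread_cond_wait(&l->cv, &l->mtx);
        l->owner = self;
        l->count = 1;
    }
    pthread_mutex_unlock(&l->mtx);
}

/* Ownership test, e.g. for "I must already hold this" assertions. */
static int mutex_legacy_owned(struct mutex_legacy *l)
{
    int owned;

    pthread_mutex_lock(&l->mtx);
    owned = l->count > 0 && pthread_equal(l->owner, pthread_self());
    pthread_mutex_unlock(&l->mtx);
    return owned;
}

static void mutex_legacy_exit(struct mutex_legacy *l)
{
    pthread_mutex_lock(&l->mtx);
    assert(l->count > 0 && pthread_equal(l->owner, pthread_self()));
    if (--l->count == 0)
        pthread_cond_signal(&l->cv);     /* wake one waiter */
    pthread_mutex_unlock(&l->mtx);
}
```

Keeping the recursion count inside this one type, rather than in the base mutex, leaves the plain mutex free of the ownership-depth check that Dan Eischen wanted to avoid.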
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 19:26:33 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 7C04B37B423 for ; Wed, 27 Sep 2000 19:26:27 -0700 (PDT) Received: from modemcable136.203-201-24.mtl.mc.videotron.ca ([24.201.203.136]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G1K00020S3Z9K@falla.videotron.net> for freebsd-arch@freebsd.org; Wed, 27 Sep 2000 22:26:23 -0400 (EDT) Date: Wed, 27 Sep 2000 22:30:11 -0400 (EDT) From: Bosko Milekic Subject: spinlocks and acquire pseudo-priority To: freebsd-arch@freebsd.org Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I cannot quantify how likely the following is... but logically, it should be more probable when there are more CPUs (at LEAST 3). Say a thread on processor 1 (A) grabs mutex Y, which happens to be a spin-only type mutex. Say a thread on processor 2 (B) attempts to grab mutex Y, but fails and starts spinning in mtx_enter_hard(). Now say a thread on processor 3 (C) attempts to grab mutex Y and makes it to mtx_enter() -- at this very instant before C is about to try it the "easy way" and do its cmpxchgl, A releases mutex Y. Now B is still spinning; in fact, B is in mtx_enter_hard in the while() loop; it had just checked whether the lock was still owned, and it was, so it's just iterating again and incrementing the loop index variable. Before B goes to the top of the loop and hits the comparison statement again (to see whether Y is still owned), C does its cmpxchgl and grabs the lock easily, without any issues whatsoever. B continues to spin and eventually the loop index reaches the "tolerated" values and there's a panic().
Please also note that even if B hits the top of the while loop and decides that the mutex is no longer owned, so it hits the top of the infinite loop and tries to grab it again, just before it grabs it, it could already be had by C. This isn't TOO much of a problem, because the probability is low, but grows with the number of processors. The problem I see is that the index i is never reset to zero and may eventually hit the tolerated values and trigger a panic. Is there something I'm leaving out/forgetting? Thanks, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Sep 27 23: 5:52 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id B744E37B42C for ; Wed, 27 Sep 2000 23:05:40 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8S65dN15596; Wed, 27 Sep 2000 23:05:39 -0700 (PDT) Date: Wed, 27 Sep 2000 23:05:39 -0700 From: Alfred Perlstein To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: spinlocks and acquire pseudo-priority Message-ID: <20000927230538.I7553@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from bmilekic@technokratis.com on Wed, Sep 27, 2000 at 10:30:11PM -0400 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Bosko Milekic [000927 19:26] wrote: > > I cannot quantify how likely the following is... but logically, it > should be more probable when there are more CPUs (at LEAST 3). > > Say a thread on processor 1 (A) grabs mutex Y, which happens to be a > spin-only type mutex. > > Say thread on processor 2 (B) attempts to grab mutex Y, but fails and > starts spinnnig in mtx_enter_hard(). 
> > Now say thread on processor 3 (C) attempts to grab mutex Y and makes > it to mtx_enter() -- at this very instant before C is about to try it the > "easy way" and do its cmpxchgl, A releases mutex Y. Now B is still > spinning; in fact, B is in mtx_enter_hard in the while() loop, it had > just checked whether the lock was still owned, and it was, so it's just > iterating again and incrementing the loop index variable. Before B goes > to the top of the loop and hits the comparison statement again (to see > whether Y is still owned), C does its cmpxchgl and grabs the lock easily, > without any issues whatsoever. B continues to spin and eventually the > loop index reaches the "tolerated" values and there's a panic(). > > Please also note that even if B hits the top of the while loop and > decides that the mutex is no longer owned, so it hits the top of the > infinite loop and tries to grab it again, just before it grabs it, it > could already be had by C. This isn't TOO much of a problem, because the > probability is low, but grows with the number of processors. The problem > I see is that the index i is never reset to zero and may eventually hit > the tolerated values and trigger a panic. > > Is there something I'm leaving out/forgetting? It seems like a possibility; however, a spinlock being that contested is most likely a problem and needs to be fixed. It may be a good idea to examine the lock right before panicking to see if the lock state has changed. It may also be a good idea to alternate between a hard spin and a DELAY loop rather than backing off so much. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
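The idea of re-examining the lock before giving up can be folded into the spin loop itself: reset the counter whenever the observed owner changes, so that repeatedly losing the race (the three-CPU scenario above) counts as progress rather than as a stuck lock. This is a user-space sketch with C11 atomics; the owner-token encoding, the `SPIN_LIMIT` value, and the `spin_*` names are hypothetical, and returning -1 stands in for the kernel's panic().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bound on consecutive checks that see the SAME owner. */
#define SPIN_LIMIT 1000000UL

struct spin {
    _Atomic uintptr_t owner;    /* 0 = free, else an opaque owner token */
};

/* One "easy way" attempt: compare-and-swap 0 -> me. */
static int spin_try(struct spin *s, uintptr_t me)
{
    uintptr_t expected = 0;

    return atomic_compare_exchange_strong(&s->owner, &expected, me);
}

/*
 * Spin until the lock is acquired.  The counter is reset whenever the
 * observed owner changes, so losing a string of races to other CPUs is
 * treated as progress; only an owner seen holding the lock for
 * SPIN_LIMIT consecutive checks is reported.  Returns 0 on success,
 * -1 where a kernel would panic().
 */
static int spin_enter(struct spin *s, uintptr_t me)
{
    unsigned long i = 0;
    uintptr_t last = atomic_load(&s->owner);

    assert(me != 0);
    for (;;) {
        if (spin_try(s, me))
            return 0;
        uintptr_t cur = atomic_load(&s->owner);
        if (cur != last) {          /* lock changed hands since last look */
            last = cur;
            i = 0;
        } else if (++i >= SPIN_LIMIT) {
            return -1;              /* same owner the whole time */
        }
    }
}

static void spin_exit(struct spin *s, uintptr_t me)
{
    uintptr_t expected = me;
    int ok = atomic_compare_exchange_strong(&s->owner, &expected, 0);

    assert(ok);                     /* only the owner may release */
    (void)ok;
}
```

This narrows the false positive but does not remove it: if the same owner releases and reacquires between two of the loads, the change is invisible (an ABA case). A real implementation would also pause or DELAY() between probes, as suggested above; that is omitted here because it is platform-specific.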
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Sep 28 1:29:49 2000 Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 1C04937B422 for ; Thu, 28 Sep 2000 01:29:47 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id BAA16382; Thu, 28 Sep 2000 01:28:20 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp03.primenet.com, id smtpdAAAa7aO9F; Thu Sep 28 01:28:14 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id BAA11680; Thu, 28 Sep 2000 01:29:39 -0700 (MST) From: Terry Lambert Message-Id: <200009280829.BAA11680@usr02.primenet.com> Subject: Re: spinlocks and acquire pseudo-priority To: bmilekic@technokratis.com (Bosko Milekic) Date: Thu, 28 Sep 2000 08:29:39 +0000 (GMT) Cc: freebsd-arch@FreeBSD.ORG In-Reply-To: from "Bosko Milekic" at Sep 27, 2000 10:30:11 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > B continues to spin and eventually the > loop index reaches the "tolerated" values and there's a panic(). > > Please also note that even if B hits the top of the while loop and > decides that the mutex is no longer owned, so it hits the top of the > infinite loop and tries to grab it again, just before it grabs it, it > could already be had by C. This isn't TOO much of a problem, because the > probability is low, but grows with the number of processors. The problem > I see is that the index i is never reset to zero and may eventually hit > the tolerated values and trigger a panic. > > Is there something I'm leaving out/forgetting? You are talking about non-deadlock starvation here. 
The simple answer is: use "for(;;)" instead of something with a loop index. The fact is, there is just as much probability of C losing a race with B for a contended resource formerly held by A under normal circumstances as there is for it losing because of the conditions which you describe. The answer is: it doesn't matter -- you only ever use a spinlock to do one of two things: 1) Eat the overhead of a heavyweight non-spinning lock 2) Contend a resource which will be available in a small amount of time anyway, so it doesn't matter whether you get it first or second. If you really cared about FIFO, FILO, or prioritization or some other policy-based ordering on lock acquisition, you would not use spinlocks; you would use turnstiles and "wake one", or you would use some other policy-cognizant mechanism for doing the granting. FWIW, this means you wouldn't use a mutex, either. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
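Terry's suggestion can be sketched as a test-and-test-and-set spinlock whose acquire loop is a bare for(;;). Because there is no index, there is no panic threshold, and, as he points out, there is also no promise about which waiter wins once the lock is released. This is a minimal C11 sketch, not FreeBSD's mutex code; the names tts_lock, tts_acquire(), tts_try() and tts_release() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock: held is true while some thread owns it. */
typedef struct {
	atomic_bool held;
} tts_lock;

static void
tts_acquire(tts_lock *l)
{
	for (;;) {
		/* Try to take the lock; acquire ordering on success. */
		if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
			return;
		/* Spin on plain loads until it looks free, then retry the exchange. */
		while (atomic_load_explicit(&l->held, memory_order_relaxed))
			;
	}
}

static bool
tts_try(tts_lock *l)
{
	return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void
tts_release(tts_lock *l)
{
	/* Releasing a lock that is not held indicates a caller bug. */
	assert(atomic_load_explicit(&l->held, memory_order_relaxed));
	atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The inner read-only loop keeps waiting CPUs from hammering the cache line with writes. Whichever waiter's exchange lands first after the release gets the lock, which is exactly the "first or second doesn't matter" property Terry describes.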
From owner-freebsd-arch Thu Sep 28 5:42:35 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 25E4037B422 for ; Thu, 28 Sep 2000 05:42:29 -0700 (PDT) Received: from modemcable136.203-201-24.mtl.mc.videotron.ca ([24.201.203.136]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G1L00H1GKMQPP@falla.videotron.net> for freebsd-arch@FreeBSD.ORG; Thu, 28 Sep 2000 08:42:26 -0400 (EDT) Date: Thu, 28 Sep 2000 08:46:15 -0400 (EDT) From: Bosko Milekic Subject: Re: spinlocks and acquire pseudo-priority In-reply-to: <20000927230538.I7553@fw.wintelcom.net> To: Alfred Perlstein Cc: freebsd-arch@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 27 Sep 2000, Alfred Perlstein wrote: > It seems like a possibility; however, a spinlock being that contested is > most likely a problem and needs to be fixed. Not necessarily. It may occur during severe resource starvation, where many threads just end up in msleep() (or similar) and many others call wakeup(). > It may be a good idea to examine the lock right before panicking to > see if the lock state has changed. Yeah, I agree, but it may still happen... although you make it less likely by doing that. > It may also be a good idea to alternate between a hard spin and a > DELAY loop rather than backing off so much. > > -- > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > "I have the heart of a child; I keep it in a jar on my desk."
Bosko Milekic bmilekic@technokratis.com From owner-freebsd-arch Thu Sep 28 11: 7:29 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 0D7C837B622 for ; Thu, 28 Sep 2000 11:06:38 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8SI6bI02119 for arch@freebsd.org; Thu, 28 Sep 2000 11:06:37 -0700 (PDT) Date: Thu, 28 Sep 2000 11:06:37 -0700 From: Alfred Perlstein To: arch@freebsd.org Subject: we need atomic_t Message-ID: <20000928110637.U7553@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Linux has a datatype called "atomic_t", very useful for refcounts and struct counters like tcpstat. My impression is that it's the largest type an arch can support atomic ops on without weird gyrations and/or extremely expensive operations. Example: atomic_t is 32 bits on i386, and I think 24 bits on sparc32. This would replace our atomic_op_type with just atomic_op and make code easier to read and get right. Linux also has an atomic_dec_and_test(), which returns whether or not the operation decremented the atomic_t down to 0; this is very useful for making sure _you_ were the one that made the refcount == 0 so that you can free it. I'm already seeing some pretty good examples of where this can be applied: 1) struct ucred->cr_ref 2) struct uidinfo->ui_ref 3) tcpstats 4) other stats :) 5) mbuf external ref counts I don't have the gcc-assembler-foo to do this optimally without directly copying from Linux, which isn't acceptable. Can anyone snap this up? I'd really appreciate it.
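For comparison, the Linux-style interface described above can be approximated with C11 <stdatomic.h>, which did not exist when this was written. This is a hypothetical sketch: the names atomic_cnt, cnt_init(), cnt_add(), cnt_inc(), cnt_read() and cnt_dec_and_test() are made up, and the choice of long as the counter width is an assumption standing in for "the largest type the architecture handles cheaply". The key property is that cnt_dec_and_test() reports whether this particular call took the count from 1 to 0, so exactly one caller sees true and may free the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter type; long stands in for the "natural" atomic width. */
typedef struct {
	atomic_long v;
} atomic_cnt;

static void
cnt_init(atomic_cnt *c, long n)
{
	atomic_init(&c->v, n);
}

/* Statistics counters (tcpstat-like) need atomicity but no ordering. */
static void
cnt_add(atomic_cnt *c, long n)
{
	atomic_fetch_add_explicit(&c->v, n, memory_order_relaxed);
}

static void
cnt_inc(atomic_cnt *c)
{
	cnt_add(c, 1);
}

static long
cnt_read(atomic_cnt *c)
{
	return atomic_load_explicit(&c->v, memory_order_relaxed);
}

/*
 * Decrement and return true iff the new value is 0. acq_rel ordering makes
 * the other holders' earlier writes visible to the caller that frees.
 */
static bool
cnt_dec_and_test(atomic_cnt *c)
{
	long old = atomic_fetch_sub_explicit(&c->v, 1, memory_order_acq_rel);

	assert(old > 0);	/* catches an unbalanced release */
	return old == 1;
}
```

atomic_fetch_sub_explicit() returns the value before the subtraction, which is why "old == 1" is the test for "I dropped the last reference".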
thanks, -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." From owner-freebsd-arch Thu Sep 28 11:15: 9 2000 Delivered-To: freebsd-arch@freebsd.org Received: from mass.osd.bsdi.com (adsl-63-202-176-106.dsl.snfc21.pacbell.net [63.202.176.106]) by hub.freebsd.org (Postfix) with ESMTP id 232D037B423 for ; Thu, 28 Sep 2000 11:14:57 -0700 (PDT) Received: from mass.osd.bsdi.com (localhost [127.0.0.1]) by mass.osd.bsdi.com (8.11.0/8.9.3) with ESMTP id e8SIGLA01632; Thu, 28 Sep 2000 11:16:21 -0700 (PDT) (envelope-from msmith@mass.osd.bsdi.com) Message-Id: <200009281816.e8SIGLA01632@mass.osd.bsdi.com> X-Mailer: exmh version 2.1.1 10/15/1999 To: Alfred Perlstein Cc: arch@freebsd.org Subject: Re: we need atomic_t In-reply-to: Your message of "Thu, 28 Sep 2000 11:06:37 PDT." <20000928110637.U7553@fw.wintelcom.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 28 Sep 2000 11:16:21 -0700 From: Mike Smith Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Linux has a datatype called "atomic_t", very useful for refcounts > and struct counters like tcpstat. My impression is that it's the > largest type an arch can support atomic ops on without weird > gyrations and/or extremely expensive operations. sig_atomic_t. > This would replace our atomic_op_type with just atomic_op and make > code easier to read and get right. I strongly disagree. The atomic__ interface is useful and necessary and should remain. I don't agree that this would make anything easier. In particular, the explicit use of the atomic_* operations makes the atomicity constraints very clear.
> I'm already seeing some pretty good examples of where this can be > applied: > > 1) struct ucred->cr_ref > 2) struct uidinfo->ui_ref > 3) tcpstats > 4) other stats :) > 5) mbuf external ref counts > > I don't have the gcc-assembler-foo to do this optimally without > directly copying from Linux, which isn't acceptable. In most cases, you're manipulating the reference count under a mutex (since there's no other way to avoid the race where someone else frees your structure while you're in the process of dereferencing it), so this is largely unnecessary. > Can anyone snap this up? I'd really appreciate it. Hold it right there, sunshine. 8) -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E From owner-freebsd-arch Thu Sep 28 11:39:41 2000 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id E506D37B422; Thu, 28 Sep 2000 11:39:09 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8SId9D04813; Thu, 28 Sep 2000 11:39:09 -0700 (PDT) Date: Thu, 28 Sep 2000 11:39:09 -0700 From: Alfred Perlstein To: Mike Smith Cc: arch@freebsd.org Subject: Re: we need atomic_t Message-ID: <20000928113907.V7553@fw.wintelcom.net> References: <20000928110637.U7553@fw.wintelcom.net> <200009281816.e8SIGLA01632@mass.osd.bsdi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <200009281816.e8SIGLA01632@mass.osd.bsdi.com>; from msmith@freebsd.org on Thu, Sep 28, 2000 at 11:16:21AM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk
X-Loop: FreeBSD.ORG * Mike Smith [000928 11:14] wrote: > > Linux has a datatype called "atomic_t", very useful for refcounts > > and struct counters like tcpstat. My impression is that it's the > > largest type an arch can support atomic ops on without weird > > gyrations and/or extremely expensive operations. > > sig_atomic_t. I'll look at that. > > > This would replace our atomic_op_type with just atomic_op and make > > code easier to read and get right. > > I strongly disagree. The atomic__ interface is useful and > necessary and should remain. I don't agree that this would make anything > easier. In particular, the explicit use of the atomic_* operations makes > the atomicity constraints very clear. I really hate it; it slows down my programming. Most of these counters need to be the largest the platform will allow, just to avoid overflows. You remember the struct file refcount problem we had with Apache, right? What's the point of having counters and refcounts that easily overflow? > > I'm already seeing some pretty good examples of where this can be > > applied: > > > > 1) struct ucred->cr_ref > > 2) struct uidinfo->ui_ref > > 3) tcpstats > > 4) other stats :) > > 5) mbuf external ref counts > > > > I don't have the gcc-assembler-foo to do this optimally without > > directly copying from Linux, which isn't acceptable. > > In most cases, you're manipulating the reference count under a mutex > (since there's no other way to avoid the race where someone else frees > your structure while you're in the process of dereferencing it), so this > is largely unnecessary. It's not possible for that race to happen; this is why struct ucred, mbuf and uidinfo lend themselves to mpsafeness pretty easily with atomic refcounts. You do need to own the parent structure lock (struct proc/socket/etc) so that two codepaths cannot deref the same pointer at the same time, but you need to do that anyway.
Basically, you need to own a lock on whatever allows access to the pointer to the ucred/uidinfo/mbuf. The atomic ops are particularly useful when you have multiple different structures that point to a single object (ucred and mbuf particularly). When you instantiate a ucred, it has a refcount of 1 and you can be guaranteed that no one else is referencing it. Right before it's shallow copied (a pointer to it is in more than one place), the count is bumped to 2, which prevents free() from other codepaths. You're expected to "own" the parent structure, be it through a refcount of 1 or a lock on the parent. I'd much rather have:

void
crfree(cr)
	struct ucred *cr;
{
	if (atomic_dec_test(&cr->cr_ref, 1) == 0) {
		/*
		 * Some callers of crget(), such as nfs_statfs(),
		 * allocate a temporary credential, but don't
		 * allocate a uidinfo structure.
		 */
		if (cr->cr_uidinfo != NULL)
			uifree(cr->cr_uidinfo);
		FREE((caddr_t)cr, M_CRED);
	}
}

than:

void
crfree(cr)
	struct ucred *cr;
{
	mtx_enter(&cr->cr_mtx, MTX_DEF);
	if (--cr->cr_ref == 0) {
		mtx_exit(&cr->cr_mtx, MTX_DEF);
		/*
		 * Some callers of crget(), such as nfs_statfs(),
		 * allocate a temporary credential, but don't
		 * allocate a uidinfo structure.
		 */
		if (cr->cr_uidinfo != NULL)
			uifree(cr->cr_uidinfo);
		FREE((caddr_t)cr, M_CRED);
	} else {
		mtx_exit(&cr->cr_mtx, MTX_DEF);
	}
}

Note that there's an interesting cascade assertion going on:

	if (atomic_dec_test(&cr->cr_ref, 1) == 0) {
		/*
		 * then I have exclusive ownership of this ucred
		 * so I can free the uidinfo it references without a lock
		 * because my parent is locked.
		 */

Is there a problem with this scheme? > > Can anyone snap this up? I'd really appreciate it. > > Hold it right there, sunshine. 8) pfft! :) -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
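The lock-free crfree() proposed above can be sketched outside the kernel with C11 atomics and malloc()/free(). Everything here is a hypothetical stand-in: struct cred and struct uinfo play the roles of struct ucred and struct uidinfo, and release_ref() plays the role of the proposed atomic_dec_test(). The sketch shows the cascade being described: the caller whose decrement reaches zero is the only one that can still reach the credential, so it drops the uidinfo reference and frees the credential without taking a per-object mutex. Like the original argument, it assumes callers already hold a reference (or the parent's lock) whenever they take a new one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct uidinfo and struct ucred. */
struct uinfo {
	atomic_int ui_ref;
};

struct cred {
	atomic_int	 cr_ref;
	struct uinfo	*cr_uidinfo;	/* may be NULL for temporary creds */
};

/* Drop one reference; true iff this call released the last one. */
static bool
release_ref(atomic_int *ref)
{
	int old = atomic_fetch_sub_explicit(ref, 1, memory_order_acq_rel);

	assert(old > 0);
	return old == 1;
}

static void
uinfo_free(struct uinfo *ui)
{
	if (release_ref(&ui->ui_ref))
		free(ui);
}

/* A new credential starts with one reference, held by its creator. */
static struct cred *
cred_get(struct uinfo *ui)
{
	struct cred *cr = malloc(sizeof(*cr));

	if (cr == NULL)
		return NULL;
	atomic_init(&cr->cr_ref, 1);
	cr->cr_uidinfo = ui;
	if (ui != NULL)
		atomic_fetch_add_explicit(&ui->ui_ref, 1, memory_order_relaxed);
	return cr;
}

/*
 * Take an extra reference before storing the pointer somewhere else.
 * The caller must already hold a reference (or the parent's lock).
 */
static struct cred *
cred_hold(struct cred *cr)
{
	atomic_fetch_add_explicit(&cr->cr_ref, 1, memory_order_relaxed);
	return cr;
}

static void
cred_free(struct cred *cr)
{
	if (release_ref(&cr->cr_ref)) {
		/* Last reference: no other path can reach cr, so no lock. */
		if (cr->cr_uidinfo != NULL)
			uinfo_free(cr->cr_uidinfo);
		free(cr);
	}
}
```

The mutex in the second crfree() above disappears because the decrement and the "was I last?" test happen in one atomic step. The parent-structure lock is still what keeps two paths from dereferencing the same pointer while it is being replaced.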
From owner-freebsd-arch Fri Sep 29 19:25:42 2000 Delivered-To: freebsd-arch@freebsd.org Received: from puck.firepipe.net (mcut-b-167.resnet.purdue.edu [128.211.209.167]) by hub.freebsd.org (Postfix) with ESMTP id CB9BD37B503 for ; Fri, 29 Sep 2000 19:25:40 -0700 (PDT) Received: by puck.firepipe.net (Postfix, from userid 1000) id 842591908; Fri, 29 Sep 2000 21:26:36 -0500 (EST) Date: Fri, 29 Sep 2000 21:26:36 -0500 From: Will Andrews To: Hubert Feyrer , op-tech@openpackages.org, arch@FreeBSD.org Subject: :C/// regex for make(1) Message-ID: <20000929212636.Q75085@puck.firepipe.net> Reply-To: Will Andrews Mail-Followup-To: Will Andrews , Hubert Feyrer , op-tech@openpackages.org, arch@FreeBSD.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-Operating-System: FreeBSD 4.1-STABLE i386 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hi Hubert & others, I've reviewed your PR#21605 and have only one objection: the #ifndef NO_REGEX parts. I don't see the point in having this (and others have concurred). Is there something I missed about when someone may not want regex support in make(1)? As soon as I can get -current working on my laptop again I'll test your changes and do a closer review of the code. Then commit, pending any changes. Thanks for your submission again, -- Will Andrews - Physics Computer Network wench The Universal Answer to All Problems - "It has something to do with physics."
-- Comic on door of Room 240, Physics Building, Purdue University From owner-freebsd-arch Sat Sep 30 6:13:18 2000 Delivered-To: freebsd-arch@freebsd.org Received: from rfhs8012.fh-regensburg.de (rfhs8012.fh-regensburg.de [194.95.108.29]) by hub.freebsd.org (Postfix) with ESMTP id 81F6637B502 for ; Sat, 30 Sep 2000 06:13:16 -0700 (PDT) Received: from rfhpc8320.fh-regensburg.de (feyrer@rfhpc8320 [194.95.108.32]) by rfhs8012.fh-regensburg.de (8.10.1/8.10.1) with ESMTP id e8UDCP013873; Sat, 30 Sep 2000 15:12:26 +0200 (MET DST) Received: (from feyrer@localhost) by rfhpc8320.fh-regensburg.de (8.9.1/8.8.3) id PAA05519; Sat, 30 Sep 2000 15:15:15 +0200 (MET DST) Date: Sat, 30 Sep 2000 15:15:15 +0200 (MET DST) From: Hubert Feyrer X-Sender: feyrer@rfhpc8320.fh-regensburg.de To: Will Andrews Cc: op-tech@openpackages.org, arch@FreeBSD.org Subject: Re: :C/// regex for make(1) In-Reply-To: <20000929212636.Q75085@puck.firepipe.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, 29 Sep 2000, Will Andrews wrote: > I've reviewed your PR#21605 and have only one objection: the #ifndef > NO_REGEX parts. I don't see the point in having this (and others have > concurred). Is there something I missed about when someone may not want > regex support in make(1)? I've left the #ifdef in as it's in the NetBSD code. I can only guess that the reason is to make it easier to bootstrap make(1) on systems that don't have regexp routines. Seeing that FreeBSD has dropped all the bootstrapping code, it's probably best to just put in the code unconditionally. > As soon as I can get -current working on my laptop again I'll test your > changes and do a closer review of the code. Then commit, pending any > changes. OK - maybe let me know when you're done.
FYI, I've tested this with 4.0-STABLE. - Hubert -- Hubert Feyrer From owner-freebsd-arch Sat Sep 30 6:58:54 2000 Delivered-To: freebsd-arch@freebsd.org Received: from puck.firepipe.net (mcut-b-167.resnet.purdue.edu [128.211.209.167]) by hub.freebsd.org (Postfix) with ESMTP id 7472F37B503 for ; Sat, 30 Sep 2000 06:58:52 -0700 (PDT) Received: by puck.firepipe.net (Postfix, from userid 1000) id 1904E1908; Sat, 30 Sep 2000 08:59:54 -0500 (EST) Date: Sat, 30 Sep 2000 08:59:54 -0500 From: Will Andrews To: Hubert Feyrer Cc: Will Andrews , op-tech@openpackages.org, arch@FreeBSD.org Subject: Re: :C/// regex for make(1) Message-ID: <20000930085954.W75085@puck.firepipe.net> Reply-To: Will Andrews Mail-Followup-To: Will Andrews , Hubert Feyrer , op-tech@openpackages.org, arch@FreeBSD.org References: <20000929212636.Q75085@puck.firepipe.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from hubert.feyrer@informatik.fh-regensburg.de on Sat, Sep 30, 2000 at 03:15:15PM +0200 X-Operating-System: FreeBSD 4.1-STABLE i386 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, Sep 30, 2000 at 03:15:15PM +0200, Hubert Feyrer wrote: > I've left the #ifdef in as it's in the NetBSD code. I can only guess that > the reason is to make it easier to bootstrap make(1) on systems that don't > have regexp routines. Seeing that FreeBSD has dropped all the > bootstrapping code, it's probably best to just put in the code > unconditionally. Yeah. I'm still waiting for objections. > OK - maybe let me know when you're done. > FYI, I've tested this with 4.0-STABLE. Yes, make(1) has for the most part stayed in sync between the -stable and -current branches at any given time.
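For readers unfamiliar with the modifier being discussed: :C/pattern/replacement/ in make(1) works like :S, but the pattern is an extended regular expression handled by the regex(3) routines, which are exactly what a system without "regexp routines" would lack (Hubert's guess at the reason for the NO_REGEX guard). The helper below, regex_subst_first(), is hypothetical and is not make's implementation. It only shows the underlying regcomp()/regexec() calls on a single word, replacing the first match with a literal string and ignoring the & and \1 escapes that :C supports.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: replace the first match of the extended regex `pat`
 * in `word` with the literal string `repl`. Returns a malloc'd string, or
 * NULL if the pattern does not compile or memory runs out.
 */
static char *
regex_subst_first(const char *word, const char *pat, const char *repl)
{
	regex_t re;
	regmatch_t m;
	size_t wlen = strlen(word);

	if (regcomp(&re, pat, REG_EXTENDED) != 0)
		return NULL;
	int rc = regexec(&re, word, 1, &m, 0);
	regfree(&re);

	if (rc != 0) {
		/* No match: return an unchanged copy of the word. */
		char *copy = malloc(wlen + 1);

		if (copy != NULL)
			memcpy(copy, word, wlen + 1);
		return copy;
	}

	size_t pre = (size_t)m.rm_so;		/* bytes before the match */
	size_t post = wlen - (size_t)m.rm_eo;	/* bytes after the match */
	size_t rlen = strlen(repl);
	char *out = malloc(pre + rlen + post + 1);

	if (out == NULL)
		return NULL;
	memcpy(out, word, pre);
	memcpy(out + pre, repl, rlen);
	memcpy(out + pre + rlen, word + m.rm_eo, post + 1);	/* includes NUL */
	return out;
}
```

make applies its modifiers word by word to a variable's value; a caller would run something like this on each word, which is why the sketch takes a single word rather than a whole variable.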
Thanks again, -- Will Andrews - Physics Computer Network wench The Universal Answer to All Problems - "It has something to do with physics." -- Comic on door of Room 240, Physics Building, Purdue University