From owner-freebsd-arch@FreeBSD.ORG Mon Aug 15 06:15:38 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DF10016A41F; Mon, 15 Aug 2005 06:15:38 +0000 (GMT) (envelope-from m.ehinger@ltur.de) Received: from posty.gateway-inter.net (posty.gateway-inter.net [213.144.19.86]) by mx1.FreeBSD.org (Postfix) with ESMTP id 09D6E43D6E; Mon, 15 Aug 2005 06:15:35 +0000 (GMT) (envelope-from m.ehinger@ltur.de) In-Reply-To: <20050812162719.GA27362@garage.freebsd.pl> To: Pawel Jakub Dawidek Message-ID: From: m.ehinger@ltur.de Date: Mon, 15 Aug 2005 08:15:00 +0200 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Cc: freebsd-arch@freebsd.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2005 06:15:39 -0000

thanks,

meanwhile I found out that sysctlbyname(3) calls it only once.

maik

Pawel Jakub Dawidek, 12.08.2005 18:27
To: m.ehinger@ltur.de
Cc: freebsd-arch@freebsd.org
Subject: Re: sysctl_proc calls handler twice

On Thu, Aug 11, 2005 at 06:12:14PM +0200, m.ehinger@ltur.de wrote:
+> Hi,
+>
+> can someone explain why a proc sysctl (added via SYSCTL_PROC or SYSCTL_ADD_PROC) calls the handler twice if I read the sysctl only
+> once? Is this the normal behaviour?

Yes, AFAIR the first call is done to find out how much memory should be
allocated and the second one is the request for the real data.

+> How can I prevent this?

You can't, try to live with it.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
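[Archive note] The probe-then-read pattern described in the reply above
can be sketched roughly as follows. This is only an illustration under
stated assumptions: mock_sysctl_read(), sample_value, handler_calls,
read_unknown_size() and read_known_size() are hypothetical names made up
for this sketch, and the mock only imitates the oldp/oldlenp half of
sysctl(3) in userland so that the number of handler invocations can be
counted. It is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sysctl handler; counts its invocations. */
static int handler_calls = 0;
static const char sample_value[] = "x=12 y=-3 z=1024";

/*
 * Mock of the read half of sysctl(3) (an assumption, not the real call):
 *  - oldp == NULL: only report the required size in *oldlenp;
 *  - oldp != NULL: copy the data if the buffer is large enough,
 *    otherwise fail with ENOMEM.
 */
static int
mock_sysctl_read(void *oldp, size_t *oldlenp)
{
	size_t need = sizeof(sample_value);

	assert(oldlenp != NULL);
	handler_calls++;
	if (oldp == NULL) {
		*oldlenp = need;
		return (0);
	}
	if (*oldlenp < need) {
		errno = ENOMEM;
		return (-1);
	}
	memcpy(oldp, sample_value, need);
	*oldlenp = need;
	return (0);
}

/* Generic caller that does not know the size: probe, allocate, fetch. */
static char *
read_unknown_size(void)
{
	size_t len = 0;
	char *buf;

	if (mock_sysctl_read(NULL, &len) != 0)	/* call 1: size only */
		return (NULL);
	buf = malloc(len);
	if (buf == NULL)
		return (NULL);
	if (mock_sysctl_read(buf, &len) != 0) {	/* call 2: real data */
		free(buf);
		return (NULL);
	}
	return (buf);
}

/* Caller that already knows an upper bound on the size: one call only. */
static int
read_known_size(char *buf, size_t buflen)
{
	size_t len = buflen;

	return (mock_sysctl_read(buf, &len));
}
```

With this mock, read_unknown_size() reaches the handler twice, while
read_known_size() with a sufficiently large buffer reaches it once,
which is the behaviour discussed later in this thread.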
From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 12:12:53 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B4E0A16A41F; Tue, 16 Aug 2005 12:12:53 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6BAA343D46; Tue, 16 Aug 2005 12:12:53 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7GCCYeR066933; Tue, 16 Aug 2005 05:12:34 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7GCCVpr066932; Tue, 16 Aug 2005 05:12:31 -0700 (PDT) (envelope-from rizzo) Date: Tue, 16 Aug 2005 05:12:31 -0700 From: Luigi Rizzo To: John Baldwin Message-ID: <20050816051231.D66550@xorpc.icir.org> References: <42F9ECF2.8080809@freebsd.org> <200508101638.27087.jhb@FreeBSD.org> <42FA6E0E.4070205@samsco.org> <200508111121.46546.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508111121.46546.jhb@FreeBSD.org>; from jhb@freebsd.org on Thu, Aug 11, 2005 at 11:21:45AM -0400 Cc: frank@exit.com, Andre Oppermann , freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 12:12:53 -0000 reading this thread, and at times looking at some of the kernel code, with plenty of places where you have to drop a lock that you already have, do some small thing and then reacquire the lock itself, makes me wonder if we don't need a better mechanism/abtraction for this kind of programming. In a way, this seems similar to the handling of interrupts: if we want a thread to be interrupted we don't check for interrupts (and save and restore state) explicitly at every instruction, but rely on the processor doing the right thing for us. I am sorry i cannot formulate the analogy in a clearer way (if i could i would probably have a design to address this problem :( ) cheers luigi On Thu, Aug 11, 2005 at 11:21:45AM -0400, John Baldwin wrote: > On Wednesday 10 August 2005 05:13 pm, Scott Long wrote: > > John Baldwin wrote: > > > On Wednesday 10 August 2005 04:10 pm, Frank Mayhar wrote: > > >>On Wed, 2005-08-10 at 09:11 -0400, John Baldwin wrote: > > >>>I think this is the model that BSD/OS employed > > >>>for SMP in its 4.x series before they did their version of SMPng. > > >> > > >>I didn't grunge around in the scheduler (much), but as far as I'm aware > > >>BSD/OS 4.x used the Big Giant Lock mechanism just as FreeBSD did, and > > >>for the same reason. > > > > > > I believe that at some point during the 4.x series they added a scheduler > > > lock that covered just enough to allow threads that weren't asleep in the > > > kernel to be switched to without require the big giant lock and that it > > > was a pretty decent performance win over the earlier single BGL ala > > > FreeBSD 4.x. > > > > So when a syscall is made on an AP, does it get serviced on the same AP > > (assuming that the lock is available and no sleeping is needed), or does > > it get serviced my the BSP? Where kernel threads explicitely pinned to > > the BSP? 
Was the APIC explicitely programmed to deliver only to the > > BSP? > > I think the AP would block on the BGL in the stuff BSD/OS did, but Schimmel > points out that that can be non-optimal (SMP in 4.x was basically about the > worst possible idea according to Schimmel). A better implementation of > master/slave is for all syscalls, traps, and interrupts to run only on the > BSP and have the APs just run in userland. I.e. they could take over a > thread that had made it to userret (when you get to userret, you would mark > the thread as a user thread somwhow) and when a thread running on an AP > wanted to enter the kernel (syscall or trap), it would have to stick the > thread on the runqueue for the BSP and go look for another user thread. > > -- > John Baldwin <>< http://www.FreeBSD.org/~jhb/ > "Power Users Use the Power to Serve" = http://www.FreeBSD.org > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 12:59:31 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CEF1E16A41F; Tue, 16 Aug 2005 12:59:31 +0000 (GMT) (envelope-from scottl@samsco.org) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 34F6C43D55; Tue, 16 Aug 2005 12:59:28 +0000 (GMT) (envelope-from scottl@samsco.org) Received: from [172.20.101.145] ([12.167.156.2]) (authenticated bits=0) by pooker.samsco.org (8.13.3/8.13.3) with ESMTP id j7GDAL8Q045243; Tue, 16 Aug 2005 07:10:21 -0600 (MDT) (envelope-from scottl@samsco.org) Message-ID: <4301E303.9060101@samsco.org> Date: Tue, 16 Aug 2005 06:58:43 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; 
rv:1.7.8) Gecko/20050615 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Luigi Rizzo References: <42F9ECF2.8080809@freebsd.org> <200508101638.27087.jhb@FreeBSD.org> <42FA6E0E.4070205@samsco.org> <200508111121.46546.jhb@FreeBSD.org> <20050816051231.D66550@xorpc.icir.org> In-Reply-To: <20050816051231.D66550@xorpc.icir.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on pooker.samsco.org Cc: frank@exit.com, Andre Oppermann , freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 12:59:32 -0000 Luigi Rizzo wrote: > reading this thread, and at times looking at some of the kernel code, > with plenty of places where you have to drop a lock that you > already have, do some small thing and then reacquire the lock itself, > makes me wonder if we don't need a better mechanism/abtraction for > this kind of programming. > > In a way, this seems similar to the handling of interrupts: > if we want a thread to be interrupted we don't check for interrupts > (and save and restore state) explicitly at every instruction, but > rely on the processor doing the right thing for us. > > I am sorry i cannot formulate the analogy in a clearer way > (if i could i would probably have a design to address this problem :( ) > > cheers > luigi > You're saying that you would like a system where a thread that wants a lock can ask another thread that has the lock to temporarily give it up and go to sleep? 
Scott > On Thu, Aug 11, 2005 at 11:21:45AM -0400, John Baldwin wrote: > >>On Wednesday 10 August 2005 05:13 pm, Scott Long wrote: >> >>>John Baldwin wrote: >>> >>>>On Wednesday 10 August 2005 04:10 pm, Frank Mayhar wrote: >>>> >>>>>On Wed, 2005-08-10 at 09:11 -0400, John Baldwin wrote: >>>>> >>>>>>I think this is the model that BSD/OS employed >>>>>>for SMP in its 4.x series before they did their version of SMPng. >>>>> >>>>>I didn't grunge around in the scheduler (much), but as far as I'm aware >>>>>BSD/OS 4.x used the Big Giant Lock mechanism just as FreeBSD did, and >>>>>for the same reason. >>>> >>>>I believe that at some point during the 4.x series they added a scheduler >>>>lock that covered just enough to allow threads that weren't asleep in the >>>>kernel to be switched to without require the big giant lock and that it >>>>was a pretty decent performance win over the earlier single BGL ala >>>>FreeBSD 4.x. >>> >>>So when a syscall is made on an AP, does it get serviced on the same AP >>>(assuming that the lock is available and no sleeping is needed), or does >>>it get serviced my the BSP? Where kernel threads explicitely pinned to >>>the BSP? Was the APIC explicitely programmed to deliver only to the >>>BSP? >> >>I think the AP would block on the BGL in the stuff BSD/OS did, but Schimmel >>points out that that can be non-optimal (SMP in 4.x was basically about the >>worst possible idea according to Schimmel). A better implementation of >>master/slave is for all syscalls, traps, and interrupts to run only on the >>BSP and have the APs just run in userland. I.e. they could take over a >>thread that had made it to userret (when you get to userret, you would mark >>the thread as a user thread somwhow) and when a thread running on an AP >>wanted to enter the kernel (syscall or trap), it would have to stick the >>thread on the runqueue for the BSP and go look for another user thread. 
>> >>-- >>John Baldwin <>< http://www.FreeBSD.org/~jhb/ >>"Power Users Use the Power to Serve" = http://www.FreeBSD.org >>_______________________________________________ >>freebsd-arch@freebsd.org mailing list >>http://lists.freebsd.org/mailman/listinfo/freebsd-arch >>To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 13:17:30 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 80D5116A41F; Tue, 16 Aug 2005 13:17:30 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.115]) by mx1.FreeBSD.org (Postfix) with ESMTP id E9F6243D46; Tue, 16 Aug 2005 13:17:29 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87]) by mailout2.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id j7GDHS2i000856; Tue, 16 Aug 2005 23:17:28 +1000 Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy2.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id j7GDHMhS006711; Tue, 16 Aug 2005 23:17:27 +1000 Date: Tue, 16 Aug 2005 23:17:21 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: m.ehinger@ltur.de In-Reply-To: Message-ID: <20050816221033.C47830@delplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Pawel Jakub Dawidek , freebsd-arch@FreeBSD.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 13:17:30 -0000 On Mon, 15 Aug 2005 m.ehinger@ltur.de wrote: > meanwhile i found out that sysctlbyname(3) calls it only 
once.

This seems to be sort of backwards.  It is sysctlbyname(3) that makes
twice as many syscalls as sysctl(3); sysctl(3) doesn't call handlers
twice, but sysctl(8) calls sysctl(3) many times.  2 of these calls
typically reach the handler.  In more detail:

sysctlbyname(3) calls sysctl(3) twice:
    1 to look up the sysctl
    1 to do the work.

sysctl(3) calls __sysctl(2undoc) 0 or 1 times:
    1 if the sysctl is a normal (kernel) one
    0 for library (user) sysctls

sysctl(8) for reading calls sysctl(3) 6 times for the cases that I tested:
    4 to look up the sysctl (why more than for sysctlbyname(3)?)
    1 to estimate the amount of data to be returned
    1 to do the work (read the data)
Only the last 2 of these calls reach the handler.  Proc handlers are
only special here in that they are more specialized than the integer
handlers.  The data size is known in advance for integer handlers, but
sysctl(8) asks the kernel for the size in all cases.

sysctl(8) for writing calls sysctl 12 (!) times for the (integer) case
that I tested:
    6 to read the old value as above.  The read-and-return-the-old-value
      semantics of sysctl(3) is apparently not used by sysctl(8).
    2 to look up the sysctl again
    1 to do the work (write the new value)
    2 to look up the sysctl again
    1 to read the new value (this step is necessary since sometimes the
      value comes back in a modified form even for non-volatile data,
      due to bugs or features).

[Context almost lost to top posting]

> On Thu, Aug 11, 2005 at 06:12:14PM +0200, m.ehinger@ltur.de wrote:
> +> Hi,
> +>
> +> can someone explain why a proc sysctl (add via SYSCTL_PROC or SYSCTL_ADD_PROC) calls the handler twice if i read the sysctl only
> +> once? Is this the normal behaviour?
>
> Yes, AFAIR first call is done to find out how much memory should be
> allocated and second one is request for a real data.
>
> +> How can i prevent this?
>
> You can't, try to live with it.
No, just don't call it twice like sysctl(8) if you know the size in advance or from a previous call (and know that it won't change or handle the error from it changing...). Bruce From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 13:33:34 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E59616A420 for ; Tue, 16 Aug 2005 13:33:34 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id D712743D5A for ; Tue, 16 Aug 2005 13:33:31 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 9E8DE52C7E; Tue, 16 Aug 2005 15:33:28 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id B6C6852C3F; Tue, 16 Aug 2005 15:33:16 +0200 (CEST) Date: Tue, 16 Aug 2005 15:33:07 +0200 From: Pawel Jakub Dawidek To: Bruce Evans Message-ID: <20050816133307.GD3944@garage.freebsd.pl> References: <20050816221033.C47830@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="T6xhMxlHU34Bk0ad" Content-Disposition: inline In-Reply-To: <20050816221033.C47830@delplex.bde.org> X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng devel (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: m.ehinger@ltur.de, freebsd-arch@FreeBSD.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 13:33:34 -0000

--T6xhMxlHU34Bk0ad
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 16, 2005 at 11:17:21PM +1000, Bruce Evans wrote:
+> No, just don't call it twice like sysctl(8) if you know the size in advance
+> or from a previous call (and know that it won't change or handle the error
+> from it changing...).

Thread's author, as I understand/guess it, represents kernel side.
He doesn't want his handler to be called twice and tools like sysctl(8)
are not able to know size of every sysctl, so he just has to be ready
that his handler will be called twice.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

--T6xhMxlHU34Bk0ad
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (FreeBSD)

iD8DBQFDAesTForvXbEpPzQRAljFAJ4vMsSlgjczWLL9+AHtGZ3hxu8YHgCcDnog
7hJ5tcPSKmQV12DkE6TsU+M=
=8MMb
-----END PGP SIGNATURE-----

--T6xhMxlHU34Bk0ad--

From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 13:35:47 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 779A816A41F; Tue, 16 Aug 2005 13:35:47 +0000 (GMT) (envelope-from phk@phk.freebsd.dk) Received: from haven.freebsd.dk (haven.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id 15F8943D53; Tue, 16 Aug 2005 13:35:47 +0000 (GMT) (envelope-from phk@phk.freebsd.dk) Received: from phk.freebsd.dk (unknown [192.168.48.2]) by haven.freebsd.dk (Postfix) with ESMTP id 42029BC50; Tue, 16 Aug 2005 13:35:43 +0000 (UTC) To: Bruce Evans From: "Poul-Henning Kamp" In-Reply-To:
Your message of "Tue, 16 Aug 2005 23:17:21 +1000." <20050816221033.C47830@delplex.bde.org> Date: Tue, 16 Aug 2005 15:35:42 +0200 Message-ID: <11638.1124199342@phk.freebsd.dk> Sender: phk@phk.freebsd.dk Cc: m.ehinger@ltur.de, Pawel Jakub Dawidek , freebsd-arch@FreeBSD.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 13:35:47 -0000

In message <20050816221033.C47830@delplex.bde.org>, Bruce Evans writes:

> Only the last 2 of these calls reach the handler. Proc handlers are
> only special here in that they are more specialized than the integer
> handlers.

Strictly speaking it is the other way around: since integer handlers are
implemented in terms of proc handlers, it follows that proc handlers
cannot be more specialized than integer handlers.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
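[Archive note] Since the thread keeps coming back to what a handler sees
on each of the two calls, here is a rough userland sketch of a handler
that treats a size-only probe differently from a real read. It is built
on the `req->oldptr == NULL` check suggested later in this thread.
Everything else is an assumption made for illustration: struct mock_req
only loosely imitates the request structure a real handler receives, and
mock_device_read(), mock_handler() and device_reads are hypothetical
names. The point is merely that the slow or fragile device access can be
skipped when only the size is being asked for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock request (assumption): only the fields this sketch needs. */
struct mock_req {
	void	*oldptr;	/* caller's buffer, NULL for a size-only probe */
	size_t	 oldlen;	/* capacity of that buffer */
	size_t	 oldidx;	/* bytes produced, or needed, so far */
};

/* Counts how often the (pretend) hardware is actually touched. */
static int device_reads = 0;

/* Hypothetical device access; a real driver would talk to hardware here. */
static void
mock_device_read(int16_t sample[3])
{
	device_reads++;
	sample[0] = 12;
	sample[1] = -3;
	sample[2] = 1024;
}

/* Handler sketch: answer size probes without touching the device. */
static int
mock_handler(struct mock_req *req)
{
	int16_t sample[3];

	assert(req != NULL);
	if (req->oldptr == NULL) {
		/* Size-only request: report the size, skip the hardware. */
		req->oldidx = sizeof(sample);
		return (0);
	}
	if (req->oldlen < sizeof(sample))
		return (ENOMEM);
	/* Real read: only now is the device accessed. */
	mock_device_read(sample);
	memcpy(req->oldptr, sample, sizeof(sample));
	req->oldidx = sizeof(sample);
	return (0);
}
```

A probe followed by a read then touches the pretend device once, not
twice, even though the handler itself still runs for both calls.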
From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 14:20:03 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 654B316A41F; Tue, 16 Aug 2005 14:20:03 +0000 (GMT) (envelope-from m.ehinger@ltur.de) Received: from posty.gateway-inter.net (posty.gateway-inter.net [213.144.19.86]) by mx1.FreeBSD.org (Postfix) with ESMTP id E9FFD43D45; Tue, 16 Aug 2005 14:20:02 +0000 (GMT) (envelope-from m.ehinger@ltur.de) In-Reply-To: <11638.1124199342@phk.freebsd.dk> To: "Poul-Henning Kamp" Message-ID: From: m.ehinger@ltur.de Date: Tue, 16 Aug 2005 16:19:26 +0200 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Cc: Pawel Jakub Dawidek , freebsd-arch@FreeBSD.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 14:20:03 -0000

Thanks for all the answers,

that's more detail than I needed.

I got around the problem by checking an external variable.

Here comes the whole story if somebody cares. (But be warned, my English isn't the best.)

I wrote a driver for the accelerometers used in ThinkPads and just wanted to see the data returned.
So I created a sysctl proc which reads the data and returns it.
The problem was that the two calls to the proc were too close together; I forgot to check whether the accelerometer
had already finished its last run. So I tried to write to the accelerometer although it wasn't ready for that.
With sysctlbyname I got no errors because the program itself took enough time between the calls.
I now check whether the previous command has finished before I send the new one, so the double call isn't
a problem anymore.

Any questions?
thanks again for the answers

maik

"Poul-Henning Kamp", 16.08.2005 15:35
Sent by: phk@phk.freebsd.dk
To: Bruce Evans
Cc: m.ehinger@ltur.de, Pawel Jakub Dawidek , freebsd-arch@FreeBSD.org
Subject: Re: sysctl_proc calls handler twice

In message <20050816221033.C47830@delplex.bde.org>, Bruce Evans writes:

> Only the last 2 of these calls reach the handler. Proc handlers are
> only special here in that they are more specialized than the integer
> handlers.

Strictly speaking it is the other way around: since integer handlers are
implemented in terms of proc handlers, it follows that proc handlers
cannot be more specialized than integer handlers.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 14:29:24 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4533216A42A for ; Tue, 16 Aug 2005 14:29:24 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9BE1143D46 for ; Tue, 16 Aug 2005 14:29:23 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id E926F52C70; Tue, 16 Aug 2005 16:29:21 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 1F39352C6F; Tue, 16 Aug 2005 16:29:12 +0200 (CEST) Date: Tue, 16 Aug 2005 16:29:02 +0200 From: Pawel Jakub Dawidek To: m.ehinger@ltur.de Message-ID: <20050816142902.GE3944@garage.freebsd.pl> References: <11638.1124199342@phk.freebsd.dk> Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="3yNHWXBV/QO9xKNm" Content-Disposition: inline In-Reply-To: X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng devel (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: Poul-Henning Kamp , freebsd-arch@FreeBSD.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 14:29:25 -0000

--3yNHWXBV/QO9xKNm
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 16, 2005 at 04:19:26PM +0200, m.ehinger@ltur.de wrote:
+>
+> Thanks for all the answers,
+>
+> that's more detail than I needed.
+>
+> I got around the problem by checking an external variable.
+>
+> Here comes the whole story if somebody cares. (But be warned, my English isn't the best.)
+>
+> I wrote a driver for the accelerometers used in ThinkPads and just wanted to see the data returned.
+> So I created a sysctl proc which reads the data and returns it.
+> The problem was that the two calls to the proc were too close together; I forgot to check whether the accelerometer
+> had already finished its last run. So I tried to write to the accelerometer although it wasn't ready for that.
+> With sysctlbyname I got no errors because the program itself took enough time between the calls.
+> I now check whether the previous command has finished before I send the new one, so the double call isn't
+> a problem anymore.
+> Any questions?

You can recognize whether userland is asking for the size only by
checking whether req->oldptr is NULL. If it is, you should only return
the size; if it is != NULL, then it is a request for the real data.

BTW, do you have any docs for what you are doing? Or can you send me
what you have already? I was interested in this subject as well, but
didn't find anything useful.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

--3yNHWXBV/QO9xKNm
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (FreeBSD)

iD8DBQFDAfguForvXbEpPzQRAryPAKDlbatvQesnPIATjEjubSieLtv3nwCfXJaZ
QSU6G93hw020ZljtTNMeFHk=
=87Wc
-----END PGP SIGNATURE-----

--3yNHWXBV/QO9xKNm--

From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 20:00:15 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2330116A41F for ; Tue, 16 Aug 2005 20:00:15 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7191243D45 for ; Tue, 16 Aug 2005 20:00:13 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (c220-239-19-236.belrs4.nsw.optusnet.com.au [220.239.19.236]) by mail27.syd.optusnet.com.au (8.12.11/8.12.11) with ESMTP id j7GK0BNg023512 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 17 Aug 2005 06:00:12 +1000 Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1]) by cirb503493.alcatel.com.au (8.12.10/8.12.10) with ESMTP id j7GK0BSR026740; Wed, 17 Aug 2005 06:00:11 +1000 (EST) (envelope-from pjeremy@cirb503493.alcatel.com.au) Received: (from pjeremy@localhost) by cirb503493.alcatel.com.au (8.12.10/8.12.9/Submit) id
j7GK0AmM026739; Wed, 17 Aug 2005 06:00:10 +1000 (EST) (envelope-from pjeremy) Date: Wed, 17 Aug 2005 06:00:10 +1000 From: Peter Jeremy To: Luigi Rizzo Message-ID: <20050816200010.GJ13959@cirb503493.alcatel.com.au> References: <42F9ECF2.8080809@freebsd.org> <200508101638.27087.jhb@FreeBSD.org> <42FA6E0E.4070205@samsco.org> <200508111121.46546.jhb@FreeBSD.org> <20050816051231.D66550@xorpc.icir.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050816051231.D66550@xorpc.icir.org> User-Agent: Mutt/1.4.2i Cc: freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 20:00:15 -0000

On Tue, 2005-Aug-16 05:12:31 -0700, Luigi Rizzo wrote:
>reading this thread, and at times looking at some of the kernel code,
>with plenty of places where you have to drop a lock that you
>already have, do some small thing and then reacquire the lock itself,
>makes me wonder if we don't need a better mechanism/abtraction for
>this kind of programming.

There are two distinct cases where this is done:
1) The thread needs to call another function which can potentially sleep
   and therefore needs to drop any locks it is holding before calling
   the function and re-acquire them later.
2) The thread needs to execute for an excessive period whilst holding
   the lock and explicitly releases it occasionally to allow other
   threads to run (eg much of the vnode/buffer/page scanning code).

For the second case, we could create a function that checked if another
thread was waiting on the lock and, if so, released the lock and grabbed
it again after sleeping.
This is fairly easy for a single lock but would be quite messy to implement for multiple locks - since you need to release/acquire them in the correct order to prevent deadlocks. The first case again seems easy at first glance: Add a "I'm happy to release this lock" flag to each lock. Before calling a function that might block, you set the flag. When the function calls sleep(9), the code releases the locks and re-acquires them before returning. There are three large gotchas: 1) Since sleeping might have changed the state of some structures that the thread is working on, it needs to re-validate its internal state to ensure that it's not working with stale data. This means the thread needs to check if it slept or not after each point where it could potentially sleep - and I suspect this amounts to most of the non-boilerplate code in the existing re-acquire lock section. 2) You have to ensure that either the release/acquire is atomic for all locks or the ordering is correct. 3) You need some way to pass back a permanent failure to re-acquire a requested lock (maybe a piece of hardware went away). >In a way, this seems similar to the handling of interrupts: >if we want a thread to be interrupted we don't check for interrupts >(and save and restore state) explicitly at every instruction, but >rely on the processor doing the right thing for us. In the case of locks, restoring the previous state may not be correct. Whilst I hold a lock on object foo, I know its state will remain unchanged (unless I change it) and can therefore safely cache information about the object. If I release/reacquire the lock, the object may have changed in arbitrary ways and I need to ensure that any cached information is still correct (and correct it if it's not). Overall, you could probably make the code cleaner (and maybe more efficient) but you can't make it totally transparent. 
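The revalidation step can be sketched in userland with pthreads. This is only an illustration under stated assumptions: struct obj, its gen counter, and every function name below are hypothetical, and a pthread mutex stands in for a kernel mutex. The idea is that writers bump a generation counter, so a thread that dropped the lock can cheaply tell whether its cached view is stale once it holds the lock again.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical object protected by a mutex; gen is bumped on every change. */
struct obj {
    pthread_mutex_t lock;
    unsigned long   gen;    /* generation counter, advanced by writers */
    int             value;
};

/* Writer side: change the object and advance the generation. */
static void obj_set(struct obj *o, int v)
{
    pthread_mutex_lock(&o->lock);
    o->value = v;
    o->gen++;
    pthread_mutex_unlock(&o->lock);
}

/*
 * Drop the lock around may_sleep(), then re-acquire it and revalidate.
 * Called with o->lock held and returns with it held.
 * Returns true if the cached copy was stale and had to be refreshed.
 */
static bool obj_call_unlocked(struct obj *o, int *cached,
    void (*may_sleep)(struct obj *))
{
    unsigned long seen;
    bool stale;

    assert(cached != NULL);
    seen = o->gen;                  /* remember the state before unlocking */

    pthread_mutex_unlock(&o->lock);
    may_sleep(o);                   /* the object may change arbitrarily here */
    pthread_mutex_lock(&o->lock);

    stale = (o->gen != seen);
    if (stale)
        *cached = o->value;         /* correct the cached information */
    return stale;
}

/* Example callbacks: one leaves the object alone, one modifies it. */
static void quiet_call(struct obj *o) { (void)o; }
static void changing_call(struct obj *o) { obj_set(o, 42); }
```

The counter only says that something changed, not what; a real caller would still have to redo whatever checks depend on the object, which is the non-boilerplate part mentioned above.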
-- Peter Jeremy From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 21:19:45 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 81C2F16A41F for ; Tue, 16 Aug 2005 21:19:45 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id 204D843D45 for ; Tue, 16 Aug 2005 21:19:45 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Tue, 16 Aug 2005 17:34:41 -0400 From: John Baldwin To: freebsd-arch@freebsd.org Date: Tue, 16 Aug 2005 17:16:43 -0400 User-Agent: KMail/1.8 References: <42F9ECF2.8080809@freebsd.org> <20050816051231.D66550@xorpc.icir.org> <20050816200010.GJ13959@cirb503493.alcatel.com.au> In-Reply-To: <20050816200010.GJ13959@cirb503493.alcatel.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200508161716.44597.jhb@FreeBSD.org> Cc: Peter Jeremy Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 21:19:45 -0000 On Tuesday 16 August 2005 04:00 pm, Peter Jeremy wrote: > On Tue, 2005-Aug-16 05:12:31 -0700, Luigi Rizzo wrote: > >reading this thread, and at times looking at some of the kernel code, > >with plenty of places where you have to drop a lock that you > >already have, do some small thing and then reacquire the lock itself, > >makes me wonder if we don't need a better mechanism/abtraction for > >this kind of programming. 
> > There are two distinct cases where this is done: > 1) The thread needs to call another function which can potentially sleep > and therefore needs to drop any locks it is holding before calling > the function and re-acquire them later. > 2) The thread needs to execute for an excessive period whilst holding > the lock and explicitly releases it occasionally to allow other > threads to run (eg much of the vnode/buffer/page scanning code). > > For the second case, we could create a function that checked if > another thread was waiting on the lock and, if so, released the lock > and grabbed it again after sleeping. This is fairly easy for a single > lock but would be quite messy to implement for multiple locks - since > you need to release/acquire them in the correct order to prevent > deadlocks. > > The first case again seems easy at first glance: Add a "I'm happy > to release this lock" flag to each lock. Before calling a function > that might block, you set the flag. When the function calls sleep(9), > the code releases the locks and re-acquires them before returning. > There are three large gotchas: > 1) Since sleeping might have changed the state of some structures > that the thread is working on, it needs to re-validate its > internal state to ensure that it's not working with stale data. > This means the thread needs to check if it slept or not after > each point where it could potentially sleep - and I suspect > this amounts to most of the non-boilerplate code in the existing > re-acquire lock section. > 2) You have to ensure that either the release/acquire is atomic for > all locks or the ordering is correct. > 3) You need some way to pass back a permanent failure to re-acquire > a requested lock (maybe a piece of hardware went away). We had a thought about this at USENIX ATC two years ago in Boston. The idea would be that each thread had an array of say 4 mutex pointers. 
You would have two new mutex functions:

    mtx_unlock_if_sleep();
    mtx_lock_if_slept();

mtx_unlock_if_sleep() would add the mutex to the per-thread array. If the array was full, it would unlock the lock. If you ever slept, you would unlock the locks in the order that they are in the array. The mtx_lock_if_slept() function would lock the mutex if it wasn't in the array due to an overflow or if the thread had slept and would remove the mutex from the array if it was in there. Note that if you don't carefully balance the calls you could create trouble but some assertions could help catch that. Programmatically, the programmer would have to think of these just as if it was a normal mtx_unlock() and mtx_lock() pair. All it does is avoid dropping the lock unless you actually block. Thus, one case might be:

    mtx_lock(&foo);
    ...
    mtx_unlock_if_sleep(&foo);
    malloc(, M_WAITOK, );
    mtx_lock_if_slept(&foo);
    ...
    mtx_unlock(&foo);

Note that as far as handling races, etc. though it would be the same as

    mtx_lock(&foo);
    ...
    mtx_unlock(&foo);
    malloc(, M_WAITOK, );
    mtx_lock(&foo);
    ...
    mtx_unlock(&foo);

That is, it doesn't simplify handling of races at all or eliminate any races and if anything is even more confusing, so I'm not sure if it would really buy much and if what it bought would be worth the potential for confusion and folks thinking it did eliminate races thus resulting in bugs.

I think one of the biggest things to help make things saner in general is to queue up as much work as possible while holding the lock and defer things that need to happen outside of the lock so you avoid lots of unlock/lock pairs.
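The "queue up work under the lock, do it afterwards" approach might look roughly like the following userland sketch. The list type and function names are invented for illustration, a pthread mutex stands in for a kernel mutex, and free() stands in for any call that should not run with the lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int          dead;      /* marked for removal */
};

struct list {
    pthread_mutex_t lock;
    struct node    *head;
};

/*
 * Unlink every dead node while holding the lock, but only collect them.
 * The deferred free() calls run after the single unlock, so there is one
 * lock/unlock pair instead of one per node.  Returns the number freed.
 */
static int list_reap(struct list *l)
{
    struct node *deferred = NULL, **pp, *n;
    int count = 0;

    pthread_mutex_lock(&l->lock);
    pp = &l->head;
    while ((n = *pp) != NULL) {
        if (n->dead) {
            *pp = n->next;          /* unlink under the lock */
            n->next = deferred;     /* queue for later */
            deferred = n;
        } else {
            pp = &n->next;
        }
    }
    pthread_mutex_unlock(&l->lock);

    while ((n = deferred) != NULL) {    /* deferred work, lock not held */
        deferred = n->next;
        free(n);
        count++;
    }
    return count;
}

/* Helper for building a list: allocate first, then lock only to link. */
static void list_push(struct list *l, int dead)
{
    struct node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->dead = dead;
    pthread_mutex_lock(&l->lock);
    n->next = l->head;
    l->head = n;
    pthread_mutex_unlock(&l->lock);
}
```

Because the unlinked nodes are no longer reachable from the list once the lock is dropped, nothing needs to be revalidated afterwards, which is what makes this shape simpler than an unlock/relock pair in the middle of the loop.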
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Tue Aug 16 23:22:42 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2579416A41F; Tue, 16 Aug 2005 23:22:42 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id C634F43D46; Tue, 16 Aug 2005 23:22:41 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7GNMfWx074218; Tue, 16 Aug 2005 16:22:41 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7GNMf2u074217; Tue, 16 Aug 2005 16:22:41 -0700 (PDT) (envelope-from rizzo) Date: Tue, 16 Aug 2005 16:22:41 -0700 From: Luigi Rizzo To: John Baldwin Message-ID: <20050816162241.A74005@xorpc.icir.org> References: <42F9ECF2.8080809@freebsd.org> <20050816051231.D66550@xorpc.icir.org> <20050816200010.GJ13959@cirb503493.alcatel.com.au> <200508161716.44597.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508161716.44597.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Tue, Aug 16, 2005 at 05:16:43PM -0400 Cc: Peter Jeremy , freebsd-arch@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2005 23:22:42 -0000 ok just for the records, the issue i had in mind is the release/acquire the mutex around some code that 
might cause a deadlock, not for the mutex ops per se, but for the need to make sure that the data structure is consistent before releasing the lock, and rechecking the state afterwards. Basically:

  1    mtx_lock(&foo)
  2    .. work on obj_foo
  3    .. make obj_foo consistent
  4    mtx_unlock(&foo)
  5    f()
  6    mtx_lock(&foo)
  7    .. revalidate state of obj_foo
  8    .. more work

where f() is the call that might sleep or cause a deadlock. In cases where f() has a low probability of blocking, we could save a bit of work at runtime by writing the code as below:

  1    mtx_lock(&foo)
  2    .. work on obj_foo
       if (try_f_without_blocking() == EWOULDBLOCK) {
  3        .. make obj_foo consistent
  4        mtx_unlock(&foo)
  5        f()
  6        mtx_lock(&foo)
  7        .. revalidate state of obj_foo
       }
  8    .. more work

(where try_f_without_blocking() is a version of f() that returns EWOULDBLOCK in case the 'other' lock is busy). Here maybe we would benefit by some support (macros, whatever) that permits us to specify in a compact way what to do around f() should it become blocking.

On the other hand, maybe the instances of code as the one above are so rare that there is hardly any need for that.

cheers
luigi

On Tue, Aug 16, 2005 at 05:16:43PM -0400, John Baldwin wrote:
> On Tuesday 16 August 2005 04:00 pm, Peter Jeremy wrote:
> > On Tue, 2005-Aug-16 05:12:31 -0700, Luigi Rizzo wrote:
> > >reading this thread, and at times looking at some of the kernel code,
> > >with plenty of places where you have to drop a lock that you
> > >already have, do some small thing and then reacquire the lock itself,
> > >makes me wonder if we don't need a better mechanism/abtraction for
> > >this kind of programming.
> >
> > There are two distinct cases where this is done:
> > 1) The thread needs to call another function which can potentially sleep
> >    and therefore needs to drop any locks it is holding before calling
> >    the function and re-acquire them later.
> > 2) The thread needs to execute for an excessive period whilst holding > > the lock and explicitly releases it occasionally to allow other > > threads to run (eg much of the vnode/buffer/page scanning code). > > > > For the second case, we could create a function that checked if > > another thread was waiting on the lock and, if so, released the lock > > and grabbed it again after sleeping. This is fairly easy for a single > > lock but would be quite messy to implement for multiple locks - since > > you need to release/acquire them in the correct order to prevent > > deadlocks. > > > > The first case again seems easy at first glance: Add a "I'm happy > > to release this lock" flag to each lock. Before calling a function > > that might block, you set the flag. When the function calls sleep(9), > > the code releases the locks and re-acquires them before returning. > > There are three large gotchas: > > 1) Since sleeping might have changed the state of some structures > > that the thread is working on, it needs to re-validate its > > internal state to ensure that it's not working with stale data. > > This means the thread needs to check if it slept or not after > > each point where it could potentially sleep - and I suspect > > this amounts to most of the non-boilerplate code in the existing > > re-acquire lock section. > > 2) You have to ensure that either the release/acquire is atomic for > > all locks or the ordering is correct. > > 3) You need some way to pass back a permanent failure to re-acquire > > a requested lock (maybe a piece of hardware went away). > > We had a thought about this at USENIX ATC two years ago in Boston. The idea > would be that each thread had an array of say 4 mutex pointers. You would > have two new mutex functions: > > mtx_unlock_if_sleep(); > mtx_lock_if_slept(); > > mtx_unlock_if_sleep() would add the mutex to the per-thread array. If the > array was full, it would unlock the lock. 
If you ever slept, you would > unlock the locks in the order that they are in the array. The > mtx_lock_if_slept() function would lock the mutex if it wasn't in the array > due to an overflow or if the thread had slept and would remove the mutex from > the array if it was in there. Note that if you don't carefully balance the > calls you could create trouble but some assertions could help catch that. > Programmatically, the programmer would have to think of these just as if it > was a normal mtx_unlock() and mtx_lock() pair. All it does is avoid dropping > the lock unless you actually block. Thus, one case might be: > > mtx_lock(&foo); > ... > mtx_unlock_if_sleep(&foo); > malloc(, M_WAITOK, ); > mtx_lock_if_slept(&foo); > ... > mtx_unlock(&foo); > > Note that as far as handling races, etc. though it would be the same as > > mtx_lock(&foo); > ... > mtx_unlock(&foo); > malloc(, M_WAITOK, ); > mtx_lock(&foo); > ... > mtx_unlock(&foo); > > That is, it doesn't simplify handling of races at all or eliminate any races > and if anything is even more confusing, so I'm not sure if it would really > buy much and if what it bought would be worth the potential for confusion and > folks thinking it did eliminate races thus resulting in bugs. > > I think one of the biggest things to help make things saner in general is to > queue up as much work as possible while holding the lock and defer things > that need to happen outside of the lock so you avoid lots of unlock/lock > pairs. 
> > -- > John Baldwin <>< http://www.FreeBSD.org/~jhb/ > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 00:05:20 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8747216A41F; Wed, 17 Aug 2005 00:05:20 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5179F43D45; Wed, 17 Aug 2005 00:05:20 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7H05JNR075086; Tue, 16 Aug 2005 17:05:19 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7H05JnM075085; Tue, 16 Aug 2005 17:05:19 -0700 (PDT) (envelope-from rizzo) Date: Tue, 16 Aug 2005 17:05:19 -0700 From: Luigi Rizzo To: arch@freebsd.org, net@freebsd.org Message-ID: <20050816170519.A74422@xorpc.icir.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i Cc: Subject: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 00:05:20 -0000

[apologies for the cross post but it belongs both to arch and net.]

I notice that net/pfil.c and netinet/ip_fw2.c each carry their own similar but slightly different implementation of multiple-reader/single-writer locks, which brings up the question(s):

1. should we rather put this code in the generic kernel code so that other subsystems could make use of it? E.g.
the routing table is certainly a candidate, and especially 2. should we implement it right ? Both implementations are subject to starvation for the writers (which is indeed a problem here, because we might want to modify a ruleset and be prevented from doing it because of incoming traffic that keeps readers active). Also the PFIL_TRY_WLOCK will in fact be blocking if a writer is already in - i have no idea how problematic is this in the way it is actually used. cheers luigi From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 01:01:54 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1F35816A423; Wed, 17 Aug 2005 01:01:54 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 671ED43D48; Wed, 17 Aug 2005 01:01:53 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.0.86]) by mailout1.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id j7H11pkk016746; Wed, 17 Aug 2005 11:01:51 +1000 Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id j7H11nXp011889; Wed, 17 Aug 2005 11:01:50 +1000 Date: Wed, 17 Aug 2005 11:01:49 +1000 (EST) From: Bruce Evans X-X-Sender: bde@epsplex.bde.org To: Pawel Jakub Dawidek In-Reply-To: <20050816133307.GD3944@garage.freebsd.pl> Message-ID: <20050817104144.L1361@epsplex.bde.org> References: <20050816221033.C47830@delplex.bde.org> <20050816133307.GD3944@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: m.ehinger@ltur.de, freebsd-arch@freebsd.org Subject: Re: sysctl_proc calls handler twice X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 01:01:54 -0000

On Tue, 16 Aug 2005, Pawel Jakub Dawidek wrote:

> On Tue, Aug 16, 2005 at 11:17:21PM +1000, Bruce Evans wrote:
> +> No, just don't call it twice like sysctl(8) if you know the size in advance
> +> or from a previous call (and know that it won't change or handle the error
> +> from it changing...).
>
> Thread's author, as I understand/guess it, represents kernel side.
> He doesn't want his handler to be called twice and tools like sysctl(8)
> are not able to know size of every sysctl, so he just has to be ready
> that his handler will be called twice.

[It turned out to be a race problem]

Well, any handler may be called any number of times in any order, including possibly concurrently. All sysctls are still Giant-locked so concurrent calls can't actually happen, but making a separate call to determine the size gives races anyway if the size can change. Applications should loop if the size that they got turns out to be insufficient. sysctl(8) doesn't do this -- it doubles the size but doesn't retry. Writing requires even more care.
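The retry loop suggested above could be sketched as follows. To keep the example portable and self-contained it does not call the real sysctl(3); fake_sysctl() is a made-up stand-in that follows the documented read convention as I understand it (a NULL old pointer reports only the size, a too-small buffer gets as much as fits and the call fails with ENOMEM), and it deliberately grows its data once after the size probe to provoke the race. All names and the sample data are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical data source; grows once to simulate a size change. */
static char   src[64] = "abc";
static size_t src_len = 3;
static int    grow_once = 1;

/*
 * Stand-in for a sysctl read.  With oldp == NULL only the size is
 * reported.  Otherwise as much as fits is copied, and the call fails
 * with ENOMEM if the buffer was too small.
 */
static int fake_sysctl(void *oldp, size_t *oldlenp)
{
    size_t n;
    int short_buf;

    assert(oldlenp != NULL);
    if (oldp == NULL) {
        *oldlenp = src_len;
        if (grow_once) {            /* size changes right after the probe */
            grow_once = 0;
            memcpy(src, "abcdefgh", 8);
            src_len = 8;
        }
        return 0;
    }
    short_buf = *oldlenp < src_len;
    n = short_buf ? *oldlenp : src_len;
    memcpy(oldp, src, n);
    *oldlenp = n;
    if (short_buf) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}

/* Probe the size, allocate, read; start over if the data outgrew us. */
static void *read_all(size_t *lenp)
{
    for (;;) {
        size_t len;
        void *buf;

        if (fake_sysctl(NULL, &len) == -1)
            return NULL;
        buf = malloc(len > 0 ? len : 1);
        if (buf == NULL)
            return NULL;
        if (fake_sysctl(buf, &len) == 0) {
            *lenp = len;
            return buf;
        }
        free(buf);
        if (errno != ENOMEM)
            return NULL;
        /* ENOMEM: the size changed since the probe, so try again */
    }
}
```

A real caller would probably also bound the number of retries, or over-allocate a little, so that a value that keeps growing cannot make it loop forever.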
Bruce From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 02:35:39 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 77F2C16A41F; Wed, 17 Aug 2005 02:35:39 +0000 (GMT) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.186]) by mx1.FreeBSD.org (Postfix) with ESMTP id D746843D46; Wed, 17 Aug 2005 02:35:38 +0000 (GMT) (envelope-from max@love2party.net) Received: from p54A3D793.dip.t-dialin.net [84.163.215.147] (helo=donor.laier.local) by mrelayeu.kundenserver.de with ESMTP (Nemesis), id 0ML29c-1E5DmC3q09-0005Ri; Wed, 17 Aug 2005 04:35:36 +0200 From: Max Laier To: Luigi Rizzo Date: Wed, 17 Aug 2005 04:35:19 +0200 User-Agent: KMail/1.8.2 References: <20050816170519.A74422@xorpc.icir.org> In-Reply-To: <20050816170519.A74422@xorpc.icir.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1156185.pn3pmraMmZ"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200508170435.34688.max@love2party.net> X-Provags-ID: kundenserver.de abuse@kundenserver.de login:61c499deaeeba3ba5be80f48ecc83056 Cc: arch@freebsd.org, net@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 02:35:39 -0000 --nextPart1156185.pn3pmraMmZ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote: > [apologies for the cross post but it belongs both to arch and net.] 
>
> I notice that net/pfil.c and netinet/ip_fw2.c have two copies of
> aisimilar but slightly different implementation of
> multiple-reader/single-writer locks, which brings up the question(s):
>
> 1. should we rather put this code in the generic kernel code so that other
>    subsystems could make use of it ? E.g. the routing table is certainly
>    a candidate,

I have asked this several times on -arch and IRC, but never found anyone willing to pursue it. However, the problem is ...

> and especially
>
> 2. should we implement it right ?
>
>    Both implementations are subject to starvation for the writers
>    (which is indeed a problem here, because we might want to modify
>    a ruleset and be prevented from doing it because of incoming traffic
>    that keeps readers active).
>    Also the PFIL_TRY_WLOCK will in fact be blocking if a writer
>    is already in - i have no idea how problematic is this in the
>    way it is actually used.

... really this. I didn't find a clean way out of the starvation issue. What I do for pfil is that I set a flag and simply stop serving[2] shared requests once a writer waits for the lock. If a writer can't sleep[1] then we return EBUSY and don't. However, for pfil it's almost always safe to assume that a writer may sleep (as it is for most instances of this kind of sx-lock where you have BIGNUMxreads:1xwrite).

[1] Note that there is a *big* difference between blocking and sleeping. These two are usually confused. While it is almost always okay to block it is seldom okay to sleep. The existing sx(9) api has the problem that it *sleeps* in the shared path, which renders it unusable for this use case (as we might be holding other locks and must not sleep in the shared path). However, sleeping in the shared path is one (?the only?) way out of the starvation problem - other than a problem-specific solution as done for pfil.

[2] See pfil(9) BUGS.
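For readers who want to see the "flag" idea in code, here is a rough userland sketch with pthreads. It is not the pfil implementation: struct rwl and all function names are invented, and because a condition-variable wait in userland always sleeps, the sketch cannot reproduce the blocking-versus-sleeping distinction from [1]. It only shows new readers being held off while a writer waits, and EBUSY being returned to a writer that is not allowed to wait.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct rwl {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int readers;            /* active shared holders */
    int writer;             /* 1 while held exclusively */
    int writers_waiting;    /* the "flag": > 0 holds off new readers */
};

#define RWL_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

/* Shared acquire; waits while a writer holds the lock or is waiting. */
static void rwl_rlock(struct rwl *l)
{
    pthread_mutex_lock(&l->mtx);
    while (l->writer || l->writers_waiting > 0)
        pthread_cond_wait(&l->cv, &l->mtx);
    l->readers++;
    pthread_mutex_unlock(&l->mtx);
}

/* Shared acquire that never waits: EBUSY if a writer holds or waits. */
static int rwl_try_rlock(struct rwl *l)
{
    int error = 0;

    pthread_mutex_lock(&l->mtx);
    if (l->writer || l->writers_waiting > 0)
        error = EBUSY;
    else
        l->readers++;
    pthread_mutex_unlock(&l->mtx);
    return error;
}

static void rwl_runlock(struct rwl *l)
{
    pthread_mutex_lock(&l->mtx);
    assert(l->readers > 0);
    if (--l->readers == 0)
        pthread_cond_broadcast(&l->cv);   /* let a waiting writer in */
    pthread_mutex_unlock(&l->mtx);
}

/* Exclusive acquire; with cansleep == 0, fail with EBUSY rather than wait. */
static int rwl_wlock(struct rwl *l, int cansleep)
{
    pthread_mutex_lock(&l->mtx);
    if ((l->writer || l->readers > 0) && !cansleep) {
        pthread_mutex_unlock(&l->mtx);
        return EBUSY;
    }
    l->writers_waiting++;                 /* stop serving new readers */
    while (l->writer || l->readers > 0)
        pthread_cond_wait(&l->cv, &l->mtx);
    l->writers_waiting--;
    l->writer = 1;
    pthread_mutex_unlock(&l->mtx);
    return 0;
}

static void rwl_wunlock(struct rwl *l)
{
    pthread_mutex_lock(&l->mtx);
    assert(l->writer);
    l->writer = 0;
    pthread_cond_broadcast(&l->cv);       /* wake readers and writers */
    pthread_mutex_unlock(&l->mtx);
}
```

The trade-off moves rather than disappears: with a steady stream of writers, readers can now be the ones that starve.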
-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

--nextPart1156185.pn3pmraMmZ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQBDAqJ2XyyEoT62BG0RAgM9AJ4kzFxHhG6gUCKDFwfaxNL4NeprdACfSzoW
X33PNJnt6EzhMiEntWkt79A=
=Ce2y
-----END PGP SIGNATURE-----

--nextPart1156185.pn3pmraMmZ--

From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 03:20:19 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ADE7616A41F; Wed, 17 Aug 2005 03:20:19 +0000 (GMT) (envelope-from julian@elischer.org) Received: from delight.idiom.com (delight.idiom.com [216.240.32.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6A30B43D48; Wed, 17 Aug 2005 03:20:19 +0000 (GMT) (envelope-from julian@elischer.org) Received: from idiom.com (idiom.com [216.240.32.1]) by delight.idiom.com (Postfix) with ESMTP id 4A6E4208C8A; Tue, 16 Aug 2005 20:20:19 -0700 (PDT) Received: from [192.168.2.2] (home.elischer.org [216.240.48.38]) by idiom.com (8.12.11/8.12.11) with ESMTP id j7H3KH5E040622; Tue, 16 Aug 2005 20:20:18 -0700 (PDT) (envelope-from julian@elischer.org) Message-ID: <4302ACF1.6050209@elischer.org> Date: Tue, 16 Aug 2005 20:20:17 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050424 X-Accept-Language: en, hu MIME-Version: 1.0 To: Max Laier References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> In-Reply-To: <200508170435.34688.max@love2party.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: net@freebsd.org, arch@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 03:20:19 -0000 Max Laier wrote: > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote: > >>[apologies for the cross post but it belongs both to arch and net.] >> >>I notice that net/pfil.c and netinet/ip_fw2.c have two copies of >>aisimilar but slightly different implementation of >>multiple-reader/single-writer locks, which brings up the question(s): >> >>1. should we rather put this code in the generic kernel code so that other >> subsystems could make use of it ? E.g. the routing table is certainly >> a candidate, > > > I have asked this several time on -arch and IRC, but never found anyone > willing to pursue it. However, the problem is ... > > >>and especially >> >>2. should we implement it right ? >> >> Both implementations are subject to starvation for the writers >> (which is indeed a problem here, because we might want to modify >> a ruleset and be prevented from doing it because of incoming traffic >> that keeps readers active). >> Also the PFIL_TRY_WLOCK will in fact be blocking if a writer >> is already in - i have no idea how problematic is this in the >> way it is actually used. > > > ... really this. I didn't find a clean way out of the starvation issue. What > I do for pfil is that I set a flag and simply stop serving[2] shared requests > once a writer waits for the lock. If a writer can't sleep[1] then we return > EBUSY and don't. However, for pfil it's almost ever safe to assume that a > write may sleep (as it is for most instances of this kind of sx-lock where > you have BIGNUMxreads:1xwrite). > > [1] Note that there is a *big* difference between blocking and sleeping. > These two are usually confused. While it is almost always okay to block it > is seldom okay to sleep. 
The existing sx(9) api has the problem that it > *sleeps* in the shared path which renders it unusable for this usecase (as we > might be holding other locks and must not sleep in the shared path). > However, sleeping in the shared path is one (?the only?) way out of the > starvation problem - other than a problem specific as done for pfil. > > [2] See pfil(9) BUGS. netgraph has yet another implementation of R/W locks. It relies on the fact that every lock action is done on behalf of a command request or a data processing request, each of which is queueable, and each RW lock is associated with a queue. Instead of blocking, the item is queued instead for later processing. > From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 04:03:44 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 842CE16A41F for ; Wed, 17 Aug 2005 04:03:44 +0000 (GMT) (envelope-from gnn@neville-neil.com) Received: from mrout2.yahoo.com (mrout2.yahoo.com [216.145.54.172]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0EA3943D48 for ; Wed, 17 Aug 2005 04:03:43 +0000 (GMT) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (proxy8.corp.yahoo.com [216.145.48.13]) by mrout2.yahoo.com (8.13.4/8.13.4/y.out) with ESMTP id j7H42BDZ076984; Tue, 16 Aug 2005 21:02:11 -0700 (PDT) Date: Wed, 17 Aug 2005 11:13:40 +0900 Message-ID: From: gnn@freebsd.org To: Luigi Rizzo In-Reply-To: <20050816162241.A74005@xorpc.icir.org> References: <42F9ECF2.8080809@freebsd.org> <20050816051231.D66550@xorpc.icir.org> <20050816200010.GJ13959@cirb503493.alcatel.com.au> <200508161716.44597.jhb@FreeBSD.org> User-Agent: Wanderlust/2.12.2 (99 Luftballons) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (=?ISO-8859-4?Q?Sanj=F2?=) APEL/10.6 Emacs/21.3.50 (powerpc-apple-darwin8.1.0) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: 
text/plain; charset=US-ASCII Cc: freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 04:03:44 -0000 At Tue, 16 Aug 2005 16:22:41 -0700, luigi wrote: > > ok just for the records, the issue i had in mind is the release/acquire > the mutex around some code that might cause a deadlock, not for the > mutex ops per se, but for the need to make sure that the data > structure is consistent before releasing the lock, and rechecking > the state afterwards. Basically: > > 1 mtx_lock(&foo) > 2 .. work on obj_foo > 3 .. make obj_foo consistent > 4 mtx_unlock(&foo) > 5 f() > 6 mtx_lock(&foo) > 7 .. revalidate state of obj_foo > 8 .. more work > > where f() is the call that might sleep or cause a deadlock. > In cases where f() has a low probability of blocking, we could > save a bit of work at runtime by writing the code as below: > > 1 mtx_lock(&foo) > 2 .. work on obj_foo > if (try_f_without_blocking() == EWOULDBLOCK) { > 3 .. make obj_foo consistent > 4 mtx_unlock(&foo) > 5 f() > 6 mtx_lock(&foo) > 7 .. revalidate state of obj_foo > } > 8 .. more work > > (where try_f_without_blocking() is a version of f() that returns > EWOULDBLOCK in case the 'other' lock is busy). > Here maybe we would benefit by some support (macros, whatever) that > permits us to specify in a compact way what to do around f() should > it become blocking. > > On the other hand, maybe the instances of code as the one above > are so rare that there is hardly any need for that. > Correct me if I'm wrong but I suspect you're thinking of cases such as: RT_UNLOCK(rt); /* XXX workaround LOR */ gwrt = rtalloc1(gate, 1, 0); RT_LOCK(rt); in the routing/networking code. 
A quick, and by no means exhaustive, check of CURRENT with cscope and Emacs looking for these turned up 3. It may still be something to look at, but perhaps more from a point of view of cleaning up our APIs to not do this kind of jiggery pokery, rather than inventing a new locking paradigm, which is fraught with peril.

This isn't exhaustive though; if others can show this kind of thing being done a lot, or becoming an idiom in our code, then there is more cause for concern.

Just my 2 yen :-)

Later,
George

From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 08:46:30 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 605A216A41F; Wed, 17 Aug 2005 08:46:30 +0000 (GMT) (envelope-from wilkinsa@squash.dsto.defence.gov.au) Received: from digger1.defence.gov.au (digger1.defence.gov.au [203.5.217.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9076C43D45; Wed, 17 Aug 2005 08:46:28 +0000 (GMT) (envelope-from wilkinsa@squash.dsto.defence.gov.au) Received: from ednmsw501.dsto.defence.gov.au (ednmsw501.dsto.defence.gov.au [131.185.2.150]) by digger1.defence.gov.au with ESMTP id j7H8iYRb025943; Wed, 17 Aug 2005 18:14:34 +0930 (CST) Received: from muttley.dsto.defence.gov.au (unverified) by ednmsw501.dsto.defence.gov.au (Content Technologies SMTPRS 4.3.17) with ESMTP id ; Wed, 17 Aug 2005 18:16:22 +0930 Received: from ednex501.dsto.defence.gov.au (ednex501.dsto.defence.gov.au [131.185.2.81]) by muttley.dsto.defence.gov.au (8.11.3/8.11.3) with ESMTP id j7H8iW015518; Wed, 17 Aug 2005 18:14:32 +0930 (CST) Received: from squash.dsto.defence.gov.au ([131.185.40.212]) by ednex501.dsto.defence.gov.au with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id QX6CR7K5; Wed, 17 Aug 2005 18:14:23 +0930 Received: from squash.dsto.defence.gov.au (localhost [127.0.0.1]) by squash.dsto.defence.gov.au (8.13.3/8.13.3) with ESMTP id
j7H8jKFw027321; Wed, 17 Aug 2005 18:15:20 +0930 (CST) (envelope-from wilkinsa@squash.dsto.defence.gov.au) Received: (from wilkinsa@localhost) by squash.dsto.defence.gov.au (8.13.3/8.13.3/Submit) id j7H8jIPT027320; Wed, 17 Aug 2005 18:15:18 +0930 (CST) (envelope-from wilkinsa) Date: Wed, 17 Aug 2005 18:15:18 +0930 From: "Wilkinson, Alex" To: Colin Percival Message-ID: <20050817084518.GG25467@squash.dsto.defence.gov.au> Mail-Followup-To: Colin Percival , Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= , freebsd-arch@freebsd.org References: <42F62C5F.6000609@freebsd.org> <20050807.101746.68985623.imp@bsdimp.com> <42F636BE.3020906@freebsd.org> <8664ub4bp3.fsf@xps.des.no> <42FCA675.7090300@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <42FCA675.7090300@freebsd.org> User-Agent: Mutt/1.5.9i Cc: Dag-Erling =?iso-8859-1?Q?Sm=F8rgrav?= , freebsd-arch@freebsd.org Subject: Re: Adding portsnap to the base system X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 08:46:30 -0000 0n Fri, Aug 12, 2005 at 06:39:01AM -0700, Colin Percival wrote: >Dag-Erling Smørgrav wrote: >> Colin Percival writes: >>>Yes, pipelined HTTP. Basically, I spent six months on-and-off, and >>>at least two weeks of actual work, trying to fit pipelined HTTP into >>>fetch(3)... but the design of that library is all around the idea of >>>fetching a single file at once. In the end I gave up and wrote my >>>own code (phttpget) in under 24 hours. >> >> You are mistaken. Pipelined HTTP can be implemented in libfetch with >> the same ease (and the same limitations) as FTP connection caching, >> which was included from the start. > >Well, err... go ahead, then. 
I'm not going to tell the author of a >library that his library can't be modified to include a feature; all >I can do is point out that my best efforts were insufficient. > >I can see that it would be very easy to implement _persistent_ HTTP, >but implementing _pipelined_ HTTP is quite a different matter... erm ... what is meant by "_pipelined_ HTTP" ? - aW From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 09:43:21 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 980BB16A41F for ; Wed, 17 Aug 2005 09:43:21 +0000 (GMT) (envelope-from cperciva@freebsd.org) Received: from pd3mo3so.prod.shaw.ca (shawidc-mo1.cg.shawcable.net [24.71.223.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id D3C0F43D48 for ; Wed, 17 Aug 2005 09:43:20 +0000 (GMT) (envelope-from cperciva@freebsd.org) Received: from pd4mr4so.prod.shaw.ca (pd4mr4so-qfe3.prod.shaw.ca [10.0.141.215]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0ILD00AIK1HJRRB0@l-daemon> for freebsd-arch@freebsd.org; Wed, 17 Aug 2005 03:39:19 -0600 (MDT) Received: from pn2ml10so.prod.shaw.ca ([10.0.121.80]) by pd4mr4so.prod.shaw.ca (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0ILD004K71HJRGL0@pd4mr4so.prod.shaw.ca> for freebsd-arch@freebsd.org; Wed, 17 Aug 2005 03:39:19 -0600 (MDT) Received: from [192.168.0.60] (S0106006067227a4a.vc.shawcable.net [24.87.209.6]) by l-daemon (iPlanet Messaging Server 5.2 HotFix 1.18 (built Jul 28 2003)) with ESMTP id <0ILD0093H1HIZ1@l-daemon> for freebsd-arch@freebsd.org; Wed, 17 Aug 2005 03:39:19 -0600 (MDT) Date: Wed, 17 Aug 2005 02:39:18 -0700 From: Colin Percival In-reply-to: <20050817084518.GG25467@squash.dsto.defence.gov.au> To: "Wilkinson, Alex" Message-id: <430305C6.2060407@freebsd.org> MIME-version: 1.0 Content-type: text/plain; charset=ISO-8859-1 
Content-transfer-encoding: 7bit X-Accept-Language: en-us, en References: <42F62C5F.6000609@freebsd.org> <20050807.101746.68985623.imp@bsdimp.com> <42F636BE.3020906@freebsd.org> <8664ub4bp3.fsf@xps.des.no> <42FCA675.7090300@freebsd.org> <20050817084518.GG25467@squash.dsto.defence.gov.au> User-Agent: Mozilla Thunderbird 1.0.6 (X11/20050724) Cc: =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= , freebsd-arch@freebsd.org Subject: Re: Adding portsnap to the base system X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 09:43:21 -0000 Wilkinson, Alex wrote: > 0n Fri, Aug 12, 2005 at 06:39:01AM -0700, Colin Percival wrote: > >I can see that it would be very easy to implement _persistent_ HTTP, > >but implementing _pipelined_ HTTP is quite a different matter... > > erm ... what is meant by "_pipelined_ HTTP" ? I use the word in the sense that it is used in section 8.1.2.2 of RFC 2616: A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). 
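
To make the RFC wording concrete: with pipelining, the client writes several requests back to back on one persistent connection and only then starts reading responses, which come back in the same order. The fragment below is only a sketch of the request side; build_pipeline() is a made-up helper for illustration and is not taken from phttpget or libfetch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append n GET requests for the same host to buf,
 * one after the other, so that a single write() on a persistent
 * connection sends them all before any response has been read.
 * Returns the number of bytes used, or -1 if buf is too small.
 */
static int
build_pipeline(const char *host, const char *const *paths, size_t n,
    char *buf, size_t bufsize)
{
	size_t off = 0;

	assert(host != NULL && paths != NULL && buf != NULL);
	if (bufsize == 0)
		return (-1);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		/* HTTP/1.1 connections are persistent unless told otherwise. */
		int len = snprintf(buf + off, bufsize - off,
		    "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", paths[i], host);
		if (len < 0 || (size_t)len >= bufsize - off)
			return (-1);
		off += (size_t)len;
	}
	return ((int)off);
}
```

A plain persistent connection would instead send one request, read its complete response, and only then send the next one; the difference is purely in when the requests are written.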
Colin Percival From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 08:48:07 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0512316A41F; Wed, 17 Aug 2005 08:48:07 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3C25A43D48; Wed, 17 Aug 2005 08:48:06 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 5296D52C2F; Wed, 17 Aug 2005 10:48:04 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 6DA3052BC4; Wed, 17 Aug 2005 10:47:59 +0200 (CEST) Date: Wed, 17 Aug 2005 10:47:49 +0200 From: Pawel Jakub Dawidek To: Doug Barton Message-ID: <20050817084749.GC11066@garage.freebsd.pl> References: <200508120005.j7C05ARc090857@repoman.freebsd.org> <20050815053757.GB2660@green.homeunix.org> <20050815070033.GA8368@garage.freebsd.pl> <20050815125814.GC2660@green.homeunix.org> <20050816081644.GA3944@garage.freebsd.pl> <1124182906.2492.4.camel@buffy.york.ac.uk> <20050816095217.GB3944@garage.freebsd.pl> <43028269.50904@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="V88s5gaDVPzZ0KCq" Content-Disposition: inline In-Reply-To: <43028269.50904@FreeBSD.org> X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng devel (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 X-Mailman-Approved-At: Wed, 17 Aug 2005 11:33:23 
+0000 Cc: src-committers@freebsd.org, Gavin Atkinson , cvs-src@freebsd.org, cvs-all@freebsd.org, freebsd-arch@FreeBSD.org Subject: Re: cvs commit: src/sys/geom/label g_label.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-arch@freebsd.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 08:48:07 -0000

--V88s5gaDVPzZ0KCq Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Tue, Aug 16, 2005 at 05:18:49PM -0700, Doug Barton wrote:
+> Pawel Jakub Dawidek wrote:
+>
+> >Because '/' creates a directory and I want each label to be represented
+> >only by one file.
+>
+> I think what people are saying is that they like the directory creating behavior. Can you explain your rationale in more detail?

Actually, I don't really care. All I wanted was one label to be
represented by one single file. That's all.
For me, leaving it as it is just asks for troubles.
I can live without this change, really.

This is something I'd like to ask about our TRB, but unfortunately it was
retired yesterday:)

CCing to freebsd-arch@.

The question(s) is(are):
Should we allow '/' in labels or should we replace it with something (eg. '_')?
Maybe we should only deny labels with '/../'?

-- 
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
--V88s5gaDVPzZ0KCq Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (FreeBSD) iD8DBQFDAvm0ForvXbEpPzQRAodbAKDms13rd73qr3sVDoSkN3SB518F9ACfeYnM NbmOTZcBS4nYKm/7R96Y4N0= =9tS+ -----END PGP SIGNATURE----- --V88s5gaDVPzZ0KCq-- From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 12:49:36 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E59E316A41F for ; Wed, 17 Aug 2005 12:49:35 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mail28.sea5.speakeasy.net (mail28.sea5.speakeasy.net [69.17.117.30]) by mx1.FreeBSD.org (Postfix) with ESMTP id 37FD243D46 for ; Wed, 17 Aug 2005 12:49:35 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 11481 invoked from network); 17 Aug 2005 12:49:35 -0000 Received: from server.baldwin.cx ([216.27.160.63]) (envelope-sender ) by mail28.sea5.speakeasy.net (qmail-ldap-1.03) with AES256-SHA encrypted SMTP for ; 17 Aug 2005 12:49:33 -0000 Received: from zion.baldwin.cx (zion.baldwin.cx [192.168.0.7]) (authenticated bits=0) by server.baldwin.cx (8.13.1/8.13.1) with ESMTP id j7HCnOh8034383; Wed, 17 Aug 2005 08:49:24 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Wed, 17 Aug 2005 08:36:04 -0400 User-Agent: KMail/1.8 References: <42F9ECF2.8080809@freebsd.org> <200508161716.44597.jhb@FreeBSD.org> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200508170836.05861.jhb@FreeBSD.org> X-Spam-Status: No, score=-2.8 required=4.2 tests=ALL_TRUSTED autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on server.baldwin.cx Cc: gnn@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 12:49:36 -0000

On Tuesday 16 August 2005 10:13 pm, gnn@freebsd.org wrote:
> At Tue, 16 Aug 2005 16:22:41 -0700,
>
> luigi wrote:
> > ok just for the records, the issue i had in mind is the release/acquire
> > the mutex around some code that might cause a deadlock, not for the
> > mutex ops per se, but for the need to make sure that the data
> > structure is consistent before releasing the lock, and rechecking
> > the state afterwards. Basically:
> >
> >     1 mtx_lock(&foo)
> >     2 .. work on obj_foo
> >     3 .. make obj_foo consistent
> >     4 mtx_unlock(&foo)
> >     5 f()
> >     6 mtx_lock(&foo)
> >     7 .. revalidate state of obj_foo
> >     8 .. more work
> >
> > where f() is the call that might sleep or cause a deadlock.
> > In cases where f() has a low probability of blocking, we could
> > save a bit of work at runtime by writing the code as below:
> >
> >     1 mtx_lock(&foo)
> >     2 .. work on obj_foo
> >       if (try_f_without_blocking() == EWOULDBLOCK) {
> >     3     .. make obj_foo consistent
> >     4     mtx_unlock(&foo)
> >     5     f()
> >     6     mtx_lock(&foo)
> >     7     .. revalidate state of obj_foo
> >       }
> >     8 .. more work
> >
> > (where try_f_without_blocking() is a version of f() that returns
> > EWOULDBLOCK in case the 'other' lock is busy).
> > Here maybe we would benefit by some support (macros, whatever) that
> > permits us to specify in a compact way what to do around f() should
> > it become blocking.
> >
> > On the other hand, maybe the instances of code as the one above
> > are so rare that there is hardly any need for that.
>
> Correct me if I'm wrong but I suspect you're thinking of cases such
> as:
>
>     RT_UNLOCK(rt); /* XXX workaround LOR */
>     gwrt = rtalloc1(gate, 1, 0);
>     RT_LOCK(rt);
>
> in the routing/networking code. A quick, and by no means exhaustive,
> check of CURRENT with cscope and Emacs looking for these turned up 3.
> It may still be something to look at, but perhaps mroe from a point of
> view of cleaning up our APIs to not do this kind of jiggery pokery,
> rather than inventing a new locking paradigm, which is fraught with
> peril.
>
> This isn't exhaustive though, if others can show this kind of thing
> being done a lot, or becoming a idiom in our code, then there is more
> cause for concern.
>
> Just my 2 yen :-)

Agreed. I'd rather that we fix our APIs and try to organize the workflow to
minimize the number of times that a lock has to be dropped and then
reacquired.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 16:33:34 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6365416A41F; Wed, 17 Aug 2005 16:33:34 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1CE0D43D45; Wed, 17 Aug 2005 16:33:34 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7HGXWQD085690; Wed, 17 Aug 2005 09:33:32 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7HGXWCM085689; Wed, 17 Aug 2005 09:33:32 -0700 (PDT) (envelope-from rizzo) Date: Wed, 17 Aug 2005 09:33:32 -0700 From: Luigi Rizzo To: John Baldwin Message-ID: <20050817093332.A85431@xorpc.icir.org> References: <42F9ECF2.8080809@freebsd.org> <200508161716.44597.jhb@FreeBSD.org> <200508170836.05861.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508170836.05861.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Wed, Aug 17, 2005 at 08:36:04AM -0400 Cc: gnn@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 16:33:34 -0000

On Wed, Aug 17, 2005 at 08:36:04AM -0400, John Baldwin wrote:
> On Tuesday 16 August 2005 10:13 pm, gnn@freebsd.org wrote:
> > At Tue, 16 Aug 2005 16:22:41 -0700,
> >
> > luigi wrote:
> > > ok just for the records, the issue i had in mind is the release/acquire
> > > the mutex around some code that might cause a deadlock, not for the
> > > mutex ops per se, but for the need to make sure that the data
> > > structure is consistent before releasing the lock, and rechecking
...
> > Correct me if I'm wrong but I suspect you're thinking of cases such
> > as:
> >
> >     RT_UNLOCK(rt); /* XXX workaround LOR */
> >     gwrt = rtalloc1(gate, 1, 0);
> >     RT_LOCK(rt);
> >
> > in the routing/networking code. A quick, and by no means exhaustive,
> > check of CURRENT with cscope and Emacs looking for these turned up 3.

actually most network device drivers have a similar thing in
the receive interrupt processing. The code below is from if_fxp.c
but many other drivers have that as well:

        /*
         * Drop locks before calling if_input() since it
         * may re-enter fxp_start() in the netisr case.
         * This would result in a lock reversal. Better
         * performance might be obtained by chaining all
         * packets received, dropping the lock, and then
         * calling if_input() on each one.
         */
        FXP_UNLOCK(sc);
        (*ifp->if_input)(ifp, m);
        FXP_LOCK(sc);
        ...

here (and everywhere else) there is no check that the device is still
in a consistent state when reacquiring the lock.

How safe is this in the face of, say, device removal, i have no idea,
because i cannot find a place that comments what the sc->sc_mtx lock is
supposed to protect.

But in fact, reading fxp_detach() is not very encouraging....
and the leftover spl*() calls are not making it any easier to understand
the code.

Actually, if someone can point me to something that documents how
network locking (device driver and above) is designed, i'd be
very grateful.

        cheers
        luigi

> > It may still be something to look at, but perhaps mroe from a point of
> > view of cleaning up our APIs to not do this kind of jiggery pokery,
> > rather than inventing a new locking paradigm, which is fraught with
> > peril.
> >
> > This isn't exhaustive though, if others can show this kind of thing
> > being done a lot, or becoming a idiom in our code, then there is more
> > cause for concern.
> >
> > Just my 2 yen :-)
>
> Agreed. I'd rather that we fix our APIs and try to organize the workflow to
> minimize the number of times that a lock has to be dropped and then
> reacquired.
> > -- > John Baldwin  <><  http://www.FreeBSD.org/~jhb/ > "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Wed Aug 17 17:27:52 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3A50616A44C; Wed, 17 Aug 2005 17:27:52 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD35C43D45; Wed, 17 Aug 2005 17:27:51 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Wed, 17 Aug 2005 13:42:49 -0400 From: John Baldwin To: freebsd-arch@freebsd.org Date: Wed, 17 Aug 2005 13:28:28 -0400 User-Agent: KMail/1.8 References: <42F9ECF2.8080809@freebsd.org> <200508170836.05861.jhb@FreeBSD.org> <20050817093332.A85431@xorpc.icir.org> In-Reply-To: <20050817093332.A85431@xorpc.icir.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200508171328.29654.jhb@FreeBSD.org> Cc: gnn@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2005 17:27:53 -0000 On Wednesday 17 August 2005 12:33 pm, Luigi Rizzo wrote: > On Wed, Aug 17, 2005 at 08:36:04AM -0400, John Baldwin wrote: > > On Tuesday 16 August 2005 10:13 pm, gnn@freebsd.org wrote: > > > At Tue, 16 Aug 2005 16:22:41 -0700, > > > > > > luigi wrote: > > > > ok just for the records, the issue i had in mind is the > > > > release/acquire the mutex around some code that 
might cause a > > > > deadlock, not for the mutex ops per se, but for the need to make sure > > > > that the data structure is consistent before releasing the lock, and > > > > rechecking > > ... > > > > Correct me if I'm wrong but I suspect you're thinking of cases such > > > as: > > > > > > RT_UNLOCK(rt); /* XXX workaround LOR */ > > > gwrt = rtalloc1(gate, 1, 0); > > > RT_LOCK(rt); > > > > > > in the routing/networking code. A quick, and by no means exhaustive, > > > check of CURRENT with cscope and Emacs looking for these turned up 3. > > actually most network device drivers have a similar thing in > the receive interrupt processing. The code below is from if_fxp.c > but many other drivers have that as well: > > /* > * Drop locks before calling if_input() since it > * may re-enter fxp_start() in the netisr case. > * This would result in a lock reversal. Better > * performance might be obtained by chaining all > * packets received, dropping the lock, and then > * calling if_input() on each one. > */ > FXP_UNLOCK(sc); > (*ifp->if_input)(ifp, m); > FXP_LOCK(sc); > ... > > here (and everywhere else) there is no check that the device is still > in a consistent state when reacquiring the lock. > > How safe is this in the face of, say, device removal, i have no idea, > because i cannot find a place that comments what the sc->sc_mtx lock is > supposed to protect. > > But in fact, reading fxp_detach() is not very encouraging.... > and the leftover spl*() calls are not making it any easier to understand > the code. > > Actually, if someone can point me to something that documents how > network locking (device driver and above) is designed, i'd be > very grateful. fxp(4)'s locking is somewhat buggy where you are looking probably. I think I've already committed the fixes to HEAD so that detach() is less discouraging (we just lock fxp_stop() in detach now). 
The calls to if_input() are the one place in ethernet drivers where they drop the lock and then reacquire it later after they have finished dequeueing a freshly received packet from their rx ring. One suggestion for improvement here is to have the various ethernet drivers batch up all the received packets in a IFQ private to that function and then at the end of the function drop the lock and pass the packets up to if_input so that in the case that multiple packets are received, there are fewer lock operations. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 00:02:49 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CB86C16A41F; Thu, 18 Aug 2005 00:02:49 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 849D443D49; Thu, 18 Aug 2005 00:02:49 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7I02mEp072251; Wed, 17 Aug 2005 17:02:48 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7I02mBM072250; Wed, 17 Aug 2005 17:02:48 -0700 (PDT) (envelope-from rizzo) Date: Wed, 17 Aug 2005 17:02:48 -0700 From: Luigi Rizzo To: Max Laier Message-ID: <20050817170248.A70991@xorpc.icir.org> References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508170435.34688.max@love2party.net>; from max@love2party.net on Wed, Aug 17, 2005 at 04:35:19AM +0200 Cc: arch@freebsd.org, net@freebsd.org Subject: Re: 
duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 00:02:49 -0000 On Wed, Aug 17, 2005 at 04:35:19AM +0200, Max Laier wrote: > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote: > > [apologies for the cross post but it belongs both to arch and net.] > > > > I notice that net/pfil.c and netinet/ip_fw2.c have two copies of > > aisimilar but slightly different implementation of > > multiple-reader/single-writer locks, which brings up the question(s): > > > > 1. should we rather put this code in the generic kernel code so that other > > subsystems could make use of it ? E.g. the routing table is certainly > > a candidate, > > I have asked this several time on -arch and IRC, but never found anyone > willing to pursue it. However, the problem is ... > > > and especially > > > > 2. should we implement it right ? > > > > Both implementations are subject to starvation for the writers > > (which is indeed a problem here, because we might want to modify > > a ruleset and be prevented from doing it because of incoming traffic > > that keeps readers active). > > Also the PFIL_TRY_WLOCK will in fact be blocking if a writer > > is already in - i have no idea how problematic is this in the > > way it is actually used. > >... really this. I didn't find a clean way out of the starvation issue. What > I do for pfil is that I set a flag and simply stop serving[2] shared requests > once a writer waits for the lock. If a writer can't sleep[1] then we return > EBUSY and don't. However, for pfil it's almost ever safe to assume that a > write may sleep (as it is for most instances of this kind of sx-lock where > you have BIGNUMxreads:1xwrite). 
could you guys look at the following code and see if it makes sense,
or tell me where i am wrong ?

It should solve the starvation and blocking trylock problems,
because the active reader does not hold the mutex in the critical
section anymore. The lock could well be a spinlock.

        cheers
        luigi

/*
 * Implementation of multiple reader-single writer that prevents starvation.
 * Luigi Rizzo 2005.08.19
 *
 * The mutex m only protects accesses to the struct rwlock.
 * We can have the following states:
 * IDLE: readers = 0, writers = 0, br = 0;
 *      any request will be granted immediately.
 *
 * READ: readers > 0, writers = 0, br = 0. Read in progress.
 *      Grant read requests immediately, queue write requests and
 *      move to READ1.
 *      When last reader terminates, move to IDLE.
 *
 * READ1: readers > 0, writers > 0, br >= 0.
 *      Read in progress, but writers are queued.
 *      Queue read and write requests to qr and wr, respectively.
 *      When the last reader terminates, wakeup the next queued writer
 *      and move to WRITE
 *
 * WRITE: readers = 0, writers > 0, br >= 0.
 *      Write in progress, possibly queued readers/writers.
 *      Queue read and write requests to qr and wr, respectively.
 *      When the writer terminates, wake up all readers if any,
 *      otherwise wake up the next writer if any.
 *      Move to READ, READ1, IDLE accordingly.
 */

struct rwlock {
        mtx m;          /* protects access to the rwlock */
        int readers;    /* active readers */
        int br;         /* blocked readers */
        int writers;    /* active + blocked writers */
        cv qr;          /* queued readers */
        cv qw;          /* queued writers */
}

int
RLOCK(struct rwlock *rwl, int try)
{
        if (!try)
                mtx_lock(&rwl->m);
        else if (!mtx_trylock(&rwl->m))
                return EBUSY;
        if (rwl->writers == 0) /* no writer, pass */
                rwl->readers++;
        else {
                rwl->br++;
                cv_wait(&rwl->qr, &rwl->m);
        }
        mtx_unlock(&rwl->m);
        return 0;
}

int
WLOCK(struct rwlock *rwl, int try)
{
        if (!try)
                mtx_lock(&rwl->m);
        else if (!mtx_trylock(&rwl->m))
                return EBUSY;
        rwl->writers++;
        if (rwl->readers > 0) /* have readers, must wait */
                cv_wait(&rwl->qw, &rwl->m);
        mtx_unlock(&rwl->m);
        return 0;
}

void
RUNLOCK(struct rwlock *rwl)
{
        mtx_lock(&rwl->m);
        rwl->readers--;
        if (rwl->readers == 0 && rwl->writers > 0)
                cv_signal(&rwl->qw);
        mtx_unlock(&rwl->m);
}

void
WUNLOCK(struct rwlock *rwl)
{
        mtx_lock(&rwl->m);
        rwl->writers--;
        if (rwl->br > 0) { /* priority to readers */
                rwl->readers = rwl->br;
                rwl->br = 0;
                cv_broadcast(&rwl->qr);
        } else if (rwl->writers > 0)
                cv_signal(&rwl->qw);
        mtx_unlock(&rwl->m);
}

From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 01:32:42 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5974A16A41F; Thu, 18 Aug 2005 01:32:42 +0000 (GMT) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.186]) by mx1.FreeBSD.org (Postfix) with ESMTP id A442B43D48; Thu, 18 Aug 2005 01:32:41 +0000 (GMT) (envelope-from max@love2party.net) Received: from p54A3E589.dip.t-dialin.net [84.163.229.137] (helo=donor.laier.local) by mrelayeu.kundenserver.de with ESMTP (Nemesis), id 0MKwtQ-1E5ZGn3aft-00063d; Thu, 18 Aug 2005 03:32:37 +0200 From: Max Laier To: Luigi Rizzo Date: Thu, 18 Aug 2005 03:32:19 +0200 User-Agent: KMail/1.8.2 References: 
<20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> In-Reply-To: <20050817170248.A70991@xorpc.icir.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1138726.tHd4SFx7Ar"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200508180332.34895.max@love2party.net> X-Provags-ID: kundenserver.de abuse@kundenserver.de login:61c499deaeeba3ba5be80f48ecc83056 Cc: arch@freebsd.org, net@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 01:32:42 -0000 --nextPart1138726.tHd4SFx7Ar Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Thursday 18 August 2005 02:02, Luigi Rizzo wrote: > On Wed, Aug 17, 2005 at 04:35:19AM +0200, Max Laier wrote: > > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote: > > > [apologies for the cross post but it belongs both to arch and net.] > > > > > > I notice that net/pfil.c and netinet/ip_fw2.c have two copies of > > > aisimilar but slightly different implementation of > > > multiple-reader/single-writer locks, which brings up the question(s): > > > > > > 1. should we rather put this code in the generic kernel code so that > > > other subsystems could make use of it ? E.g. the routing table is > > > certainly a candidate, > > > > I have asked this several time on -arch and IRC, but never found anyone > > willing to pursue it. However, the problem is ... > > > > > and especially > > > > > > 2. should we implement it right ? 
> > > > > > Both implementations are subject to starvation for the writers > > > (which is indeed a problem here, because we might want to modify > > > a ruleset and be prevented from doing it because of incoming traff= ic > > > that keeps readers active). > > > Also the PFIL_TRY_WLOCK will in fact be blocking if a writer > > > is already in - i have no idea how problematic is this in the > > > way it is actually used. > > > >... really this. I didn't find a clean way out of the starvation issue.= =20 > > What I do for pfil is that I set a flag and simply stop serving[2] shar= ed > > requests once a writer waits for the lock. If a writer can't sleep[1] > > then we return EBUSY and don't. However, for pfil it's almost ever safe > > to assume that a write may sleep (as it is for most instances of this > > kind of sx-lock where you have BIGNUMxreads:1xwrite). > > could you guys look at the following code and see if it makes sense, > or tell me where i am wrong ? > > It should solve the starvation and blocking trylock problems, > because the active reader does not hold the mutex in the critical > section anymore. The lock could well be a spinlock. The reader doesn't hold the lock over the critical section with the current= =20 implementation either. The write holds the lock over the critical section,= =20 which was a conscious decision. For some cases it does make sense to get r= id=20 of this, however. See inline comments why this doesn't work for pfil. > cheers > luigi > > /* > * Implementation of multiple reader-single writer that prevents > starvation. * Luigi Rizzo 2005.08.19 > * > * The mutex m only protects accesses to the struct rwlock. > * We can have the following states: > * IDLE: readers =3D 0, writers =3D 0, br =3D 0; > * any request will be granted immediately. > * > * READ: readers > 0, writers =3D 0, br =3D 0. Read in progress. > * Grant read requests immediately, queue write requests and > * move to READ1. > * When last reader terminates, move to IDLE. 
>  *
>  * READ1: readers > 0, writers > 0, br >= 0.
>  *      Read in progress, but writers are queued.
>  *      Queue read and write requests to qr and wr, respectively.
>  *      When the last reader terminates, wakeup the next queued writer
>  *      and move to WRITE
>  *
>  * WRITE: readers = 0, writers > 0, br >= 0.
>  *      Write in progress, possibly queued readers/writers.
>  *      Queue read and write requests to qr and wr, respectively.
>  *      When the writer terminates, wake up all readers if any,
>  *      otherwise wake up the next writer if any.
>  *      Move to READ, READ1, IDLE accordingly.
>  */
>
> struct rwlock {
>         mtx m;          /* protects access to the rwlock */
>         int readers;    /* active readers */
>         int br;         /* blocked readers */
>         int writers;    /* active + blocked writers */
>         cv qr;          /* queued readers */
>         cv qw;          /* queued writers */
> }
>
> int
> RLOCK(struct rwlock *rwl, int try)
> {
>         if (!try)
>                 mtx_lock(&rwl->m);
>         else if (!mtx_trylock(&rwl->m))
>                 return EBUSY;
>         if (rwl->writers == 0)  /* no writer, pass */
>                 rwl->readers++;
>         else {
>                 rwl->br++;
>                 cv_wait(&rwl->qr, &rwl->m);
                  ^^^^^^^
That we can't do.  That's exactly the thing the existing sx(9) implementation
does and where it breaks.  The problem is that cv_wait() is an implicit sleep
which breaks when we try to RLOCK() with other mutex already acquired.
Moreover will this break for recursive reads e.g.:

Thread 1: RLOCK() ... RLOCK() -> cv_wait ...
Thread 2: WLOCK() -> cv_wait ...

This is exactly what pfil_hooks must be able to do as the packet filter may
want to call back to the stack in order to send rejects etc.

It would be an idea to use cv_wait() depending on the value in the try
argument and return EBUSY for that as well.

This is a textbook implementation, that sure will work (given the above
change).  The problem is, that we still have to invest 4 mutex operations for
every access.  The current implementation has the same basic problem (though
it only uses 2 mutex operations for the WLOCK/UNLOCK).
Ideally the RLOCK/UNLOCK should be free unless there is a writer waiting for
the lock.  In order to do this I am thinking of a "spl-like" value in the PCPU
section.  A write would IPI other CPUs with a request to "raise" the bar once
all have confirmed it would do the write and IPI again.  This gives a serious
disadvantage to writes, but would allow us to block on failed read requests
instead of erroring out or sleeping.  Also, an uncongested read would be free
when compared to the existing solution.

I have to read some more to see if this actually works.  Comments appreciated!

>         }
>         mtx_unlock(&rwl->m);
>         return 0;
> }
>
> int
> WLOCK(struct rwlock *rwl, int try)
> {
>         if (!try)
>                 mtx_lock(&rwl->m);
>         else if (!mtx_trylock(&rwl->m))
>                 return EBUSY;
>         rwl->writers++;
>         if (rwl->readers > 0)   /* have readers, must wait */
>                 cv_wait(&rwl->qw, &rwl->m);
>         mtx_unlock(&rwl->m);
>         return 0;
> }
>
> void
> RUNLOCK(struct rwlock *rwl)
> {
>         mtx_lock(&rwl->m);
>         rwl->readers--;
>         if (rwl->readers == 0 && rwl->writers > 0)
>                 cv_signal(&rwl->qw);
>         mtx_unlock(&rwl->m);
> }
>
> void
> WUNLOCK(struct rwlock *rwl)
> {
>         mtx_lock(&rwl->m);
>         rwl->writers--;
>         if (rwl->br > 0) {      /* priority to readers */
>                 rwl->readers = rwl->br;
>                 rwl->br = 0;
>                 cv_broadcast(&rwl->qr);
>         } else if (rwl->writers > 0)
>                 cv_signal(&rwl->qw);
>         mtx_unlock(&rwl->m);
> }

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

--nextPart1138726.tHd4SFx7Ar
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQBDA+UyXyyEoT62BG0RAsJbAJ9deDV6DgsWcpWRrnkqAX3v/CDsVACfZ/fw
35bCcPJKPDG91LbEDylDFPw=
=7C2d
-----END PGP SIGNATURE-----

--nextPart1138726.tHd4SFx7Ar--

From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 01:40:56 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To:
freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D3B6D16A420; Thu, 18 Aug 2005 01:40:56 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9426743D46; Thu, 18 Aug 2005 01:40:56 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7I1euH8080957; Wed, 17 Aug 2005 18:40:56 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7I1euEq080956; Wed, 17 Aug 2005 18:40:56 -0700 (PDT) (envelope-from rizzo) Date: Wed, 17 Aug 2005 18:40:56 -0700 From: Luigi Rizzo To: John Baldwin Message-ID: <20050817184056.A72643@xorpc.icir.org> References: <42F9ECF2.8080809@freebsd.org> <200508170836.05861.jhb@FreeBSD.org> <20050817093332.A85431@xorpc.icir.org> <200508171328.29654.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508171328.29654.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Wed, Aug 17, 2005 at 01:28:28PM -0400 Cc: gnn@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 01:40:57 -0000 On Wed, Aug 17, 2005 at 01:28:28PM -0400, John Baldwin wrote: ... > fxp(4)'s locking is somewhat buggy where you are looking probably. I think > I've already committed the fixes to HEAD so that detach() is less > discouraging (we just lock fxp_stop() in detach now). 
The calls to well, my specific concern with the detach routine (but I was wrong, at least on this part) was that dropping the lock could cause the struct to go away while the interrupt handler was working on it. Now i see that this should be safe because bus_teardown_intr() blocks until we are out of the handler (the comment "Unhook interrupt before dropping lock." is probably stale...), and given that the detach() handler runs under giant and we cannot have multiple instances of it, at least this path seems safe. However I am still unclear on what happens if a detach() is racing with the output path (leading to fxp_start()). cheers luigi From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 07:57:40 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E5D5C16A41F; Thu, 18 Aug 2005 07:57:40 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6671C43D49; Thu, 18 Aug 2005 07:57:40 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7I7veAT083923; Thu, 18 Aug 2005 00:57:40 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7I7vdHi083922; Thu, 18 Aug 2005 00:57:39 -0700 (PDT) (envelope-from rizzo) Date: Thu, 18 Aug 2005 00:57:39 -0700 From: Luigi Rizzo To: Max Laier Message-ID: <20050818005739.A83776@xorpc.icir.org> References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> <200508180332.34895.max@love2party.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508180332.34895.max@love2party.net>; from 
max@love2party.net on Thu, Aug 18, 2005 at 03:32:19AM +0200 Cc: arch@freebsd.org, net@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 07:57:41 -0000 On Thu, Aug 18, 2005 at 03:32:19AM +0200, Max Laier wrote: > On Thursday 18 August 2005 02:02, Luigi Rizzo wrote: ... > > could you guys look at the following code and see if it makes sense, > > or tell me where i am wrong ? > > > > It should solve the starvation and blocking trylock problems, > > because the active reader does not hold the mutex in the critical ^^^^^^ i meant 'writer', sorry... as max said even in the current implementation the reader does not hold the lock. > > int > > RLOCK(struct rwlock *rwl, int try) > > { > > if (!try) > > mtx_lock(&rwl->m); > > else if (!mtx_trylock(&rwl->m)) > > return EBUSY; > > if (rwl->writers == 0) /* no writer, pass */ > > rwl->readers++; > > else { > > rwl->br++; > > cv_wait(&rwl->qr, &rwl->m); > ^^^^^^^ > > That we can't do. That's exactly the thing the existing sx(9) implementation > does and where it breaks. The problem is that cv_wait() is an implicit sleep > which breaks when we try to RLOCK() with other mutex already acquired. but that is not a solvable problem given that the *LOCK may be blocking. And the cv_wait is not an unconditioned sleep, it is one where you release the lock right before ans wait for an event to wake you up. In fact i don't understand why you consider spinning and sleeping on a mutex two different things. > Moreover will this break for recursive reads e.g.: > > Thread 1: RLOCK() ... RLOCK() -> cv_wait ... > Thread 2: WLOCK() -> cv_wait ... 
> > This is exactly what pfil_hooks must be able to do as the packet filter may > want to call back to the stack in order to send rejects etc. that's another story (also in issue in ipfw) and the way it is addressed elsewhere is by releasing and reaquiring the lock. In fact is the topic that started this thread. > change). The problem is, that we still have to invest 4 mutex operations for > every access. The current implementation has the same basic problem (though > it only uses 2 mutex operations for the WLOCK/UNLOCK). Ideally the RLOCK/ > UNLOCK should be free unless there is a writer waiting for the lock. In "free unless" means not free - you always have to check, be it through an atomic cmpswap or something else. But this is what mtx_lock does anyways in the fast path. cheers luigi From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 14:18:38 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AD22116A41F for ; Thu, 18 Aug 2005 14:18:38 +0000 (GMT) (envelope-from ups@tree.com) Received: from smtp.speedfactory.net (smtp.speedfactory.net [66.23.216.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 84B9543D48 for ; Thu, 18 Aug 2005 14:18:37 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 9936 invoked by uid 210); 18 Aug 2005 14:19:00 +0000 Received: from 66.23.216.49 by talon (envelope-from , uid 201) with qmail-scanner-1.25st (clamdscan: 0.85.1/1030. spamassassin: 3.0.2. perlscan: 1.25st. Clear:RC:1(66.23.216.49):. Processed in 0.051598 secs); 18 Aug 2005 14:19:00 -0000 X-Qmail-Scanner-Mail-From: ups@tree.com via talon X-Qmail-Scanner: 1.25st (Clear:RC:1(66.23.216.49):. 
Processed in 0.051598 secs Process 9926) Received: from 66-23-216-49.clients.speedfactory.net (HELO palm.tree.com) (66.23.216.49) by smtp.speedfactory.net with AES256-SHA encrypted SMTP; 18 Aug 2005 14:19:00 +0000 Received: from [127.0.0.1] (ups@localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j7IEIXrK007363; Thu, 18 Aug 2005 10:18:35 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Luigi Rizzo In-Reply-To: <20050818005739.A83776@xorpc.icir.org> References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> <200508180332.34895.max@love2party.net> <20050818005739.A83776@xorpc.icir.org> Content-Type: text/plain Message-Id: <1124374713.1360.64660.camel@palm> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Thu, 18 Aug 2005 10:18:33 -0400 Content-Transfer-Encoding: 7bit Cc: Max Laier , net@freebsd.org, arch@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 14:18:38 -0000 On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote: > On Thu, Aug 18, 2005 at 03:32:19AM +0200, Max Laier wrote: > > On Thursday 18 August 2005 02:02, Luigi Rizzo wrote: > ... > > > could you guys look at the following code and see if it makes sense, > > > or tell me where i am wrong ? > > > > > > It should solve the starvation and blocking trylock problems, > > > because the active reader does not hold the mutex in the critical > ^^^^^^ > > i meant 'writer', sorry... as max said even in the current implementation > the reader does not hold the lock. 
>
> > > int
> > > RLOCK(struct rwlock *rwl, int try)
> > > {
> > >         if (!try)
> > >                 mtx_lock(&rwl->m);
> > >         else if (!mtx_trylock(&rwl->m))
> > >                 return EBUSY;
> > >         if (rwl->writers == 0)  /* no writer, pass */
> > >                 rwl->readers++;
> > >         else {
> > >                 rwl->br++;
> > >                 cv_wait(&rwl->qr, &rwl->m);
> >                   ^^^^^^^
> >
> > That we can't do.  That's exactly the thing the existing sx(9) implementation
> > does and where it breaks.  The problem is that cv_wait() is an implicit sleep
> > which breaks when we try to RLOCK() with other mutex already acquired.
>
> but that is not a solvable problem given that the *LOCK may be blocking.
> And the cv_wait is not an unconditioned sleep, it is one where you release
> the lock right before ans wait for an event to wake you up.
> In fact i don't understand why you consider spinning and sleeping
> on a mutex two different things.

The major difference between sleeping (cv_wait, msleep, ...) and blocking
on a mutex is priority inheritance.
If you need to be able to use (non-spin) mutexes while holding a
[R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you
need to implement priority inheritance for [R|W]LOCKs.

For the (single) write lock holder tracking priority is easy (just like
a (non-spin) mutex).
However priority inheritance to multiple readers is more difficult as
one needs to keep track of all holders of the lock.
Keeping track of all readers requires pre-allocated memory resources.
This memory could come from
1) A limited global pool
2) A limited per [R|W]LOCK pool
3) A limited per thread pool
4) As a parameter for acquiring a RLOCK

None of the choices are really pretty.
(1), (2) and (3) can lead to limiting reader parallelism when running out
of resources.
(4) may be practical for some cases since the memory could be allocated
from the stack.
However, since the memory must be valid while holding a read lock, choice
(4) makes some algorithms (that use, for example, lock crabbing) a bit
harder to implement.
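
[Editorial sketch] Choice (4) above can be illustrated in user-space C. This is only a sketch under loud assumptions: it uses POSIX threads instead of the kernel mtx(9) API, every name in it (rl_tracker, tracked_rwlock, trw_*) is hypothetical, and it shows only the bookkeeping of who holds the read lock. It does not block writers and does not lend priority; a real implementation would walk the holder list to do that.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not sx(9) or any FreeBSD API. */

/* One record per held read lock, supplied by the caller (choice 4). */
struct rl_tracker {
	pthread_t owner;            /* thread holding the read lock */
	struct rl_tracker *next;    /* next holder in the lock's list */
};

struct tracked_rwlock {
	pthread_mutex_t m;          /* protects the fields below */
	struct rl_tracker *holders; /* list of current readers */
	int readers;                /* length of that list */
};

void
trw_init(struct tracked_rwlock *l)
{
	pthread_mutex_init(&l->m, NULL);
	l->holders = NULL;
	l->readers = 0;
}

/* Register the caller as a reader; trk must stay valid until trw_runlock(). */
void
trw_rlock(struct tracked_rwlock *l, struct rl_tracker *trk)
{
	pthread_mutex_lock(&l->m);
	trk->owner = pthread_self();
	trk->next = l->holders;
	l->holders = trk;
	l->readers++;
	pthread_mutex_unlock(&l->m);
}

/* Unlink the caller's record; the walk is O(number of readers). */
void
trw_runlock(struct tracked_rwlock *l, struct rl_tracker *trk)
{
	struct rl_tracker **pp;
	int found = 0;

	pthread_mutex_lock(&l->m);
	for (pp = &l->holders; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == trk) {
			*pp = trk->next;
			l->readers--;
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&l->m);
	assert(found);	/* releasing an unregistered record is a caller bug */
}

/* Number of holders a blocked writer would have to lend priority to. */
int
trw_count_holders(struct tracked_rwlock *l)
{
	int n;

	pthread_mutex_lock(&l->m);
	n = l->readers;
	pthread_mutex_unlock(&l->m);
	return (n);
}
```

The tracker would typically live on the reader's stack, which is why it has to remain valid for as long as the read lock is held; that lifetime constraint is the one that complicates hand-over-hand schemes.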
> > Moreover will this break for recursive reads e.g.: > > > > Thread 1: RLOCK() ... RLOCK() -> cv_wait ... > > Thread 2: WLOCK() -> cv_wait ... > > > > This is exactly what pfil_hooks must be able to do as the packet filter may > > want to call back to the stack in order to send rejects etc. > > that's another story (also in issue in ipfw) and the way it is addressed > elsewhere is by releasing and reaquiring the lock. In fact is the topic > that started this thread. > > > change). The problem is, that we still have to invest 4 mutex operations for > > every access. The current implementation has the same basic problem (though > > it only uses 2 mutex operations for the WLOCK/UNLOCK). Ideally the RLOCK/ > > UNLOCK should be free unless there is a writer waiting for the lock. In > > "free unless" means not free - you always have to check, be it through > an atomic cmpswap or something else. But this is what mtx_lock does > anyways in the fast path. > > cheers > luigi > From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 14:31:24 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E253B16A41F; Thu, 18 Aug 2005 14:31:24 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id A349E43D48; Thu, 18 Aug 2005 14:31:24 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7IEVOYw087693; Thu, 18 Aug 2005 07:31:24 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7IEVOQn087692; Thu, 18 Aug 2005 07:31:24 -0700 (PDT) (envelope-from rizzo) Date: Thu, 18 Aug 2005 07:31:24 -0700 From: Luigi Rizzo To: Stephan Uphoff Message-ID: <20050818073124.A87225@xorpc.icir.org> References: 
<20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> <200508180332.34895.max@love2party.net> <20050818005739.A83776@xorpc.icir.org> <1124374713.1360.64660.camel@palm> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <1124374713.1360.64660.camel@palm>; from ups@tree.com on Thu, Aug 18, 2005 at 10:18:33AM -0400 Cc: arch@freebsd.org, Max Laier , net@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 14:31:25 -0000 On Thu, Aug 18, 2005 at 10:18:33AM -0400, Stephan Uphoff wrote: > On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote: ... > > In fact i don't understand why you consider spinning and sleeping > > on a mutex two different things. > > The major difference between sleeping (cv_wait,msleep,..) and blocking > on a mutex is priority inheritance. > If you need to be able to use (non-spin) mutexes while holding a > [R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you > need to implement priority inheritance for [R|W]LOCKs. is that required (in FreeBSD, i mean) for algorithmic correctness or just for performance ? 
cheers luigi From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 16:28:26 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EE53C16A41F; Thu, 18 Aug 2005 16:28:26 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7B6A143D48; Thu, 18 Aug 2005 16:28:26 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Thu, 18 Aug 2005 12:43:26 -0400 From: John Baldwin To: freebsd-arch@freebsd.org Date: Thu, 18 Aug 2005 10:23:04 -0400 User-Agent: KMail/1.8 References: <42F9ECF2.8080809@freebsd.org> <200508171328.29654.jhb@FreeBSD.org> <20050817184056.A72643@xorpc.icir.org> In-Reply-To: <20050817184056.A72643@xorpc.icir.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200508181023.05929.jhb@FreeBSD.org> Cc: gnn@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 16:28:27 -0000 On Wednesday 17 August 2005 09:40 pm, Luigi Rizzo wrote: > On Wed, Aug 17, 2005 at 01:28:28PM -0400, John Baldwin wrote: > ... > > > fxp(4)'s locking is somewhat buggy where you are looking probably. I > > think I've already committed the fixes to HEAD so that detach() is less > > discouraging (we just lock fxp_stop() in detach now). 
The calls to > > well, my specific concern with the detach routine (but I was wrong, > at least on this part) was that dropping the lock could cause the struct to > go away while the interrupt handler was working on it. > Now i see that this should be safe because bus_teardown_intr() > blocks until we are out of the handler (the comment "Unhook interrupt > before dropping lock." is probably stale...), and given that > the detach() handler runs under giant and we cannot have multiple > instances of it, at least this path seems safe. > > However I am still unclear on what happens if a detach() is racing with the > output path (leading to fxp_start()). Note that we first down the interface via fxp_stop() and then we unhook it from the network stack using ether_ifdetach(). Once we've done ether_ifdetach() the network stack can't get to the fxp device anymore. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 16:55:47 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 298B716A41F; Thu, 18 Aug 2005 16:55:47 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id E046F43D45; Thu, 18 Aug 2005 16:55:46 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7IGtkti092082; Thu, 18 Aug 2005 09:55:46 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7IGtksK092081; Thu, 18 Aug 2005 09:55:46 -0700 (PDT) (envelope-from rizzo) Date: Thu, 18 Aug 2005 09:55:46 -0700 From: Luigi Rizzo To: John Baldwin Message-ID: <20050818095546.A91965@xorpc.icir.org> References: 
<42F9ECF2.8080809@freebsd.org> <200508171328.29654.jhb@FreeBSD.org> <20050817184056.A72643@xorpc.icir.org> <200508181023.05929.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508181023.05929.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Thu, Aug 18, 2005 at 10:23:04AM -0400 Cc: gnn@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 16:55:47 -0000 On Thu, Aug 18, 2005 at 10:23:04AM -0400, John Baldwin wrote: ... > > However I am still unclear on what happens if a detach() is racing with the > > output path (leading to fxp_start()). > > Note that we first down the interface via fxp_stop() and then we unhook it > from the network stack using ether_ifdetach(). Once we've done > ether_ifdetach() the network stack can't get to the fxp device anymore. It might have gotten there before, then this sequence might occur: thread 'fxp_detach' thread 'fxp_start' acquire fxp lock wait for lock (goes to sleep maybe ?) fxp_stop release fxp lock destroy everything including the lock resume, mtx_lock -> boom hmmm... it's really tricky to follow. Maybe this does not happen, but i wouldn't know why as fxp_detach() is under giant but the path leading to fxp_start is not... 
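
[Editorial sketch] The interleaving above goes wrong because the detach side destroys state while another thread is still on its way in. One generic way to close such a window is a drain gate: detach refuses new entries and waits for in-flight callers to leave before tearing anything down. The following is a minimal user-space sketch with POSIX threads, not the fxp(4) code or any kernel API; all names (softc_sketch, sc_*) are hypothetical. It also assumes the gate itself lives in storage that outlives the device state, since a thread blocked on the gate's mutex would otherwise hit the same problem.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical drain gate; must outlive the state it guards. */
struct softc_sketch {
	pthread_mutex_t m;       /* protects users and dying */
	pthread_cond_t drained;  /* signalled when the last user leaves */
	int users;               /* threads currently inside the driver */
	int dying;               /* set once detach has started */
};

void
sc_init(struct softc_sketch *sc)
{
	pthread_mutex_init(&sc->m, NULL);
	pthread_cond_init(&sc->drained, NULL);
	sc->users = 0;
	sc->dying = 0;
}

/* Called on entry to the start path; fails once detach has begun. */
int
sc_enter(struct softc_sketch *sc)
{
	int error = 0;

	pthread_mutex_lock(&sc->m);
	if (sc->dying)
		error = ENXIO;
	else
		sc->users++;
	pthread_mutex_unlock(&sc->m);
	return (error);
}

/* Called when the start path is done with the device state. */
void
sc_leave(struct softc_sketch *sc)
{
	pthread_mutex_lock(&sc->m);
	assert(sc->users > 0);
	if (--sc->users == 0 && sc->dying)
		pthread_cond_signal(&sc->drained);
	pthread_mutex_unlock(&sc->m);
}

/* Detach: stop new entries, then wait until in-flight users are gone. */
void
sc_detach(struct softc_sketch *sc)
{
	pthread_mutex_lock(&sc->m);
	sc->dying = 1;
	while (sc->users > 0)
		pthread_cond_wait(&sc->drained, &sc->m);
	pthread_mutex_unlock(&sc->m);
	/* Only now is it safe to destroy the guarded device state. */
}
```

A caller would bracket the output path with sc_enter()/sc_leave() and bail out on a non-zero return from sc_enter().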
cheers luigi From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 17:16:55 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4BEA916A420; Thu, 18 Aug 2005 17:16:55 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id 09E7143D58; Thu, 18 Aug 2005 17:16:53 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Thu, 18 Aug 2005 13:31:54 -0400 From: John Baldwin To: Luigi Rizzo Date: Thu, 18 Aug 2005 13:12:20 -0400 User-Agent: KMail/1.8 References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> In-Reply-To: <20050818095546.A91965@xorpc.icir.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200508181312.21512.jhb@FreeBSD.org> Cc: gnn@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 17:16:55 -0000 On Thursday 18 August 2005 12:55 pm, Luigi Rizzo wrote: > On Thu, Aug 18, 2005 at 10:23:04AM -0400, John Baldwin wrote: > ... > > > > However I am still unclear on what happens if a detach() is racing with > > > the output path (leading to fxp_start()). > > > > Note that we first down the interface via fxp_stop() and then we unhook > > it from the network stack using ether_ifdetach(). 
Once we've done > > ether_ifdetach() the network stack can't get to the fxp device anymore. > > It might have gotten there before, then this sequence might occur: > > thread 'fxp_detach' thread 'fxp_start' > > acquire fxp lock wait for lock (goes to sleep maybe ?) > fxp_stop > release fxp lock > destroy everything > including the lock > resume, mtx_lock -> boom > > hmmm... it's really tricky to follow. Maybe this does not happen, > but i wouldn't know why as fxp_detach() is under giant but the > path leading to fxp_start is not... ether_ifdetach() should be handling this for us I think by blocking until any known top-half threads are out of the driver. It may not be doing that yet, but I think that is where it should happen similar to how we use bus_teardown_intr() and callout_drain() in detach to block until other threads are out of our driver if necessary. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 17:25:25 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1444916A41F for ; Thu, 18 Aug 2005 17:25:25 +0000 (GMT) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 275C543D69 for ; Thu, 18 Aug 2005 17:25:22 +0000 (GMT) (envelope-from andre@freebsd.org) Received: (qmail 88045 invoked from network); 18 Aug 2005 17:05:32 -0000 Received: from unknown (HELO freebsd.org) ([62.48.0.53]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 18 Aug 2005 17:05:32 -0000 Message-ID: <4304C485.AC251A46@freebsd.org> Date: Thu, 18 Aug 2005 19:25:25 +0200 From: Andre Oppermann X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: John Baldwin References: 
<42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-arch@FreeBSD.org, gnn@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 17:25:25 -0000 To reverse-highjack my thread again. Here are some excerpts from a direct email response to Luigi for the record of -arch. Luigi> Apart from pinning one thread to a given CPU, the SMP architecture Luigi> of FreeBSD 5/6 will not help in doing what you want because it will Luigi> still force you to pay the locking overhead compared to 4.x. My idea is/was to move to much less expensive UP locks within the kernel if the CPU's can be split into kernel and userland. Maybe not all of them. Some synchronization would still be required between those two CPU's but at least all in the critical path I care about [are inexpensive]. Luigi> ... My workload for this scenarion is routing and running routing protocols. Routing is not something that is really SMP scaleable unless you have truely distributed architecture. However some routing protocols can get quite CPU intensive at times and with my split SMP I get both worlds. The kernel CPU can continue forwarding packets and the user CPU can recalculate all the routing tables. Only from time to time those two have to meet somewhere. When packets for userland arrive or the routing daemon is sending packets. And of course kernel FIB updates. But compared to the number of transactions happening overall these boundary crossings are just a little tiny fraction of 0.00001%. 
I'd rather have optimized the other 99.99999% of the events happening per second by 50% each. And I'm willing to pay even a hefty price for the boundary crossing as long as it is not too hefty on the kernel side. It can take as long as it wants on the userland side. I'm fine with suspending the userland CPU entirely while it waits for the kernel. Would be nice if it could switch to another task but I don't care. ... -- Andre From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 18:56:12 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 787E116A41F; Thu, 18 Aug 2005 18:56:12 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3BAF343D45; Thu, 18 Aug 2005 18:56:11 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7IIuB0h094300; Thu, 18 Aug 2005 11:56:11 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7IIuBU2094299; Thu, 18 Aug 2005 11:56:11 -0700 (PDT) (envelope-from rizzo) Date: Thu, 18 Aug 2005 11:56:11 -0700 From: Luigi Rizzo To: John Baldwin Message-ID: <20050818115611.A94148@xorpc.icir.org> References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508181312.21512.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Thu, Aug 18, 2005 at 01:12:20PM -0400 Cc: gnn@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 18:56:12 -0000 On Thu, Aug 18, 2005 at 01:12:20PM -0400, John Baldwin wrote: [discussion about potential race between foo_detach() and foo_start()] > > hmmm... it's really tricky to follow. Maybe this does not happen, > > but i wouldn't know why as fxp_detach() is under giant but the > > path leading to fxp_start is not... > > ether_ifdetach() should be handling this for us I think by blocking until any > known top-half threads are out of the driver. It may not be doing that yet, ok but note that fxp_start could well be called by a bottom-half thread receiving from a different interface and forwarding to this one. In any case the detach is probably a very difficult part to fix because there are references to interfaces all over the place, and they are not refcounted so we can never know when it is safe to release memory. At least, so i believe... 
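To illustrate what reference counting would buy here, the following is a minimal userland sketch, not kernel code. It assumes a hypothetical `struct demo_if` standing in for an interface object (it is not the real `struct ifnet`), and all `demo_*` names are made up for this example. The point is only that the memory is released by whoever drops the last reference, so no holder ever has to guess whether it is safe to free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface object; not the real struct ifnet. */
struct demo_if {
	atomic_int refs;	/* number of outstanding references */
	int        unit;
};

static int demo_if_frees;	/* counts final frees, for demonstration only */

/* Allocate an object holding one reference owned by the caller. */
static struct demo_if *
demo_if_alloc(int unit)
{
	struct demo_if *ifp = calloc(1, sizeof(*ifp));

	if (ifp == NULL)
		return (NULL);
	atomic_init(&ifp->refs, 1);
	ifp->unit = unit;
	return (ifp);
}

/* Take an extra reference; the caller must already hold one. */
static void
demo_if_ref(struct demo_if *ifp)
{
	int old = atomic_fetch_add(&ifp->refs, 1);

	assert(old > 0);
}

/* Drop a reference; the last one frees the object.  Returns 1 if freed. */
static int
demo_if_rele(struct demo_if *ifp)
{
	int old = atomic_fetch_sub(&ifp->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(ifp);
		demo_if_frees++;
		return (1);
	}
	return (0);
}
```

In this scheme detach would simply drop its own reference; a forwarding path that took a reference earlier keeps the object alive until it calls `demo_if_rele()` itself.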
cheers luigi From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 19:06:26 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9FB1916A41F; Thu, 18 Aug 2005 19:06:26 +0000 (GMT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 478E443D45; Thu, 18 Aug 2005 19:06:26 +0000 (GMT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.13.0/8.13.0) with ESMTP id j7IJ6Pf3032326; Thu, 18 Aug 2005 12:06:25 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.13.0/8.13.0/Submit) id j7IJ6PZ6032325; Thu, 18 Aug 2005 12:06:25 -0700 Date: Thu, 18 Aug 2005 12:06:25 -0700 From: Brooks Davis To: John Baldwin Message-ID: <20050818190625.GA28174@odin.ac.hmc.edu> References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="dDRMvlgZJXvWKvBx" Content-Disposition: inline In-Reply-To: <200508181312.21512.jhb@FreeBSD.org> User-Agent: Mutt/1.4.1i X-Virus-Scanned: by amavisd-new X-Spam-Status: No, hits=0.0 required=8.0 tests=none autolearn=no version=2.63 X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on odin.ac.hmc.edu Cc: freebsd-arch@freebsd.org, gnn@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 19:06:26 -0000 --dDRMvlgZJXvWKvBx Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Thu, Aug 18, 2005 at 01:12:20PM -0400, John Baldwin wrote:
> On Thursday 18 August 2005 12:55 pm, Luigi Rizzo wrote:
> > On Thu, Aug 18, 2005 at 10:23:04AM -0400, John Baldwin wrote:
> > ...
> > > > However I am still unclear on what happens if a detach() is racing with
> > > > the output path (leading to fxp_start()).
> > >
> > > Note that we first down the interface via fxp_stop() and then we unhook
> > > it from the network stack using ether_ifdetach(). Once we've done
> > > ether_ifdetach() the network stack can't get to the fxp device anymore.
> >
> > It might have gotten there before, then this sequence might occur:
> >
> >     thread 'fxp_detach'         thread 'fxp_start'
> >
> >     acquire fxp lock            wait for lock (goes to sleep maybe ?)
> >     fxp_stop
> >     release fxp lock
> >     destroy everything
> >       including the lock
> >                                 resume, mtx_lock -> boom
> >
> > hmmm... it's really tricky to follow. Maybe this does not happen,
> > but i wouldn't know why as fxp_detach() is under giant but the
> > path leading to fxp_start is not...
>
> ether_ifdetach() should be handling this for us I think by blocking until any
> known top-half threads are out of the driver. It may not be doing that yet,
> but I think that is where it should happen similar to how we use
> bus_teardown_intr() and callout_drain() in detach to block until other
> threads are out of our driver if necessary.

Certainly we need assurance that nothing is going to try and touch
the driver before the driver's detach function calls if_free. After
if_free returns, no aspect of the driver should touch the ifnet. Other
parts of the networking system might still have a reference to it should we
end up with a refcounting scheme where the final free of ifnets comes
some time later. All references to the softc need to be purged prior to
exit from the detach routine.
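The "block until other threads are out of the driver" step discussed above can be pictured with a small userland sketch using POSIX threads. This is an assumption-laden illustration, not how ether_ifdetach() is implemented: `struct demo_gate` and the `demo_gate_*` functions are hypothetical names. The important design choice is that the gate belongs to the stack side and outlives the driver's own lock, so a thread that arrives late sleeps on the gate's mutex rather than on a mutex that detach is about to destroy.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical gate owned by the stack side, not by the driver.  It must
 * outlive the driver's softc and locks, so draining it never touches
 * memory that detach is about to free.
 */
struct demo_gate {
	pthread_mutex_t mtx;
	pthread_cond_t  idle;		/* signalled when in_flight drops to 0 */
	int             in_flight;	/* threads currently inside the driver */
	int             dying;		/* set once detach has started */
};

static void
demo_gate_init(struct demo_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->idle, NULL);
	g->in_flight = 0;
	g->dying = 0;
}

/* Called before entering the driver (e.g. before its start routine). */
static int
demo_gate_enter(struct demo_gate *g)
{
	int error = 0;

	pthread_mutex_lock(&g->mtx);
	if (g->dying)
		error = ENXIO;		/* device is going away: stay out */
	else
		g->in_flight++;
	pthread_mutex_unlock(&g->mtx);
	return (error);
}

/* Called after leaving the driver. */
static void
demo_gate_exit(struct demo_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	assert(g->in_flight > 0);
	if (--g->in_flight == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->mtx);
}

/* Called by detach: refuse new entries, then wait for current ones. */
static void
demo_gate_drain(struct demo_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->dying = 1;
	while (g->in_flight > 0)
		pthread_cond_wait(&g->idle, &g->mtx);
	pthread_mutex_unlock(&g->mtx);
	/* Only now is it safe to destroy the driver's own lock and softc. */
}
```

With such a gate, the 'fxp_start' thread in the sequence would either be counted in `in_flight` (and detach waits for it) or be turned away with ENXIO before it ever reaches the driver lock.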
-- Brooks --=20 Any statement of the form "X is the one, true Y" is FALSE. PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --dDRMvlgZJXvWKvBx Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFDBNwxXY6L6fI4GtQRAs2sAJ9rLUPGQkiobEFTsp2Gvf9XieaJUwCeKkzE Yf9KkU4AyKtpO+if6CygOTs= =our9 -----END PGP SIGNATURE----- --dDRMvlgZJXvWKvBx-- From owner-freebsd-arch@FreeBSD.ORG Fri Aug 19 02:08:48 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7901216A41F; Fri, 19 Aug 2005 02:08:48 +0000 (GMT) (envelope-from gnn@neville-neil.com) Received: from mrout2.yahoo.com (mrout2.yahoo.com [216.145.54.172]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3C4D843D48; Fri, 19 Aug 2005 02:08:48 +0000 (GMT) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (proxy8.corp.yahoo.com [216.145.48.13]) by mrout2.yahoo.com (8.13.4/8.13.4/y.out) with ESMTP id j7J28KMU049277; Thu, 18 Aug 2005 19:08:22 -0700 (PDT) Date: Fri, 19 Aug 2005 11:08:20 +0900 Message-ID: From: gnn@freebsd.org To: Brooks Davis In-Reply-To: <20050818190625.GA28174@odin.ac.hmc.edu> References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> <20050818190625.GA28174@odin.ac.hmc.edu> User-Agent: Wanderlust/2.12.2 (99 Luftballons) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (=?ISO-8859-4?Q?Sanj=F2?=) APEL/10.6 Emacs/21.3.50 (powerpc-apple-darwin8.1.0) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion 
related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2005 02:08:48 -0000

At Thu, 18 Aug 2005 12:06:25 -0700,
brooks wrote:
> Certainly we need assurance that nothing is going to try and touch
> the driver before the driver's detach function calls if_free. After
> if_free returns, no aspect of the driver should touch the ifnet. Other
> parts of the networking system might still have a reference to it should we
> end up with a refcounting scheme where the final free of ifnets comes
> some time later. All references to the softc need to be purged prior to
> exit from the detach routine.
>

Documentation comment. When are we going to write a driver writer's
handbook?

Later,
George

From owner-freebsd-arch@FreeBSD.ORG Fri Aug 19 04:48:40 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5C26416A41F; Fri, 19 Aug 2005 04:48:40 +0000 (GMT) (envelope-from julian@elischer.org) Received: from delight.idiom.com (delight.idiom.com [216.240.32.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 19C3E43D45; Fri, 19 Aug 2005 04:48:39 +0000 (GMT) (envelope-from julian@elischer.org) Received: from idiom.com (idiom.com [216.240.32.1]) by delight.idiom.com (Postfix) with ESMTP id 9661721F696; Thu, 18 Aug 2005 21:48:39 -0700 (PDT) Received: from [192.168.2.2] (home.elischer.org [216.240.48.38]) by idiom.com (8.12.11/8.12.11) with ESMTP id j7J4mbSU090817; Thu, 18 Aug 2005 21:48:37 -0700 (PDT) (envelope-from julian@elischer.org) Message-ID: <430564A1.70304@elischer.org> Date: Thu, 18 Aug 2005 21:48:33 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050424 X-Accept-Language: en, hu MIME-Version: 1.0 To: gnn@freebsd.org References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org>
<20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> <20050818190625.GA28174@odin.ac.hmc.edu> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2005 04:48:40 -0000

gnn@freebsd.org wrote:

> At Thu, 18 Aug 2005 12:06:25 -0700,
> brooks wrote:
>
>>Certainly we need assurance that nothing is going to try and touch
>>the driver before the driver's detach function calls if_free. After
>>if_free returns, no aspect of the driver should touch the ifnet. Other
>>parts of the networking system might still have a reference to it should we
>>end up with a refcounting scheme where the final free of ifnets comes
>>some time later. All references to the softc need to be purged prior to
>>exit from the detach routine.
>>
>
>
> Documentation comment. When are we going to write a driver writer's
> handbook?

I've been reading the Linux Device Drivers book from O'Reilly.
You could almost use the first several chapters with
sed s/linux/FreeBSD/ and a few other minor changes.
(Not a serious comment, of course, but..)
> > Later, > George > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Fri Aug 19 08:33:22 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4784116A41F; Fri, 19 Aug 2005 08:33:22 +0000 (GMT) (envelope-from chris@haakonia.hitnet.rwth-aachen.de) Received: from ms-dienst.rz.rwth-aachen.de (ms-1.rz.RWTH-Aachen.DE [134.130.3.130]) by mx1.FreeBSD.org (Postfix) with ESMTP id A361343D46; Fri, 19 Aug 2005 08:33:21 +0000 (GMT) (envelope-from chris@haakonia.hitnet.rwth-aachen.de) Received: from r220-1 (r220-1.rz.RWTH-Aachen.DE [134.130.3.31]) by ms-dienst.rz.rwth-aachen.de (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTP id <0ILG00JNANRKPY@ms-dienst.rz.rwth-aachen.de>; Fri, 19 Aug 2005 10:33:20 +0200 (MEST) Received: from relay.rwth-aachen.de ([134.130.3.1]) by r220-1 (MailMonitor for SMTP v1.2.2 ) ; Fri, 19 Aug 2005 10:33:19 +0200 (MEST) Received: from bigboss.hitnet.rwth-aachen.de (bigspace.hitnet.RWTH-Aachen.DE [137.226.181.2]) by relay.rwth-aachen.de (8.13.3/8.13.3/1) with ESMTP id j7J8XImN028431; Fri, 19 Aug 2005 10:33:18 +0200 (MEST) Received: from moria.hitnet.rwth-aachen.de ([137.226.181.149] helo=haakonia.hitnet.rwth-aachen.de) by bigboss.hitnet.rwth-aachen.de with esmtp (Exim 3.35 #1 (Debian)) id 1E62JS-0005E6-00; Fri, 19 Aug 2005 10:33:18 +0200 Received: by haakonia.hitnet.rwth-aachen.de (Postfix, from userid 1001) id 9097E28446; Fri, 19 Aug 2005 10:32:48 +0200 (CEST) Date: Fri, 19 Aug 2005 10:32:48 +0200 From: Christian Brueffer In-reply-to: To: gnn@freebsd.org Message-id: <20050819083248.GA1769@unixpages.org> MIME-version: 1.0 Content-type: multipart/signed; boundary=17pEHd4RhPHOinZp; 
protocol="application/pgp-signature"; micalg=pgp-sha1 Content-disposition: inline User-Agent: Mutt/1.5.6i X-Operating-System: FreeBSD 7.0-CURRENT X-PGP-Key: http://people.FreeBSD.org/~brueffer/brueffer.key.asc X-PGP-Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> <20050818190625.GA28174@odin.ac.hmc.edu> Cc: freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2005 08:33:22 -0000 --17pEHd4RhPHOinZp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Aug 19, 2005 at 11:08:20AM +0900, gnn@freebsd.org wrote:
> At Thu, 18 Aug 2005 12:06:25 -0700,
> brooks wrote:
> > Certainly we need assurance that nothing is going to try and touch
> > the driver before the driver's detach function calls if_free. After
> > if_free returns, no aspect of the driver should touch the ifnet. Other
> > parts of the networking system might still have a reference to it should we
> > end up with a refcounting scheme where the final free of ifnets comes
> > some time later. All references to the softc need to be purged prior to
> > exit from the detach routine.
> >
>
> Documentation comment. When are we going to write a driver writer's
> handbook?
>

Well, Jochen Kunz has already written a nice guide for NetBSD...
:-) - Christian --=20 Christian Brueffer chris@unixpages.org brueffer@FreeBSD.org GPG Key: http://people.freebsd.org/~brueffer/brueffer.key.asc GPG Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D --17pEHd4RhPHOinZp Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (FreeBSD) iD4DBQFDBZkwbHYXjKDtmC0RApLXAJY03zhwyscd1hRc7WmTnD/+HjRXAJ45EfU2 wuBkJKXgXlPaYepyoGv2Ww== =mk/H -----END PGP SIGNATURE----- --17pEHd4RhPHOinZp-- From owner-freebsd-arch@FreeBSD.ORG Fri Aug 19 08:41:21 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BCC9916A41F; Fri, 19 Aug 2005 08:41:21 +0000 (GMT) (envelope-from chris@haakonia.hitnet.rwth-aachen.de) Received: from ms-dienst.rz.rwth-aachen.de (ms-1.rz.RWTH-Aachen.DE [134.130.3.130]) by mx1.FreeBSD.org (Postfix) with ESMTP id 25C8743D46; Fri, 19 Aug 2005 08:41:21 +0000 (GMT) (envelope-from chris@haakonia.hitnet.rwth-aachen.de) Received: from r220-1 (r220-1.rz.RWTH-Aachen.DE [134.130.3.31]) by ms-dienst.rz.rwth-aachen.de (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTP id <0ILG00KHXO4VZS@ms-dienst.rz.rwth-aachen.de>; Fri, 19 Aug 2005 10:41:19 +0200 (MEST) Received: from relay.rwth-aachen.de ([134.130.3.1]) by r220-1 (MailMonitor for SMTP v1.2.2 ) ; Fri, 19 Aug 2005 10:41:18 +0200 (MEST) Received: from bigboss.hitnet.rwth-aachen.de (bigspace.hitnet.RWTH-Aachen.DE [137.226.181.2]) by relay.rwth-aachen.de (8.13.3/8.13.3/1) with ESMTP id j7J8fIx0029445; Fri, 19 Aug 2005 10:41:18 +0200 (MEST) Received: from moria.hitnet.rwth-aachen.de ([137.226.181.149] helo=haakonia.hitnet.rwth-aachen.de) by bigboss.hitnet.rwth-aachen.de with esmtp (Exim 3.35 #1 (Debian)) id 1E62RC-0005HT-00; Fri, 19 Aug 2005 10:41:18 +0200 Received: by haakonia.hitnet.rwth-aachen.de (Postfix, from userid 1001) id 
2C15F28446; Fri, 19 Aug 2005 10:40:48 +0200 (CEST) Date: Fri, 19 Aug 2005 10:40:48 +0200 From: Christian Brueffer In-reply-to: <20050819083248.GA1769@unixpages.org> To: gnn@freebsd.org Message-id: <20050819084048.GB1769@unixpages.org> MIME-version: 1.0 Content-type: multipart/signed; boundary=K8nIJk4ghYZn606h; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-disposition: inline User-Agent: Mutt/1.5.6i X-Operating-System: FreeBSD 7.0-CURRENT X-PGP-Key: http://people.FreeBSD.org/~brueffer/brueffer.key.asc X-PGP-Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D References: <42F9ECF2.8080809@freebsd.org> <200508181023.05929.jhb@FreeBSD.org> <20050818095546.A91965@xorpc.icir.org> <200508181312.21512.jhb@FreeBSD.org> <20050818190625.GA28174@odin.ac.hmc.edu> <20050819083248.GA1769@unixpages.org> Cc: freebsd-arch@freebsd.org Subject: Re: Special schedulers, one CPU only kernel, one only userland X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2005 08:41:21 -0000 --K8nIJk4ghYZn606h Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Aug 19, 2005 at 10:32:48AM +0200, Christian Brueffer wrote:
> On Fri, Aug 19, 2005 at 11:08:20AM +0900, gnn@freebsd.org wrote:
> > At Thu, 18 Aug 2005 12:06:25 -0700,
> > brooks wrote:
> > > Certainly we need assurance that nothing is going to try and touch
> > > the driver before the driver's detach function calls if_free. After
> > > if_free returns, no aspect of the driver should touch the ifnet. Other
> > > parts of the networking system might still have a reference to it should we
> > > end up with a refcounting scheme where the final free of ifnets comes
> > > some time later. All references to the softc need to be purged prior to
> > > exit from the detach routine.
> > >
> >
> > Documentation comment. When are we going to write a driver writer's
> > handbook?
> >
>
> Well, Jochen Kunz has already written a nice guide for NetBSD... :-)
>

Should have included the URL:
http://www.netbsd.org/Documentation/kernel/ddwg.html

- Christian

--
Christian Brueffer chris@unixpages.org brueffer@FreeBSD.org
GPG Key: http://people.freebsd.org/~brueffer/brueffer.key.asc
GPG Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D

--K8nIJk4ghYZn606h Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.6 (FreeBSD) iD8DBQFDBZsQbHYXjKDtmC0RAnomAJwKGhMs5tajZvclEE4bRTDWTHL5igCeMSA+ H7KoATq6OIbfm8zS1ocnf0A= =Kmr6 -----END PGP SIGNATURE----- --K8nIJk4ghYZn606h--

From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 16:28:27 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8D70716A41F; Thu, 18 Aug 2005 16:28:27 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0EDE743D48; Thu, 18 Aug 2005 16:28:26 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.40.201] (Not Verified[10.50.40.201]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Thu, 18 Aug 2005 12:43:26 -0400 From: John Baldwin To: freebsd-arch@freebsd.org Date: Thu, 18 Aug 2005 10:26:33 -0400 User-Agent: KMail/1.8 References: <20050816170519.A74422@xorpc.icir.org> <20050818005739.A83776@xorpc.icir.org> <1124374713.1360.64660.camel@palm> In-Reply-To: <1124374713.1360.64660.camel@palm> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-6" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id:
<200508181026.35502.jhb@FreeBSD.org> X-Mailman-Approved-At: Fri, 19 Aug 2005 11:43:45 +0000 Cc: arch@freebsd.org, net@freebsd.org, Max Laier , Stephan Uphoff Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 16:28:27 -0000

On Thursday 18 August 2005 10:18 am, Stephan Uphoff wrote:
> On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote:
> > On Thu, Aug 18, 2005 at 03:32:19AM +0200, Max Laier wrote:
> > > On Thursday 18 August 2005 02:02, Luigi Rizzo wrote:
> >
> > ...
> >
> > > > could you guys look at the following code and see if it makes sense,
> > > > or tell me where i am wrong ?
> > > >
> > > > It should solve the starvation and blocking trylock problems,
> > > > because the active reader does not hold the mutex in the critical
> > > >                    ^^^^^^
> >
> > i meant 'writer', sorry... as max said even in the current implementation
> > the reader does not hold the lock.
> >
> > > > int
> > > > RLOCK(struct rwlock *rwl, int try)
> > > > {
> > > >     if (!try)
> > > >         mtx_lock(&rwl->m);
> > > >     else if (!mtx_trylock(&rwl->m))
> > > >         return EBUSY;
> > > >     if (rwl->writers == 0) /* no writer, pass */
> > > >         rwl->readers++;
> > > >     else {
> > > >         rwl->br++;
> > > >         cv_wait(&rwl->qr, &rwl->m);
> > >
> > >         ^^^^^^^
> > >
> > > That we can't do. That's exactly the thing the existing sx(9)
> > > implementation does and where it breaks. The problem is that cv_wait()
> > > is an implicit sleep which breaks when we try to RLOCK() with other
> > > mutex already acquired.
> >
> > but that is not a solvable problem given that the *LOCK may be blocking.
> > And the cv_wait is not an unconditioned sleep, it is one where you
> > release the lock right before and wait for an event to wake you up.
> > In fact i don't understand why you consider spinning and sleeping
> > on a mutex two different things.
>
> The major difference between sleeping (cv_wait,msleep,..) and blocking
> on a mutex is priority inheritance.
> If you need to be able to use (non-spin) mutexes while holding a
> [R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you
> need to implement priority inheritance for [R|W]LOCKs.
> For the (single) write lock holder tracking priority is easy. (just like
> a (non-spin) mutex). However priority inheritance to multiple readers is
> more difficult as one needs to keep track of all holders of the lock.
> Keeping track of all readers requires pre-allocated memory resources.
> This memory could come from
> 1) A limited global pool
> 2) A limited per [R|W]LOCK pool
> 3) A limited per thread pool
> 4) As a parameter for acquiring a RLOCK
> None of the choices are really pretty.
> (1),(2) and (3) can lead to limiting reader parallelism when running out
> of resources. (4) may be practical for some cases since the memory
> could be allocated from stack. However since the memory must be valid
> while holding a read lock choice (4) makes some algorithms (that use for
> example lock crabbing) a bit harder to implement.

Solaris handles the read case by only tracking the first thread to get a read
lock (referred to as the "owner of record" IIRC) and only propagating priority
to that thread and ignoring other readers. They admit it's not perfect as
well. That's mentioned in the Solaris Internals book.
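For readers who want to see the whole shape of the mutex-plus-condition-variable scheme that the quoted RLOCK() fragment starts, here is a userland sketch with POSIX threads. It is only an illustration under assumptions: the `demo_*` names, the `writing` flag, the writer side, and the rule that a "try" call never sleeps are all additions that do not appear in the thread, and the `br` counter from the fragment is left out. It deliberately sleeps in pthread_cond_wait(), which is exactly the behaviour objected to above, and it does nothing about priority inheritance.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct demo_rwlock {
	pthread_mutex_t m;
	pthread_cond_t  qr;		/* blocked readers sleep here */
	pthread_cond_t  qw;		/* blocked writers sleep here */
	int             readers;	/* active readers */
	int             writers;	/* active plus waiting writers */
	int             writing;	/* 1 while a writer holds the lock */
};

static void
demo_rw_init(struct demo_rwlock *rwl)
{
	pthread_mutex_init(&rwl->m, NULL);
	pthread_cond_init(&rwl->qr, NULL);
	pthread_cond_init(&rwl->qw, NULL);
	rwl->readers = rwl->writers = rwl->writing = 0;
}

static int
demo_rlock(struct demo_rwlock *rwl, int try)
{
	if (!try)
		pthread_mutex_lock(&rwl->m);
	else if (pthread_mutex_trylock(&rwl->m) != 0)
		return (EBUSY);
	if (try && rwl->writers != 0) {	/* a try call must not sleep */
		pthread_mutex_unlock(&rwl->m);
		return (EBUSY);
	}
	while (rwl->writers != 0)	/* re-check after every wakeup */
		pthread_cond_wait(&rwl->qr, &rwl->m);
	rwl->readers++;
	pthread_mutex_unlock(&rwl->m);	/* mutex is not held while reading */
	return (0);
}

static void
demo_runlock(struct demo_rwlock *rwl)
{
	pthread_mutex_lock(&rwl->m);
	assert(rwl->readers > 0);
	if (--rwl->readers == 0)
		pthread_cond_signal(&rwl->qw);
	pthread_mutex_unlock(&rwl->m);
}

static int
demo_wlock(struct demo_rwlock *rwl, int try)
{
	if (!try)
		pthread_mutex_lock(&rwl->m);
	else if (pthread_mutex_trylock(&rwl->m) != 0)
		return (EBUSY);
	if (try && (rwl->readers != 0 || rwl->writers != 0)) {
		pthread_mutex_unlock(&rwl->m);
		return (EBUSY);
	}
	rwl->writers++;			/* announce intent: new readers wait */
	while (rwl->readers != 0 || rwl->writing)
		pthread_cond_wait(&rwl->qw, &rwl->m);
	rwl->writing = 1;
	pthread_mutex_unlock(&rwl->m);	/* mutex is not held while writing */
	return (0);
}

static void
demo_wunlock(struct demo_rwlock *rwl)
{
	pthread_mutex_lock(&rwl->m);
	assert(rwl->writing);
	rwl->writing = 0;
	if (--rwl->writers > 0)
		pthread_cond_signal(&rwl->qw);		/* next writer */
	else
		pthread_cond_broadcast(&rwl->qr);	/* all waiting readers */
	pthread_mutex_unlock(&rwl->m);
}
```

Counting waiting writers in `writers` gives writers preference, so a steady stream of readers cannot starve a writer; the price is that a steady stream of writers can starve readers.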
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org
From owner-freebsd-arch@FreeBSD.ORG Fri Aug 19 05:19:43 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 38C9B16A41F; Fri, 19 Aug 2005 05:19:43 +0000 (GMT) (envelope-from gordon@tetlows.org) Received: from spiff.melthusia.org (spiff.melthusia.org [207.67.244.17]) by mx1.FreeBSD.org (Postfix) with ESMTP id DA3A643D45; Fri, 19 Aug 2005 05:19:42 +0000 (GMT) (envelope-from gordon@tetlows.org) Received: from [192.168.12.10] (cpe-66-75-147-245.san.res.rr.com [66.75.147.245]) (authenticated bits=0) by spiff.melthusia.org (8.12.10/8.12.10) with ESMTP id j7J5JcTJ040983; Thu, 18 Aug 2005 22:19:39 -0700 (PDT) (envelope-from gordon@tetlows.org) Message-ID: <43056CAC.6040105@tetlows.org> Date: Thu, 18 Aug 2005 22:22:52 -0700 From: Gordon Tetlow User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-arch@freebsd.org References:
<200508120005.j7C05ARc090857@repoman.freebsd.org> <20050815053757.GB2660@green.homeunix.org> <20050815070033.GA8368@garage.freebsd.pl> <20050815125814.GC2660@green.homeunix.org> <20050816081644.GA3944@garage.freebsd.pl> <1124182906.2492.4.camel@buffy.york.ac.uk> <20050816095217.GB3944@garage.freebsd.pl> <43028269.50904@FreeBSD.org> <20050817084749.GC11066@garage.freebsd.pl> In-Reply-To: <20050817084749.GC11066@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Fri, 19 Aug 2005 11:43:45 +0000 Cc: cvs-src@freebsd.org, Doug Barton , src-committers@freebsd.org, cvs-all@freebsd.org, Gavin Atkinson Subject: Re: cvs commit: src/sys/geom/label g_label.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2005 05:19:43 -0000

Pawel Jakub Dawidek wrote:

>On Tue, Aug 16, 2005 at 05:18:49PM -0700, Doug Barton wrote:
>+> Pawel Jakub Dawidek wrote:
>+>
>+> >Because '/' creates a directory and I want each label to be represented
>+> >only by one file.
>+>
>+> I think what people are saying is that they like the directory creating behavior. Can you explain your rationale in more detail?
>
>Actually, I don't really care. All I wanted was one label to be represented
>by one single file. That's all. For me, leaving it as it is just asks for
>trouble.
>
>I can live without this change, really. This is something I'd like to ask
>our TRB about, but unfortunately it was retired yesterday:)
>
>CCing to freebsd-arch@.
>
>The question(s) is(are): Should we allow '/' in labels or should we replace
>it with something (e.g. '_')? Maybe we should only deny labels with '/../'?
>
>

When I wrote GEOM_VOL_FFS, I wrote it with the idea that you could make
a hierarchy of providers in /dev/vol. Coming from an environment where it
wasn't unusual for a single machine to have 30 to 40 disks available to
it, it seemed natural that we should allow administrators the ability to
define how they wanted things mapped out. Now that I have just gone back
and looked at the code that I wrote, I didn't allow non-alphanumerics in
the volume name (although I actually didn't check it when creating the
provider). I seem to recall making that decision specifically to get
around the ../ tree traversal.

Anyway, I think it comes down to tools, not policy. I think "/" should
be allowed.

-gordon

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 11:11:23 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E56AE16A41F for ; Sat, 20 Aug 2005 11:11:23 +0000 (GMT) (envelope-from ume@mahoroba.org) Received: from ameno.mahoroba.org (gw4.mahoroba.org [218.45.22.175]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4E81243D45 for ; Sat, 20 Aug 2005 11:11:22 +0000 (GMT) (envelope-from ume@mahoroba.org) Received: from kasuga.mahoroba.org (IDENT:9KyaRBe0nZrnDei3m94DOVna6+Mv+ACv1A/oUj+h3mJYF9JRBcah8RhBN+ZgOB2g@kasuga.mahoroba.org [IPv6:3ffe:501:185b:8010:20b:97ff:fe2e:b521]) (user=ume mech=CRAM-MD5 bits=0) by ameno.mahoroba.org (8.13.3/8.13.3) with ESMTP/inet6 id j7KBBGFB039643 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sat, 20 Aug 2005 20:11:16 +0900 (JST) (envelope-from ume@mahoroba.org) Date: Sat, 20 Aug 2005 20:11:16 +0900 Message-ID: From: Hajimu UMEMOTO To: arch@FreeBSD.org User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (=?ISO-8859-4?Q?Sanj=F2?=) APEL/10.6 Emacs/22.0.50 (i386-unknown-freebsd6.0) MULE/5.0 (SAKAKI) X-Operating-System: FreeBSD 6.0-BETA2 X-PGP-Key: http://www.imasy.or.jp/~ume/publickey.asc X-PGP-Fingerprint: 1F00 0B9E 2164 70FC 6DC5 BF5F 04E9 F086 BF90 71FE Organization: Internet
Mutual Aid Society, YOKOHAMA MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0 (ameno.mahoroba.org [IPv6:3ffe:501:185b:8010::1]); Sat, 20 Aug 2005 20:11:17 +0900 (JST) X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-Spam-Status: No, score=-5.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.0.4 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on ameno.mahoroba.org Cc: Subject: [CFR] reflect resolv.conf update to running application X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2005 11:11:24 -0000 Hi, Our resolver reads resolv.conf once, and never re-read it. Recent OpenBSD changed to re-read resolv.conf when it is updated. I believe it is useful specially for mobile environment. So, I made a patch for our resolver. Please review it. 

Index: lib/libc/net/getaddrinfo.c
diff -u -p lib/libc/net/getaddrinfo.c.orig lib/libc/net/getaddrinfo.c
--- lib/libc/net/getaddrinfo.c.orig	Sat May 21 22:46:37 2005
+++ lib/libc/net/getaddrinfo.c	Sat May 21 23:20:27 2005
@@ -2443,7 +2443,7 @@ res_searchN(name, target)
 	int got_nodata = 0, got_servfail = 0, tried_as_is = 0;
 	char abuf[MAXDNAME];
 
-	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
+	if (_res_init() == -1) {
 		h_errno = NETDB_INTERNAL;
 		return (-1);
 	}
Index: lib/libc/net/gethostbydns.c
diff -u -p lib/libc/net/gethostbydns.c.orig lib/libc/net/gethostbydns.c
--- lib/libc/net/gethostbydns.c.orig	Sat May 21 22:46:37 2005
+++ lib/libc/net/gethostbydns.c	Sat May 21 23:20:27 2005
@@ -538,10 +538,6 @@ _dns_gethostbyaddr(void *rval, void *cb_
 	he = va_arg(ap, struct hostent *);
 	hed = va_arg(ap, struct hostent_data *);
 
-	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
-		h_errno = NETDB_INTERNAL;
-		return NS_UNAVAIL;
-	}
 	switch (af) {
 	case AF_INET:
 		(void) sprintf(qbuf, "%u.%u.%u.%u.in-addr.arpa",
Index: lib/libc/net/gethostnamadr.c
diff -u -p lib/libc/net/gethostnamadr.c.orig lib/libc/net/gethostnamadr.c
--- lib/libc/net/gethostnamadr.c.orig	Sat May 21 22:46:37 2005
+++ lib/libc/net/gethostnamadr.c	Sat May 21 23:20:27 2005
@@ -44,6 +44,7 @@ __FBSDID("$FreeBSD: src/lib/libc/net/get
 #include /* XXX hack for _res */
 #include "un-namespace.h"
 #include "netdb_private.h"
+#include "res_config.h"
 
 extern int _ht_gethostbyname(void *, void *, va_list);
 extern int _dns_gethostbyname(void *, void *, va_list);
@@ -264,7 +265,7 @@ gethostbyaddr_r(const char *addr, int le
 		{ 0 }
 	};
 
-	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
+	if (_res_init() == -1) {
 		h_errno = NETDB_INTERNAL;
 		return -1;
 	}
Index: lib/libc/net/getnetbydns.c
diff -u -p lib/libc/net/getnetbydns.c.orig lib/libc/net/getnetbydns.c
--- lib/libc/net/getnetbydns.c.orig	Sat May 21 22:46:37 2005
+++ lib/libc/net/getnetbydns.c	Sat May 21 23:20:27 2005
@@ -304,6 +304,10 @@ _dns_getnetbyaddr(void *rval, void *cb_d
 		    netbr[1], netbr[0]);
 		break;
 	}
+	if (_res_init() == -1) {
+		h_errno = NETDB_INTERNAL;
+		return NS_UNAVAIL;
+	}
 	if ((buf = malloc(sizeof(*buf))) == NULL) {
 		h_errno = NETDB_INTERNAL;
 		return NS_NOTFOUND;
Index: lib/libc/net/name6.c
diff -u -p lib/libc/net/name6.c.orig lib/libc/net/name6.c
--- lib/libc/net/name6.c.orig	Sat May 21 22:46:38 2005
+++ lib/libc/net/name6.c	Sat May 21 23:20:27 2005
@@ -121,6 +121,7 @@ __FBSDID("$FreeBSD: src/lib/libc/net/nam
 #include
 #include "un-namespace.h"
 #include "netdb_private.h"
+#include "res_config.h"
 
 #ifndef _PATH_HOSTS
 #define _PATH_HOSTS "/etc/hosts"
@@ -1810,11 +1811,9 @@ _dns_ghbyaddr(void *rval, void *cb_data,
 		return NS_NOTFOUND;
 	}
 
-	if ((_res.options & RES_INIT) == 0) {
-		if (res_init() < 0) {
-			*errp = h_errno;
-			return NS_UNAVAIL;
-		}
+	if (_res_init() < 0) {
+		*errp = h_errno;
+		return NS_UNAVAIL;
 	}
 	memset(&hbuf, 0, sizeof(hbuf));
 	hbuf.h_name = NULL;
Index: lib/libc/net/res_config.h
diff -u lib/libc/net/res_config.h.orig lib/libc/net/res_config.h
--- lib/libc/net/res_config.h.orig	Sat Mar 23 08:41:54 2002
+++ lib/libc/net/res_config.h	Sat May 21 23:20:27 2005
@@ -8,3 +8,5 @@
 #define MULTI_PTRS_ARE_ALIASES 1 /* fold multiple PTR records into aliases */
 #define CHECK_SRVR_ADDR 1 /* confirm that the server requested sent the reply */
 #define BIND_UPDATE 1 /* update support */
+
+int _res_init(void);
Index: lib/libc/net/res_init.c
diff -u -p lib/libc/net/res_init.c.orig lib/libc/net/res_init.c
--- lib/libc/net/res_init.c.orig	Thu Feb 26 06:03:45 2004
+++ lib/libc/net/res_init.c	Sat May 21 23:20:27 2005
@@ -78,11 +78,13 @@ __FBSDID("$FreeBSD: src/lib/libc/net/res
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -100,6 +102,7 @@ __FBSDID("$FreeBSD: src/lib/libc/net/res
 #undef h_errno
 extern int h_errno;
 
+static void res_readconf(void);
 static void res_setoptions(char *, char *);
 
 #ifdef RESOLVSORT
@@ -112,6 +115,18 @@ static u_int32_t net_mask(struct in_addr
 # define isascii(c) (!(c & 0200))
 #endif
 
+#define timespecclear(tvp)	((tvp)->tv_sec = (tvp)->tv_nsec = 0)
+#define timespeccmp(tvp, uvp, cmp)				\
+	(((tvp)->tv_sec == (uvp)->tv_sec) ?			\
+	    ((tvp)->tv_nsec cmp (uvp)->tv_nsec) :		\
+	    ((tvp)->tv_sec cmp (uvp)->tv_sec))
+
+struct __res_conf_private {
+	struct timespec mtimespec;
+};
+
+static struct __res_conf_private *___res_conf_private(void);
+
 /*
  * Check structure for failed per-thread allocations.
  */
@@ -119,6 +134,7 @@ static struct res_per_thread {
 	struct __res_state res_state;
 	struct __res_state_ext res_state_ext;
 	struct __res_send_private res_send_private;
+	struct __res_conf_private res_conf_private;
 	int h_errno;
 } _res_per_thread_bogus = { .res_send_private = { .s = -1 } }; /* socket */
 
@@ -146,21 +162,8 @@ static struct res_per_thread {
 int
 res_init()
 {
-	FILE *fp;
 	struct __res_send_private *rsp;
-	char *cp, **pp;
-	int n;
-	char buf[MAXDNAME];
-	int nserv = 0; /* number of nameserver records read from file */
-	int haveenv = 0;
-	int havesearch = 0;
-#ifdef RESOLVSORT
-	int nsort = 0;
-	char *net;
-#endif
-#ifndef RFC1535
-	int dots;
-#endif
+	char *cp;
 
 	/*
 	 * If allocation of memory for this thread's resolver has failed,
@@ -208,6 +211,37 @@ res_init()
 	if (!_res.id)
 		_res.id = res_randomid();
 
+	_res.pfcode = 0;
+	res_readconf();
+
+	if (issetugid())
+		_res.options |= RES_NOALIASES;
+	else if ((cp = getenv("RES_OPTIONS")) != NULL)
+		res_setoptions(cp, "env");
+	_res.options |= RES_INIT;
+	return (0);
+}
+
+static void
+res_readconf()
+{
+	struct __res_conf_private *rcp;
+	FILE *fp;
+	char *cp, **pp;
+	int n;
+	char buf[MAXDNAME];
+	int nserv = 0; /* number of nameserver records read from file */
+	int haveenv = 0;
+	int havesearch = 0;
+#ifdef RESOLVSORT
+	int nsort = 0;
+	char *net;
+#endif
+#ifndef RFC1535
+	int dots;
+#endif
+	struct stat sb;
+
 #ifdef USELOOPBACK
 	_res.nsaddr.sin_addr = inet_makeaddr(IN_LOOPBACKNET, 1);
 #else
@@ -220,7 +254,6 @@ res_init()
 	memcpy(&_res_ext.nsaddr, &_res.nsaddr, _res.nsaddr.sin_len);
 	_res.nscount = 1;
 	_res.ndots = 1;
-	_res.pfcode = 0;
 
 	/* Allow user to override the local domain definition */
 	if (issetugid() == 0 && (cp = getenv("LOCALDOMAIN")) != NULL) {
@@ -262,7 +295,10 @@ res_init()
 	    (line[sizeof(name) - 1] == ' ' || \
 	     line[sizeof(name) - 1] == '\t'))
 
+	rcp = ___res_conf_private();
 	if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
+	    if (fstat(fileno(fp), &sb) == 0)
+		rcp->mtimespec = sb.st_mtimespec;
 	    /* read the config file */
 	    while (fgets(buf, sizeof(buf), fp) != NULL) {
 		/* skip comments */
@@ -396,11 +432,11 @@ res_init()
 			if (inet_aton(net, &a)) {
 			    _res.sort_list[nsort].mask = a.s_addr;
 			} else {
-			    _res.sort_list[nsort].mask =
+			    _res.sort_list[nsort].mask =
 				net_mask(_res.sort_list[nsort].addr);
 			}
 		    } else {
-			_res.sort_list[nsort].mask =
+			_res.sort_list[nsort].mask =
 			    net_mask(_res.sort_list[nsort].addr);
 		    }
 		    _res_ext.sort_list[nsort].af = AF_INET;
@@ -465,13 +501,14 @@ res_init()
 		continue;
 	    }
 	}
-	if (nserv > 1)
+	if (nserv > 1)
 	    _res.nscount = nserv;
 #ifdef RESOLVSORT
 	_res.nsort = nsort;
 #endif
 	(void) fclose(fp);
-	}
+	} else
+		timespecclear(&rcp->mtimespec);
 	if (_res.defdname[0] == 0 &&
 	    gethostname(buf, sizeof(_res.defdname) - 1) == 0 &&
 	    (cp = strchr(buf, '.')) != NULL)
@@ -507,12 +544,27 @@ res_init()
 #endif
 #endif /* !RFC1535 */
 	}
+}
 
-	if (issetugid())
-		_res.options |= RES_NOALIASES;
-	else if ((cp = getenv("RES_OPTIONS")) != NULL)
-		res_setoptions(cp, "env");
-	_res.options |= RES_INIT;
+int
+_res_init(void)
+{
+	struct __res_conf_private *rcp;
+	struct stat sb;
+
+	if ((_res.options & RES_INIT) == 0)
+		return (res_init());
+
+	if (stat(_PATH_RESCONF, &sb) == -1) {
+		/*
+		 * Lost the file, in chroot?
+		 * Don't trash settings
+		 */
+		return (0);
+	}
+	rcp = ___res_conf_private();
+	if (timespeccmp(&sb.st_mtimespec, &rcp->mtimespec, !=))
+		res_readconf();
 	return (0);
 }
 
@@ -629,6 +681,7 @@ struct __res_state _res;
 #endif
 struct __res_state_ext _res_ext;
 static struct __res_send_private _res_send_private = { .s = -1 }; /* socket */
+static struct __res_conf_private _res_conf_private;
 
 static thread_key_t res_key;
 static once_t res_init_once = ONCE_INITIALIZER;
@@ -697,6 +750,14 @@ ___res_send_private(void)
 	if (thr_main() != 0)
 		return (&_res_send_private);
 	return (&allocate_res()->res_send_private);
+}
+
+struct __res_conf_private *
+___res_conf_private(void)
+{
+	if (thr_main() != 0)
+		return (&_res_conf_private);
+	return (&allocate_res()->res_conf_private);
 }
 
 int *
Index: lib/libc/net/res_query.c
diff -u -p lib/libc/net/res_query.c.orig lib/libc/net/res_query.c
--- lib/libc/net/res_query.c.orig	Sat May 21 22:46:39 2005
+++ lib/libc/net/res_query.c	Sat May 21 23:20:27 2005
@@ -200,7 +200,7 @@ res_search(name, class, type, answer, an
 	int trailing_dot, ret, saved_herrno;
 	int got_nodata = 0, got_servfail = 0, tried_as_is = 0;
 
-	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
+	if (_res_init() == -1) {
 		h_errno = NETDB_INTERNAL;
 		return (-1);
 	}
Index: lib/libc/net/res_update.c
diff -u -p lib/libc/net/res_update.c.orig lib/libc/net/res_update.c
--- lib/libc/net/res_update.c.orig	Mon Sep 16 01:51:09 2002
+++ lib/libc/net/res_update.c	Sat May 21 23:20:27 2005
@@ -36,6 +36,8 @@ __FBSDID("$FreeBSD: src/lib/libc/net/res
 #include
 #include
 
+#include "res_config.h"
+
 /*
  * Separate a linked list of records into groups so that all records
  * in a group will belong to a single zone on the nameserver.
@@ -84,7 +86,7 @@ res_update(ns_updrec *rrecp_in) {
 	u_int16_t dlen, class, qclass, type, qtype;
 	u_int32_t ttl;
 
-	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
+	if (_res_init() == -1) {
 		h_errno = NETDB_INTERNAL;
 		return (-1);
 	}

Sincerely,

--
Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
ume@mahoroba.org ume@{,jp.}FreeBSD.org
http://www.imasy.org/~ume/

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 17:06:35 2005
Return-Path:
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8FC2516A41F; Sat, 20 Aug 2005 17:06:35 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl)
Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 398C943D5F; Sat, 20 Aug 2005 17:06:32 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id A6B8052C99; Sat, 20 Aug 2005 19:06:30 +0200 (CEST)
Received: from localhost (dkd118.neoplus.adsl.tpnet.pl [83.24.7.118]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 13A2F52C6F; Sat, 20 Aug 2005 19:06:24 +0200 (CEST)
Date: Sat, 20 Aug 2005 19:06:08 +0200
From: Pawel Jakub Dawidek
To: Gordon Tetlow
Message-ID: <20050820170608.GB749@garage.freebsd.pl>
References: <200508120005.j7C05ARc090857@repoman.freebsd.org> <20050815053757.GB2660@green.homeunix.org> <20050815070033.GA8368@garage.freebsd.pl> <20050815125814.GC2660@green.homeunix.org> <20050816081644.GA3944@garage.freebsd.pl> <1124182906.2492.4.camel@buffy.york.ac.uk> <20050816095217.GB3944@garage.freebsd.pl> <43028269.50904@FreeBSD.org> <20050817084749.GC11066@garage.freebsd.pl> <43056CAC.6040105@tetlows.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature";
boundary="+pHx0qQiF2pBVqBT"
Content-Disposition: inline
In-Reply-To: <43056CAC.6040105@tetlows.org>
X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc
X-OS: FreeBSD 7.0-CURRENT i386
User-Agent: mutt-ng devel (FreeBSD)
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl
X-Spam-Level:
X-Spam-Status: No, score=-0.5 required=3.0 tests=BAYES_00,RCVD_IN_NJABL_DUL, RCVD_IN_SORBS_DUL autolearn=no version=3.0.4
Cc: Doug Barton , Gavin Atkinson , cvs-src@freebsd.org, cvs-all@freebsd.org, src-committers@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: cvs commit: src/sys/geom/label g_label.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Aug 2005 17:06:36 -0000

--+pHx0qQiF2pBVqBT
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Aug 18, 2005 at 10:22:52PM -0700, Gordon Tetlow wrote:
+> >On Tue, Aug 16, 2005 at 05:18:49PM -0700, Doug Barton wrote:
+> >+> Pawel Jakub Dawidek wrote:
+> >+> +> >Because '/' creates a directory and I want each label to be represented
+> >+> >only by one file.
+> >+> +> I think what people are saying is that they like the directory creating behavior. Can you explain your rationale in more detail?
+> >Actually, I don't really care. All I wanted was one label to be represented
+> >by one single file. That's all. For me, leaving it as it is just asks for
+> >troubles.
+> >I can live without this change, really. This is something I'd like to ask
+> >about our TRB, but unfortunately it was retired yesterday:)
+> >CCing to freebsd-arch@.
+> >The question(s) is(are): Should we allow '/' in labels or should we replace
+> >it with something (eg. '_')? Maybe we should only deny labels with '/../'?
+> >
+> When I wrote GEOM_VOL_FFS, I wrote it with the idea that you could make a hierarchy of providers in /dev/vol. Coming from an environment where it wasn't unusual for a
+> single machine to have 30 to 40 disks available to it, it seemed natural that we should allow administrators the ability to define how they wanted things mapped out.
+>
+> Now that I have just gone back and looked at the code that I wrote, I didn't allow non-alphanumerics in the volume name (although I actually didn't check it when creating
+> the provider). I seem to recall making that decision specifically to get around the ../ tree traversal.
+>
+> Anyway, I think it comes down to tools, not policy. I think "/" should be allowed.

Ok, guys, I backed-out the change.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!

--+pHx0qQiF2pBVqBT
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (FreeBSD)

iD8DBQFDB2MAForvXbEpPzQRAn3rAJ9jsi94iwYeWOdPjqezeM834jOPMwCcD6PY
I91cBgIFDxwIi3opZnAxtpI=
=OYbX
-----END PGP SIGNATURE-----

--+pHx0qQiF2pBVqBT--

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 17:30:23 2005
Return-Path:
X-Original-To: arch@freebsd.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4CE7316A420; Sat, 20 Aug 2005 17:30:23 +0000 (GMT) (envelope-from julian@elischer.org)
Received: from delight.idiom.com (delight.idiom.com [216.240.32.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1375E43D46; Sat, 20 Aug 2005 17:30:22 +0000 (GMT) (envelope-from julian@elischer.org)
Received: from idiom.com (idiom.com [216.240.32.1]) by delight.idiom.com (Postfix) with ESMTP id CF059208EBA; Sat, 20 Aug 2005 10:30:22 -0700 (PDT)
Received: from [192.168.2.2] (home.elischer.org [216.240.48.38]) by idiom.com (8.12.11/8.12.11) with ESMTP id
j7KHULQf007925; Sat, 20 Aug 2005 10:30:22 -0700 (PDT) (envelope-from julian@elischer.org)
Message-ID: <430768AC.6020502@elischer.org>
Date: Sat, 20 Aug 2005 10:30:20 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050424
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Hajimu UMEMOTO
References:
In-Reply-To:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: arch@freebsd.org
Subject: Re: [CFR] reflect resolv.conf update to running application
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Aug 2005 17:30:23 -0000

Hajimu UMEMOTO wrote:

> Hi,
>
> Our resolver reads resolv.conf once, and never re-read it. Recent
> OpenBSD changed to re-read resolv.conf when it is updated. I believe
> it is useful specially for mobile environment. So, I made a patch for
> our resolver. Please review it.
>

From a very quick read I couldn't see when it does the update.
Does it do it on EVERY lookup, or only after a failed lookup?

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 18:52:00 2005 Return-Path: X-Original-To: freebsd-arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 05A0216A41F for ; Sat, 20 Aug 2005 18:52:00 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mail25.sea5.speakeasy.net (mail25.sea5.speakeasy.net [69.17.117.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6732543D46 for ; Sat, 20 Aug 2005 18:51:59 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 28955 invoked from network); 20 Aug 2005 18:51:59 -0000 Received: from server.baldwin.cx ([216.27.160.63]) (envelope-sender ) by mail25.sea5.speakeasy.net (qmail-ldap-1.03) with AES256-SHA encrypted SMTP for ; 20 Aug 2005 18:51:58 -0000 Received: from zion.baldwin.cx (zion.baldwin.cx [192.168.0.7]) (authenticated bits=0) by server.baldwin.cx (8.13.1/8.13.1) with ESMTP id j7KIppBO061000; Sat, 20 Aug 2005 14:51:51 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Sat, 20 Aug 2005 14:21:19 -0400 User-Agent: KMail/1.8 References: <430768AC.6020502@elischer.org> In-Reply-To: <430768AC.6020502@elischer.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200508201421.20498.jhb@FreeBSD.org> X-Spam-Status: No, score=-2.8 required=4.2 tests=ALL_TRUSTED autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on server.baldwin.cx Cc: Julian Elischer , Hajimu UMEMOTO Subject: Re: [CFR] reflect resolv.conf update to running application X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2005 18:52:00 -0000 On Saturday 20 August 2005 01:30 pm, 
Julian Elischer wrote:
> Hajimu UMEMOTO wrote:
> > Hi,
> >
> > Our resolver reads resolv.conf once, and never re-read it. Recent
> > OpenBSD changed to re-read resolv.conf when it is updated. I believe
> > it is useful specially for mobile environment. So, I made a patch for
> > our resolver. Please review it.
>
> From a very quick read I couldn't see when it does the update.
> does it do it on EVERY lookup or only after a failed lookup?

Looks like it checks to see if the file mod date has changed on every lookup
since he changed the code to always call _res_init() instead of only when the
INIT flag was clear.

--
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 19:30:32 2005
Return-Path:
X-Original-To: freebsd-arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8E14516A41F; Sat, 20 Aug 2005 19:30:32 +0000 (GMT) (envelope-from ume@mahoroba.org)
Received: from ameno.mahoroba.org (gw4.mahoroba.org [218.45.22.175]) by mx1.FreeBSD.org (Postfix) with ESMTP id EBF3543D45; Sat, 20 Aug 2005 19:30:31 +0000 (GMT) (envelope-from ume@mahoroba.org)
Received: from kasuga.mahoroba.org (IDENT:VSptpYzK7BY72k9Srd3ZfIne5Em/9ItXr3P0/nft0cNjaK1AEm36/pWidwV/Vdln@kasuga.mahoroba.org [IPv6:3ffe:501:185b:8010:20b:97ff:fe2e:b521]) (user=ume mech=CRAM-MD5 bits=0) by ameno.mahoroba.org (8.13.3/8.13.3) with ESMTP/inet6 id j7KJUMGM020985 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 21 Aug 2005 04:30:22 +0900 (JST) (envelope-from ume@mahoroba.org)
Date: Sun, 21 Aug 2005 04:30:22 +0900
Message-ID:
From: Hajimu UMEMOTO
To: John Baldwin
In-Reply-To: <200508201421.20498.jhb@FreeBSD.org>
References: <430768AC.6020502@elischer.org> <200508201421.20498.jhb@FreeBSD.org>
User-Agent: xcite1.38> Wanderlust/2.14.0 (Africa) SEMI/1.14.6
(Maruoka) FLIM/1.14.7 (=?ISO-8859-4?Q?Sanj=F2?=) APEL/10.6 Emacs/22.0.50 (i386-unknown-freebsd6.0) MULE/5.0 (SAKAKI) X-Operating-System: FreeBSD 6.0-BETA2 X-PGP-Key: http://www.imasy.or.jp/~ume/publickey.asc X-PGP-Fingerprint: 1F00 0B9E 2164 70FC 6DC5 BF5F 04E9 F086 BF90 71FE Organization: Internet Mutual Aid Society, YOKOHAMA MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0 (ameno.mahoroba.org [IPv6:3ffe:501:185b:8010::1]); Sun, 21 Aug 2005 04:30:23 +0900 (JST) X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-Spam-Status: No, score=-5.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.0.4 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on ameno.mahoroba.org Cc: Julian Elischer , freebsd-arch@FreeBSD.org Subject: Re: [CFR] reflect resolv.conf update to running application X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2005 19:30:32 -0000 Hi, >>>>> On Sat, 20 Aug 2005 14:21:19 -0400 >>>>> John Baldwin said: jhb> Looks like it checks to see if the file mod date has changed on every lookup jhb> since he changed the code to always call _res_init() instead of only when the jhb> INIT flag was clear. Yes, it is right. 
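
The behaviour confirmed above (look at the file's modification time on every lookup and re-read only when it differs) can be sketched in plain C roughly as follows. This is a simplified illustration, not the patch itself: conf_cache, conf_load and conf_refresh are hypothetical names, the actual parsing of the file is left out, and the sketch assumes the POSIX st_mtim field where the patch uses FreeBSD's st_mtimespec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cache of the modification time seen at the last (re)load. */
struct conf_cache {
	struct timespec mtime;	/* zeroed when the file could not be stat'ed */
	bool loaded;		/* plays the role of the RES_INIT flag */
};

/* Remember the file's current mtime; clear it if the file is missing. */
void
conf_load(struct conf_cache *cc, const char *path)
{
	struct stat sb;

	assert(cc != NULL && path != NULL);
	if (stat(path, &sb) == 0)
		cc->mtime = sb.st_mtim;
	else
		memset(&cc->mtime, 0, sizeof(cc->mtime));
	/* ... parsing of the configuration file would go here ... */
	cc->loaded = true;
}

/*
 * Meant to be called before every lookup.
 * Returns true if the configuration was (re)loaded by this call.
 */
bool
conf_refresh(struct conf_cache *cc, const char *path)
{
	struct stat sb;

	assert(cc != NULL && path != NULL);
	if (!cc->loaded) {
		/* First use: always load. */
		conf_load(cc, path);
		return (true);
	}
	if (stat(path, &sb) == -1) {
		/* File vanished (chroot?): keep the current settings. */
		return (false);
	}
	if (sb.st_mtim.tv_sec != cc->mtime.tv_sec ||
	    sb.st_mtim.tv_nsec != cc->mtime.tv_nsec) {
		/* Modification time changed: re-read. */
		conf_load(cc, path);
		return (true);
	}
	return (false);
}
```

With this shape, the cost in the common case (file unchanged) is one stat(2) call per lookup, which is the trade-off raised in the question earlier in the thread.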
Sincerely, -- Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan ume@mahoroba.org ume@{,jp.}FreeBSD.org http://www.imasy.org/~ume/ From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 22:39:54 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4F88616A41F for ; Sat, 20 Aug 2005 22:39:54 +0000 (GMT) (envelope-from ups@tree.com) Received: from smtp.speedfactory.net (smtp.speedfactory.net [66.23.216.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 69F3143D45 for ; Sat, 20 Aug 2005 22:39:53 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 4963 invoked by uid 210); 20 Aug 2005 22:40:26 +0000 Received: from 66.23.216.49 by talon (envelope-from , uid 201) with qmail-scanner-1.25st (clamdscan: 0.85.1/1034. spamassassin: 3.0.2. perlscan: 1.25st. Clear:RC:1(66.23.216.49):. Processed in 0.03338 secs); 20 Aug 2005 22:40:26 -0000 X-Qmail-Scanner-Mail-From: ups@tree.com via talon X-Qmail-Scanner: 1.25st (Clear:RC:1(66.23.216.49):. 
Processed in 0.03338 secs Process 4958)
Received: from 66-23-216-49.clients.speedfactory.net (HELO palm.tree.com) (66.23.216.49) by smtp.speedfactory.net with AES256-SHA encrypted SMTP; 20 Aug 2005 22:40:26 +0000
Received: from [127.0.0.1] (ups@localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j7KMdnrK021451; Sat, 20 Aug 2005 18:39:51 -0400 (EDT) (envelope-from ups@tree.com)
From: Stephan Uphoff
To: Luigi Rizzo
In-Reply-To: <20050818073124.A87225@xorpc.icir.org>
References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> <200508180332.34895.max@love2party.net> <20050818005739.A83776@xorpc.icir.org> <1124374713.1360.64660.camel@palm> <20050818073124.A87225@xorpc.icir.org>
Content-Type: text/plain
Message-Id: <1124577589.1360.73337.camel@palm>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6
Date: Sat, 20 Aug 2005 18:39:49 -0400
Content-Transfer-Encoding: 7bit
Cc: arch@freebsd.org, Max Laier , net@freebsd.org
Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Aug 2005 22:39:54 -0000

On Thu, 2005-08-18 at 10:31, Luigi Rizzo wrote:
> On Thu, Aug 18, 2005 at 10:18:33AM -0400, Stephan Uphoff wrote:
> > On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote:
> ...
> > > In fact i don't understand why you consider spinning and sleeping
> > > on a mutex two different things.
> >
> > The major difference between sleeping (cv_wait,msleep,..) and blocking
> > on a mutex is priority inheritance.
> > If you need to be able to use (non-spin) mutexes while holding a
> > [R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you
> > need to implement priority inheritance for [R|W]LOCKs.
>
> is that required (in FreeBSD, i mean) for algorithmic
> correctness or just for performance ?

Hi Luigi,

It is theoretically required, since otherwise low priority user threads (programs) could block system (interrupt) threads indefinitely.

Example:

Extreme low priority (nice?) thread A holds read/write lock RW as reader.
Thread B, holding mutex M, tries to acquire read/write lock RW as writer and sleeps.
Thread C with better priority than A runs and enters a busy loop (in user space).
Interrupt thread preempts C and tries to acquire mutex M.
Interrupt priority is propagated to B BUT NOT TO A.
Interrupt thread blocks on mutex M.
Thread C resumes and will block the interrupt thread forever if it can keep a better priority than thread A.

In practice you would probably just see bad latency every now and then and may never encounter a hang.

Stephan

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 23:02:29 2005
Return-Path:
X-Original-To: arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AC7B316A41F for ; Sat, 20 Aug 2005 23:02:29 +0000 (GMT) (envelope-from ups@tree.com)
Received: from smtp.speedfactory.net (smtp.speedfactory.net [66.23.216.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 58F9543D5A for ; Sat, 20 Aug 2005 23:02:27 +0000 (GMT) (envelope-from ups@tree.com)
Received: (qmail 13794 invoked by uid 210); 20 Aug 2005 23:03:00 +0000
Received: from 66.23.216.49 by talon (envelope-from , uid 201) with qmail-scanner-1.25st (clamdscan: 0.85.1/1034. spamassassin: 3.0.2. perlscan: 1.25st. Clear:RC:1(66.23.216.49):. Processed in 0.039084 secs); 20 Aug 2005 23:03:00 -0000
X-Qmail-Scanner-Mail-From: ups@tree.com via talon
X-Qmail-Scanner: 1.25st (Clear:RC:1(66.23.216.49):.
Processed in 0.039084 secs Process 13788)
Received: from 66-23-216-49.clients.speedfactory.net (HELO palm.tree.com) (66.23.216.49) by smtp.speedfactory.net with AES256-SHA encrypted SMTP; 20 Aug 2005 23:03:00 +0000
Received: from [127.0.0.1] (ups@localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j7KN2NrK021532; Sat, 20 Aug 2005 19:02:24 -0400 (EDT) (envelope-from ups@tree.com)
From: Stephan Uphoff
To: John Baldwin
In-Reply-To: <200508181026.35502.jhb@FreeBSD.org>
References: <20050816170519.A74422@xorpc.icir.org> <20050818005739.A83776@xorpc.icir.org> <1124374713.1360.64660.camel@palm> <200508181026.35502.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1124578943.1360.73407.camel@palm>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6
Date: Sat, 20 Aug 2005 19:02:23 -0400
Content-Transfer-Encoding: 7bit
Cc: net@FreeBSD.org, Max Laier , arch@FreeBSD.org, "freebsd-arch@freebsd.org"
Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Aug 2005 23:02:29 -0000

On Thu, 2005-08-18 at 10:26, John Baldwin wrote:
> On Thursday 18 August 2005 10:18 am, Stephan Uphoff wrote:
> > On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote:
> > > On Thu, Aug 18, 2005 at 03:32:19AM +0200, Max Laier wrote:
> > > > On Thursday 18 August 2005 02:02, Luigi Rizzo wrote:
> > >
> > > ...
> > >
> > > > > could you guys look at the following code and see if it makes sense,
> > > > > or tell me where i am wrong ?
> > > > >
> > > > > It should solve the starvation and blocking trylock problems,
> > > > > because the active reader does not hold the mutex in the critical
> > >
> > >                      ^^^^^^
> > >
> > > i meant 'writer', sorry... as max said even in the current implementation
> > > the reader does not hold the lock.
> > >
> > > > > int
> > > > > RLOCK(struct rwlock *rwl, int try)
> > > > > {
> > > > >         if (!try)
> > > > >                 mtx_lock(&rwl->m);
> > > > >         else if (!mtx_trylock(&rwl->m))
> > > > >                 return EBUSY;
> > > > >         if (rwl->writers == 0) /* no writer, pass */
> > > > >                 rwl->readers++;
> > > > >         else {
> > > > >                 rwl->br++;
> > > > >                 cv_wait(&rwl->qr, &rwl->m);
> > > >
> > > >                   ^^^^^^^
> > > >
> > > > That we can't do. That's exactly the thing the existing sx(9)
> > > > implementation does and where it breaks. The problem is that cv_wait()
> > > > is an implicit sleep which breaks when we try to RLOCK() with other
> > > > mutex already acquired.
> > >
> > > but that is not a solvable problem given that the *LOCK may be blocking.
> > > And the cv_wait is not an unconditioned sleep, it is one where you
> > > release the lock right before and wait for an event to wake you up.
> > > In fact i don't understand why you consider spinning and sleeping
> > > on a mutex two different things.
> >
> > The major difference between sleeping (cv_wait,msleep,..) and blocking
> > on a mutex is priority inheritance.
> > If you need to be able to use (non-spin) mutexes while holding a
> > [R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you
> > need to implement priority inheritance for [R|W]LOCKs.
> > For the (single) write lock holder tracking priority is easy. (just like
> > a (non-spin) mutex). However priority inheritance to multiple readers is
> > more difficult as one needs to keep track of all holders of the lock.
> > Keeping track of all readers requires pre-allocated memory resources.
> > This memory could come from
> > 1) A limited global pool
> > 2) A limited per [R|W]LOCK pool
> > 3) A limited per thread pool
> > 4) As a parameter for acquiring a RLOCK
> > None of the choices are really pretty.
> > (1),(2) and (3) can lead to limiting reader parallelism when running out
> > of resources.
> > (4) may be practical for some cases since the memory
> > could be allocated from stack. However since the memory must be valid
> > while holding a read lock choice (4) makes some algorithms (that use for
> > example lock crabbing) a bit harder to implement.
>
> Solaris handles the read case by only tracking the first thread to get a read
> lock (referred to as the "owner of record" IIRC) and only propagating
> priority to that thread and ignoring other readers.  They admit it's not
> perfect as well.  That's mentioned in the Solaris Internals book.

Hi John,

If I recall correctly Solaris user threads get a better "system priority"
when running/blocking in the kernel. This prevents threads holding
critical kernel resources from being starved by user processes.
I think this would also be a good idea for FreeBSD to prevent unbounded
priority inversions caused by vnode locks and other non-tracked resources.

However even with these changes I don't think the Solaris implementation
is the right way to go if we want to freely mix mutexes and [R|W]locks.
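To make the RLOCK() fragment quoted above concrete, here is a minimal userland sketch of a writer-preferring read/write lock built from one mutex and two condition variables. It is an illustration under assumptions, not the code posted in the thread: POSIX threads stand in for the kernel's mtx(9) and cv(9); the names rwl_init, rwl_rlock, rwl_runlock, rwl_wlock, rwl_wunlock, the trylock parameter and the wactive field are invented here; and the try path is made to return EBUSY rather than sleep. Because waiters sleep in pthread_cond_wait(), the sketch has exactly the property objected to above: nothing propagates a waiter's priority to the current holders.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct rwlock {
	pthread_mutex_t m;	/* protects all fields below */
	pthread_cond_t qr;	/* blocked readers sleep here */
	pthread_cond_t qw;	/* blocked writers sleep here */
	int readers;		/* active readers */
	int writers;		/* active plus waiting writers */
	int wactive;		/* 1 while a writer holds the lock */
};

void
rwl_init(struct rwlock *rwl)
{
	pthread_mutex_init(&rwl->m, NULL);
	pthread_cond_init(&rwl->qr, NULL);
	pthread_cond_init(&rwl->qw, NULL);
	rwl->readers = rwl->writers = rwl->wactive = 0;
}

int
rwl_rlock(struct rwlock *rwl, int trylock)
{
	if (!trylock)
		pthread_mutex_lock(&rwl->m);
	else if (pthread_mutex_trylock(&rwl->m) != 0)
		return EBUSY;
	/* Waiting writers count too, so a stream of readers cannot starve them. */
	while (rwl->writers > 0) {
		if (trylock) {		/* assumption: a try never sleeps */
			pthread_mutex_unlock(&rwl->m);
			return EBUSY;
		}
		pthread_cond_wait(&rwl->qr, &rwl->m);	/* sleep, no priority inheritance */
	}
	rwl->readers++;
	pthread_mutex_unlock(&rwl->m);
	return 0;
}

void
rwl_runlock(struct rwlock *rwl)
{
	pthread_mutex_lock(&rwl->m);
	assert(rwl->readers > 0);
	if (--rwl->readers == 0)
		pthread_cond_signal(&rwl->qw);	/* last reader out wakes one writer */
	pthread_mutex_unlock(&rwl->m);
}

int
rwl_wlock(struct rwlock *rwl, int trylock)
{
	if (!trylock)
		pthread_mutex_lock(&rwl->m);
	else if (pthread_mutex_trylock(&rwl->m) != 0)
		return EBUSY;
	if (trylock && (rwl->readers > 0 || rwl->writers > 0)) {
		pthread_mutex_unlock(&rwl->m);
		return EBUSY;
	}
	rwl->writers++;		/* announce intent: new readers now wait */
	while (rwl->readers > 0 || rwl->wactive)
		pthread_cond_wait(&rwl->qw, &rwl->m);
	rwl->wactive = 1;
	pthread_mutex_unlock(&rwl->m);
	return 0;
}

void
rwl_wunlock(struct rwlock *rwl)
{
	pthread_mutex_lock(&rwl->m);
	assert(rwl->wactive);
	rwl->wactive = 0;
	if (--rwl->writers > 0)
		pthread_cond_signal(&rwl->qw);		/* hand over to the next writer */
	else
		pthread_cond_broadcast(&rwl->qr);	/* release all blocked readers */
	pthread_mutex_unlock(&rwl->m);
}
```

With trylock set, neither acquire function sleeps on a condition variable, so a caller that already holds another mutex can back off instead of blocking.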
Stephan

From owner-freebsd-arch@FreeBSD.ORG Sat Aug 20 23:33:04 2005
Date: Sun, 21 Aug 2005 00:37:56 +0100 (BST)
From: Robert Watson
To: Hajimu UMEMOTO
Cc: arch@FreeBSD.org
Message-ID: <20050821003536.P14178@fledge.watson.org>
Subject: Re: [CFR] reflect resolv.conf update to running application

On Sat, 20 Aug 2005, Hajimu UMEMOTO wrote:

> Our resolver reads resolv.conf once, and never re-reads it.  Recent
> OpenBSD changed to re-read resolv.conf when it is updated.  I believe it
> is especially useful for mobile environments.  So, I made a patch for our
> resolver.  Please review it.

Two concerns:

(1) Has anyone characterized the significance of the cost of doing a stat()
    for every DNS lookup under a significant workload?  Does it matter?  Most
    performance-critical network paths don't do name lookups in order to
    prevent indefinite stalls in lookup, but doing, say, 1,000 additional
    stats a second is not a small issue.
(2) By reading the configuration file more frequently and more quickly
    after a change, we increase the chances of a race condition in which
    the resolver reads a partially written resolv.conf file during an
    update.  Does this happen in practice?  I've always been very leery of
    re-reading configuration files automatically based on a time-stamp, as
    updates to files are not atomic at all.

Robert N M Watson

>
> Index: lib/libc/net/getaddrinfo.c
> diff -u -p lib/libc/net/getaddrinfo.c.orig lib/libc/net/getaddrinfo.c
> --- lib/libc/net/getaddrinfo.c.orig	Sat May 21 22:46:37 2005
> +++ lib/libc/net/getaddrinfo.c	Sat May 21 23:20:27 2005
> @@ -2443,7 +2443,7 @@ res_searchN(name, target)
>  	int got_nodata = 0, got_servfail = 0, tried_as_is = 0;
>  	char abuf[MAXDNAME];
>  
> -	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
> +	if (_res_init() == -1) {
>  		h_errno = NETDB_INTERNAL;
>  		return (-1);
>  	}
> Index: lib/libc/net/gethostbydns.c
> diff -u -p lib/libc/net/gethostbydns.c.orig lib/libc/net/gethostbydns.c
> --- lib/libc/net/gethostbydns.c.orig	Sat May 21 22:46:37 2005
> +++ lib/libc/net/gethostbydns.c	Sat May 21 23:20:27 2005
> @@ -538,10 +538,6 @@ _dns_gethostbyaddr(void *rval, void *cb_
>  	he = va_arg(ap, struct hostent *);
>  	hed = va_arg(ap, struct hostent_data *);
>  
> -	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
> -		h_errno = NETDB_INTERNAL;
> -		return NS_UNAVAIL;
> -	}
>  	switch (af) {
>  	case AF_INET:
>  		(void) sprintf(qbuf, "%u.%u.%u.%u.in-addr.arpa",
> Index: lib/libc/net/gethostnamadr.c
> diff -u -p lib/libc/net/gethostnamadr.c.orig lib/libc/net/gethostnamadr.c
> --- lib/libc/net/gethostnamadr.c.orig	Sat May 21 22:46:37 2005
> +++ lib/libc/net/gethostnamadr.c	Sat May 21 23:20:27 2005
> @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD: src/lib/libc/net/get
>  #include /* XXX hack for _res */
>  #include "un-namespace.h"
>  #include "netdb_private.h"
> +#include "res_config.h"
>  
>  extern int _ht_gethostbyname(void *, void *, va_list);
>  extern int _dns_gethostbyname(void *, void *, va_list);
> @@ -264,7 +265,7 @@ gethostbyaddr_r(const char *addr, int le
>  		{ 0 }
>  	};
>  
> -	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
> +	if (_res_init() == -1) {
>  		h_errno = NETDB_INTERNAL;
>  		return -1;
>  	}
> Index: lib/libc/net/getnetbydns.c
> diff -u -p lib/libc/net/getnetbydns.c.orig lib/libc/net/getnetbydns.c
> --- lib/libc/net/getnetbydns.c.orig	Sat May 21 22:46:37 2005
> +++ lib/libc/net/getnetbydns.c	Sat May 21 23:20:27 2005
> @@ -304,6 +304,10 @@ _dns_getnetbyaddr(void *rval, void *cb_d
>  		    netbr[1], netbr[0]);
>  		break;
>  	}
> +	if (_res_init() == -1) {
> +		h_errno = NETDB_INTERNAL;
> +		return NS_UNAVAIL;
> +	}
>  	if ((buf = malloc(sizeof(*buf))) == NULL) {
>  		h_errno = NETDB_INTERNAL;
>  		return NS_NOTFOUND;
> Index: lib/libc/net/name6.c
> diff -u -p lib/libc/net/name6.c.orig lib/libc/net/name6.c
> --- lib/libc/net/name6.c.orig	Sat May 21 22:46:38 2005
> +++ lib/libc/net/name6.c	Sat May 21 23:20:27 2005
> @@ -121,6 +121,7 @@ __FBSDID("$FreeBSD: src/lib/libc/net/nam
>  #include
>  #include "un-namespace.h"
>  #include "netdb_private.h"
> +#include "res_config.h"
>  
>  #ifndef _PATH_HOSTS
>  #define _PATH_HOSTS "/etc/hosts"
> @@ -1810,11 +1811,9 @@ _dns_ghbyaddr(void *rval, void *cb_data,
>  		return NS_NOTFOUND;
>  	}
>  
> -	if ((_res.options & RES_INIT) == 0) {
> -		if (res_init() < 0) {
> -			*errp = h_errno;
> -			return NS_UNAVAIL;
> -		}
> +	if (_res_init() < 0) {
> +		*errp = h_errno;
> +		return NS_UNAVAIL;
>  	}
>  	memset(&hbuf, 0, sizeof(hbuf));
>  	hbuf.h_name = NULL;
> Index: lib/libc/net/res_config.h
> diff -u lib/libc/net/res_config.h.orig lib/libc/net/res_config.h
> --- lib/libc/net/res_config.h.orig	Sat Mar 23 08:41:54 2002
> +++ lib/libc/net/res_config.h	Sat May 21 23:20:27 2005
> @@ -8,3 +8,5 @@
>  #define MULTI_PTRS_ARE_ALIASES 1 /* fold multiple PTR records into aliases */
>  #define CHECK_SRVR_ADDR 1 /* confirm that the server requested sent the reply */
>  #define BIND_UPDATE 1 /* update support */
> +
> +int _res_init(void);
> Index: lib/libc/net/res_init.c
> diff -u -p lib/libc/net/res_init.c.orig lib/libc/net/res_init.c
> --- lib/libc/net/res_init.c.orig	Thu Feb 26 06:03:45 2004
> +++ lib/libc/net/res_init.c	Sat May 21 23:20:27 2005
> @@ -78,11 +78,13 @@ __FBSDID("$FreeBSD: src/lib/libc/net/res
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -100,6 +102,7 @@ __FBSDID("$FreeBSD: src/lib/libc/net/res
>  #undef h_errno
>  extern int h_errno;
>  
> +static void res_readconf(void);
>  static void res_setoptions(char *, char *);
>  
>  #ifdef RESOLVSORT
> @@ -112,6 +115,18 @@ static u_int32_t net_mask(struct in_addr
>  # define isascii(c) (!(c & 0200))
>  #endif
>  
> +#define timespecclear(tvp) ((tvp)->tv_sec = (tvp)->tv_nsec = 0)
> +#define timespeccmp(tvp, uvp, cmp) \
> +	(((tvp)->tv_sec == (uvp)->tv_sec) ? \
> +	    ((tvp)->tv_nsec cmp (uvp)->tv_nsec) : \
> +	    ((tvp)->tv_sec cmp (uvp)->tv_sec))
> +
> +struct __res_conf_private {
> +	struct timespec mtimespec;
> +};
> +
> +static struct __res_conf_private *___res_conf_private(void);
> +
>  /*
>   * Check structure for failed per-thread allocations.
>   */
> @@ -119,6 +134,7 @@ static struct res_per_thread {
>  	struct __res_state res_state;
>  	struct __res_state_ext res_state_ext;
>  	struct __res_send_private res_send_private;
> +	struct __res_conf_private res_conf_private;
>  	int h_errno;
>  } _res_per_thread_bogus = { .res_send_private = { .s = -1 } }; /* socket */
>  
> @@ -146,21 +162,8 @@ static struct res_per_thread {
>  int
>  res_init()
>  {
> -	FILE *fp;
>  	struct __res_send_private *rsp;
> -	char *cp, **pp;
> -	int n;
> -	char buf[MAXDNAME];
> -	int nserv = 0; /* number of nameserver records read from file */
> -	int haveenv = 0;
> -	int havesearch = 0;
> -#ifdef RESOLVSORT
> -	int nsort = 0;
> -	char *net;
> -#endif
> -#ifndef RFC1535
> -	int dots;
> -#endif
> +	char *cp;
>  
>  	/*
>  	 * If allocation of memory for this thread's resolver has failed,
> @@ -208,6 +211,37 @@ res_init()
>  	if (!_res.id)
>  		_res.id = res_randomid();
>  
> +	_res.pfcode = 0;
> +	res_readconf();
> +
> +	if (issetugid())
> +		_res.options |= RES_NOALIASES;
> +	else if ((cp = getenv("RES_OPTIONS")) != NULL)
> +		res_setoptions(cp, "env");
> +	_res.options |= RES_INIT;
> +	return (0);
> +}
> +
> +static void
> +res_readconf()
> +{
> +	struct __res_conf_private *rcp;
> +	FILE *fp;
> +	char *cp, **pp;
> +	int n;
> +	char buf[MAXDNAME];
> +	int nserv = 0; /* number of nameserver records read from file */
> +	int haveenv = 0;
> +	int havesearch = 0;
> +#ifdef RESOLVSORT
> +	int nsort = 0;
> +	char *net;
> +#endif
> +#ifndef RFC1535
> +	int dots;
> +#endif
> +	struct stat sb;
> +
>  #ifdef USELOOPBACK
>  	_res.nsaddr.sin_addr = inet_makeaddr(IN_LOOPBACKNET, 1);
>  #else
> @@ -220,7 +254,6 @@ res_init()
>  	memcpy(&_res_ext.nsaddr, &_res.nsaddr, _res.nsaddr.sin_len);
>  	_res.nscount = 1;
>  	_res.ndots = 1;
> -	_res.pfcode = 0;
>  
>  	/* Allow user to override the local domain definition */
>  	if (issetugid() == 0 && (cp = getenv("LOCALDOMAIN")) != NULL) {
> @@ -262,7 +295,10 @@ res_init()
>  	(line[sizeof(name) - 1] == ' ' || \
>  	 line[sizeof(name) - 1] == '\t'))
>  
> +	rcp = ___res_conf_private();
>  	if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
> +		if (fstat(fileno(fp), &sb) == 0)
> +			rcp->mtimespec = sb.st_mtimespec;
>  		/* read the config file */
>  		while (fgets(buf, sizeof(buf), fp) != NULL) {
>  			/* skip comments */
> @@ -396,11 +432,11 @@ res_init()
>  				if (inet_aton(net, &a)) {
>  					_res.sort_list[nsort].mask = a.s_addr;
>  				} else {
> -					_res.sort_list[nsort].mask =
> +					_res.sort_list[nsort].mask =
>  					    net_mask(_res.sort_list[nsort].addr);
>  				}
>  			} else {
> -				_res.sort_list[nsort].mask =
> +				_res.sort_list[nsort].mask =
>  				    net_mask(_res.sort_list[nsort].addr);
>  			}
>  			_res_ext.sort_list[nsort].af = AF_INET;
> @@ -465,13 +501,14 @@ res_init()
>  			continue;
>  		}
>  	}
> -	if (nserv > 1)
> +	if (nserv > 1)
>  		_res.nscount = nserv;
>  #ifdef RESOLVSORT
>  	_res.nsort = nsort;
>  #endif
>  	(void) fclose(fp);
> -	}
> +	} else
> +		timespecclear(&rcp->mtimespec);
>  	if (_res.defdname[0] == 0 &&
>  	    gethostname(buf, sizeof(_res.defdname) - 1) == 0 &&
>  	    (cp = strchr(buf, '.')) != NULL)
> @@ -507,12 +544,27 @@ res_init()
>  #endif
>  #endif /* !RFC1535 */
>  	}
> +}
>  
> -	if (issetugid())
> -		_res.options |= RES_NOALIASES;
> -	else if ((cp = getenv("RES_OPTIONS")) != NULL)
> -		res_setoptions(cp, "env");
> -	_res.options |= RES_INIT;
> +int
> +_res_init(void)
> +{
> +	struct __res_conf_private *rcp;
> +	struct stat sb;
> +
> +	if ((_res.options & RES_INIT) == 0)
> +		return (res_init());
> +
> +	if (stat(_PATH_RESCONF, &sb) == -1) {
> +		/*
> +		 * Lost the file, in chroot?
> +		 * Don't trash settings
> +		 */
> +		return (0);
> +	}
> +	rcp = ___res_conf_private();
> +	if (timespeccmp(&sb.st_mtimespec, &rcp->mtimespec, !=))
> +		res_readconf();
>  	return (0);
>  }
>  
> @@ -629,6 +681,7 @@ struct __res_state _res;
>  #endif
>  struct __res_state_ext _res_ext;
>  static struct __res_send_private _res_send_private = { .s = -1 }; /* socket */
> +static struct __res_conf_private _res_conf_private;
>  
>  static thread_key_t res_key;
>  static once_t res_init_once = ONCE_INITIALIZER;
> @@ -697,6 +750,14 @@ ___res_send_private(void)
>  	if (thr_main() != 0)
>  		return (&_res_send_private);
>  	return (&allocate_res()->res_send_private);
> +}
> +
> +struct __res_conf_private *
> +___res_conf_private(void)
> +{
> +	if (thr_main() != 0)
> +		return (&_res_conf_private);
> +	return (&allocate_res()->res_conf_private);
>  }
>  
>  int *
> Index: lib/libc/net/res_query.c
> diff -u -p lib/libc/net/res_query.c.orig lib/libc/net/res_query.c
> --- lib/libc/net/res_query.c.orig	Sat May 21 22:46:39 2005
> +++ lib/libc/net/res_query.c	Sat May 21 23:20:27 2005
> @@ -200,7 +200,7 @@ res_search(name, class, type, answer, an
>  	int trailing_dot, ret, saved_herrno;
>  	int got_nodata = 0, got_servfail = 0, tried_as_is = 0;
>  
> -	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
> +	if (_res_init() == -1) {
>  		h_errno = NETDB_INTERNAL;
>  		return (-1);
>  	}
> Index: lib/libc/net/res_update.c
> diff -u -p lib/libc/net/res_update.c.orig lib/libc/net/res_update.c
> --- lib/libc/net/res_update.c.orig	Mon Sep 16 01:51:09 2002
> +++ lib/libc/net/res_update.c	Sat May 21 23:20:27 2005
> @@ -36,6 +36,8 @@ __FBSDID("$FreeBSD: src/lib/libc/net/res
>  #include
>  #include
>  
> +#include "res_config.h"
> +
>  /*
>   * Separate a linked list of records into groups so that all records
>   * in a group will belong to a single zone on the nameserver.
> @@ -84,7 +86,7 @@ res_update(ns_updrec *rrecp_in) {
>  	u_int16_t dlen, class, qclass, type, qtype;
>  	u_int32_t ttl;
>  
> -	if ((_res.options & RES_INIT) == 0 && res_init() == -1) {
> +	if (_res_init() == -1) {
>  		h_errno = NETDB_INTERNAL;
>  		return (-1);
>  	}
>
>
> Sincerely,
>
> --
> Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan
> ume@mahoroba.org  ume@{,jp.}FreeBSD.org
> http://www.imasy.org/~ume/
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>
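
The two concerns above suggest a pattern that can be sketched in a few lines: a reader that re-reads the file only when stat() reports a change, and a writer that replaces the file with rename(2) so that a reader opening the path sees either the old file or the new one, never a half-written one. This is a hedged sketch, not part of the posted patch: conf_changed(), conf_write_atomic() and struct conf_stamp are hypothetical names; it uses the POSIX st_mtim field where the patch uses FreeBSD's st_mtimespec; and it also compares the inode number, which the patch does not do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* What the reader remembers about the file it last parsed (hypothetical). */
struct conf_stamp {
	struct timespec mtime;
	ino_t ino;
};

/*
 * Returns 1 if path differs from *last (and records the new stamp),
 * 0 if it is unchanged, -1 if stat() fails. On -1 the caller keeps its
 * current settings, mirroring the "lost the file" branch of the patch.
 */
int
conf_changed(const char *path, struct conf_stamp *last)
{
	struct stat sb;

	assert(path != NULL && last != NULL);
	if (stat(path, &sb) == -1)
		return -1;
	if (sb.st_ino == last->ino &&
	    sb.st_mtim.tv_sec == last->mtime.tv_sec &&
	    sb.st_mtim.tv_nsec == last->mtime.tv_nsec)
		return 0;
	last->mtime = sb.st_mtim;
	last->ino = sb.st_ino;
	return 1;
}

/*
 * Writer side: write the full contents to "<path>.tmp", flush it, then
 * rename() it over path. Returns 0 on success, -1 on any failure.
 */
int
conf_write_atomic(const char *path, const char *text)
{
	char tmp[1024];
	size_t len = strlen(text);
	int fd;

	if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
		return -1;
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -1;
	if (write(fd, text, len) != (ssize_t)len || fsync(fd) == -1) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) == -1 || rename(tmp, path) == -1) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

Like _res_init() in the patch, a caller would treat -1 as "keep the current settings". The sketch still costs one stat() per check, so it does nothing for the first concern, and it only avoids partial reads when whatever writes the file cooperates by using rename().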