From owner-freebsd-arch@FreeBSD.ORG Thu Aug 18 14:18:38 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AD22116A41F for ; Thu, 18 Aug 2005 14:18:38 +0000 (GMT) (envelope-from ups@tree.com) Received: from smtp.speedfactory.net (smtp.speedfactory.net [66.23.216.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 84B9543D48 for ; Thu, 18 Aug 2005 14:18:37 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 9936 invoked by uid 210); 18 Aug 2005 14:19:00 +0000 Received: from 66.23.216.49 by talon (envelope-from , uid 201) with qmail-scanner-1.25st (clamdscan: 0.85.1/1030. spamassassin: 3.0.2. perlscan: 1.25st. Clear:RC:1(66.23.216.49):. Processed in 0.051598 secs); 18 Aug 2005 14:19:00 -0000 X-Qmail-Scanner-Mail-From: ups@tree.com via talon X-Qmail-Scanner: 1.25st (Clear:RC:1(66.23.216.49):. Processed in 0.051598 secs Process 9926) Received: from 66-23-216-49.clients.speedfactory.net (HELO palm.tree.com) (66.23.216.49) by smtp.speedfactory.net with AES256-SHA encrypted SMTP; 18 Aug 2005 14:19:00 +0000 Received: from [127.0.0.1] (ups@localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j7IEIXrK007363; Thu, 18 Aug 2005 10:18:35 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Luigi Rizzo In-Reply-To: <20050818005739.A83776@xorpc.icir.org> References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> <200508180332.34895.max@love2party.net> <20050818005739.A83776@xorpc.icir.org> Content-Type: text/plain Message-Id: <1124374713.1360.64660.camel@palm> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Thu, 18 Aug 2005 10:18:33 -0400 Content-Transfer-Encoding: 7bit Cc: Max Laier , net@freebsd.org, arch@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 14:18:38 -0000 On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote: > On Thu, Aug 18, 2005 at 03:32:19AM +0200, Max Laier wrote: > > On Thursday 18 August 2005 02:02, Luigi Rizzo wrote: > ... > > > could you guys look at the following code and see if it makes sense, > > > or tell me where i am wrong ? > > > > > > It should solve the starvation and blocking trylock problems, > > > because the active reader does not hold the mutex in the critical > ^^^^^^ > > i meant 'writer', sorry... as max said even in the current implementation > the reader does not hold the lock. > > > > int > > > RLOCK(struct rwlock *rwl, int try) > > > { > > > if (!try) > > > mtx_lock(&rwl->m); > > > else if (!mtx_trylock(&rwl->m)) > > > return EBUSY; > > > if (rwl->writers == 0) /* no writer, pass */ > > > rwl->readers++; > > > else { > > > rwl->br++; > > > cv_wait(&rwl->qr, &rwl->m); > > ^^^^^^^ > > > > That we can't do. That's exactly the thing the existing sx(9) implementation > > does and where it breaks. The problem is that cv_wait() is an implicit sleep > > which breaks when we try to RLOCK() with other mutex already acquired. > > but that is not a solvable problem given that the *LOCK may be blocking. > And the cv_wait is not an unconditioned sleep, it is one where you release > the lock right before ans wait for an event to wake you up. > In fact i don't understand why you consider spinning and sleeping > on a mutex two different things. The major difference between sleeping (cv_wait,msleep,..) and blocking on a mutex is priority inheritance. If you need to be able to use (non-spin) mutexes while holding a [R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you need to implement priority inheritance for [R|W]LOCKs. For the (single) write lock holder tracking priority is easy. (just like a (non-spin) mutex). However priority inheritance to multiple readers is more difficult as one needs to keep track of all holders of the lock. Keeping track of all readers requires pre-allocated memory resources. This memory could come from 1) A limited global pool 2) A limited per [R|W]LOCK pool 3) A limited per thread pool 4) As a parameter for acquiring a RLOCK None of the choices are really pretty. (1),(2) and (3) can lead to limiting reader parallelism when running out of resources. (4) may be practically for some cases since the memory could be allocated from stack. However since the memory must be valid while holding a read lock choice (4) makes some algorithms (than use for example lock crabbing) a bit harder to implement. > > Moreover will this break for recursive reads e.g.: > > > > Thread 1: RLOCK() ... RLOCK() -> cv_wait ... > > Thread 2: WLOCK() -> cv_wait ... > > > > This is exactly what pfil_hooks must be able to do as the packet filter may > > want to call back to the stack in order to send rejects etc. > > that's another story (also in issue in ipfw) and the way it is addressed > elsewhere is by releasing and reaquiring the lock. In fact is the topic > that started this thread. > > > change). The problem is, that we still have to invest 4 mutex operations for > > every access. The current implementation has the same basic problem (though > > it only uses 2 mutex operations for the WLOCK/UNLOCK). Ideally the RLOCK/ > > UNLOCK should be free unless there is a writer waiting for the lock. In > > "free unless" means not free - you always have to check, be it through > an atomic cmpswap or something else. But this is what mtx_lock does > anyways in the fast path. > > cheers > luigi >