From owner-freebsd-net@FreeBSD.ORG Thu Aug 18 00:02:49 2005 Return-Path: X-Original-To: net@freebsd.org Delivered-To: freebsd-net@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CB86C16A41F; Thu, 18 Aug 2005 00:02:49 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 849D443D49; Thu, 18 Aug 2005 00:02:49 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j7I02mEp072251; Wed, 17 Aug 2005 17:02:48 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j7I02mBM072250; Wed, 17 Aug 2005 17:02:48 -0700 (PDT) (envelope-from rizzo) Date: Wed, 17 Aug 2005 17:02:48 -0700 From: Luigi Rizzo To: Max Laier Message-ID: <20050817170248.A70991@xorpc.icir.org> References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200508170435.34688.max@love2party.net>; from max@love2party.net on Wed, Aug 17, 2005 at 04:35:19AM +0200 Cc: arch@freebsd.org, net@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 00:02:50 -0000 On Wed, Aug 17, 2005 at 04:35:19AM +0200, Max Laier wrote: > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote: > > [apologies for the cross post but it belongs both to arch and net.] > > > > I notice that net/pfil.c and netinet/ip_fw2.c have two copies of > > aisimilar but slightly different implementation of > > multiple-reader/single-writer locks, which brings up the question(s): > > > > 1. should we rather put this code in the generic kernel code so that other > > subsystems could make use of it ? E.g. the routing table is certainly > > a candidate, > > I have asked this several time on -arch and IRC, but never found anyone > willing to pursue it. However, the problem is ... > > > and especially > > > > 2. should we implement it right ? > > > > Both implementations are subject to starvation for the writers > > (which is indeed a problem here, because we might want to modify > > a ruleset and be prevented from doing it because of incoming traffic > > that keeps readers active). > > Also the PFIL_TRY_WLOCK will in fact be blocking if a writer > > is already in - i have no idea how problematic is this in the > > way it is actually used. > >... really this. I didn't find a clean way out of the starvation issue. What > I do for pfil is that I set a flag and simply stop serving[2] shared requests > once a writer waits for the lock. If a writer can't sleep[1] then we return > EBUSY and don't. However, for pfil it's almost ever safe to assume that a > write may sleep (as it is for most instances of this kind of sx-lock where > you have BIGNUMxreads:1xwrite). could you guys look at the following code and see if it makes sense, or tell me where i am wrong ? It should solve the starvation and blocking trylock problems, because the active reader does not hold the mutex in the critical section anymore. The lock could well be a spinlock. cheers luigi /* * Implementation of multiple reader-single writer that prevents starvation. * Luigi Rizzo 2005.08.19 * * The mutex m only protects accesses to the struct rwlock. * We can have the following states: * IDLE: readers = 0, writers = 0, br = 0; * any request will be granted immediately. * * READ: readers > 0, writers = 0, br = 0. Read in progress. * Grant read requests immediately, queue write requests and * move to READ1. * When last reader terminates, move to IDLE. * * READ1: readers > 0, writers > 0, br >= 0. * Read in progress, but writers are queued. * Queue read and write requests to qr and wr, respectively. * When the last reader terminates, wakeup the next queued writer * and move to WRITE * * WRITE: readers = 0, writers > 0, br >= 0. * Write in progress, possibly queued readers/writers. * Queue read and write requests to qr and wr, respectively. * When the writer terminates, wake up all readers if any, * otherwise wake up the next writer if any. * Move to READ, READ1, IDLE accordingly. */ struct rwlock { mtx m; /* protects access to the rwlock */ int readers; /* active readers */ int br; /* blocked readers */ int writers; /* active + blocked writers */ cv qr; /* queued readers */ cv qw; /* queued writers */ } int RLOCK(struct rwlock *rwl, int try) { if (!try) mtx_lock(&rwl->m); else if (!mtx_trylock(&rwl->m)) return EBUSY; if (rwl->writers == 0) /* no writer, pass */ rwl->readers++; else { rwl->br++; cv_wait(&rwl->qr, &rwl->m); } mtx_unlock(&rwl->m); return 0; } int WLOCK(struct rwlock *rwl, int try) { if (!try) mtx_lock(&rwl->m); else if (!mtx_trylock(&rwl->m)) return EBUSY; rwl->writers++; if (rwl->readers > 0) /* have readers, must wait */ cv_wait(&rwl->qw, &rwl->m); mtx_unlock(&rwl->m); return 0; } void RUNLOCK(struct rwlock *rwl) { mtx_lock(&rwl->m); rwl->readers--; if (rwl->readers == 0 && rwl->writers > 0) cv_signal(&rwl->qw); mtx_unlock(&rwl->m); } void WUNLOCK(struct rwlock *rwl) { mtx_lock(&rwl->m); rwl->writers--; if (rwl->br > 0) { /* priority to readers */ rwl->readers = rwl->br; rwl->br = 0; cv_broadcast(&rwl->qr); } else if (rwl->writers > 0) cv_signal(&rwl->qw); mtx_unlock(&rwl->m); }