From owner-freebsd-net@FreeBSD.ORG Thu Aug 18 01:32:42 2005 Return-Path: X-Original-To: net@freebsd.org Delivered-To: freebsd-net@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5974A16A41F; Thu, 18 Aug 2005 01:32:42 +0000 (GMT) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.186]) by mx1.FreeBSD.org (Postfix) with ESMTP id A442B43D48; Thu, 18 Aug 2005 01:32:41 +0000 (GMT) (envelope-from max@love2party.net) Received: from p54A3E589.dip.t-dialin.net [84.163.229.137] (helo=donor.laier.local) by mrelayeu.kundenserver.de with ESMTP (Nemesis), id 0MKwtQ-1E5ZGn3aft-00063d; Thu, 18 Aug 2005 03:32:37 +0200 From: Max Laier To: Luigi Rizzo Date: Thu, 18 Aug 2005 03:32:19 +0200 User-Agent: KMail/1.8.2 References: <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org> In-Reply-To: <20050817170248.A70991@xorpc.icir.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1138726.tHd4SFx7Ar"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200508180332.34895.max@love2party.net> X-Provags-ID: kundenserver.de abuse@kundenserver.de login:61c499deaeeba3ba5be80f48ecc83056 Cc: arch@freebsd.org, net@freebsd.org Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2005 01:32:42 -0000 --nextPart1138726.tHd4SFx7Ar Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Thursday 18 August 2005 02:02, Luigi Rizzo wrote: > On Wed, Aug 17, 2005 at 04:35:19AM +0200, Max Laier wrote: > > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote: > > > [apologies for the cross post but it belongs both to arch and net.] > > > > > > I notice that net/pfil.c and netinet/ip_fw2.c have two copies of > > > aisimilar but slightly different implementation of > > > multiple-reader/single-writer locks, which brings up the question(s): > > > > > > 1. should we rather put this code in the generic kernel code so that > > > other subsystems could make use of it ? E.g. the routing table is > > > certainly a candidate, > > > > I have asked this several time on -arch and IRC, but never found anyone > > willing to pursue it. However, the problem is ... > > > > > and especially > > > > > > 2. should we implement it right ? > > > > > > Both implementations are subject to starvation for the writers > > > (which is indeed a problem here, because we might want to modify > > > a ruleset and be prevented from doing it because of incoming traff= ic > > > that keeps readers active). > > > Also the PFIL_TRY_WLOCK will in fact be blocking if a writer > > > is already in - i have no idea how problematic is this in the > > > way it is actually used. > > > >... really this. I didn't find a clean way out of the starvation issue.= =20 > > What I do for pfil is that I set a flag and simply stop serving[2] shar= ed > > requests once a writer waits for the lock. If a writer can't sleep[1] > > then we return EBUSY and don't. However, for pfil it's almost ever safe > > to assume that a write may sleep (as it is for most instances of this > > kind of sx-lock where you have BIGNUMxreads:1xwrite). > > could you guys look at the following code and see if it makes sense, > or tell me where i am wrong ? > > It should solve the starvation and blocking trylock problems, > because the active reader does not hold the mutex in the critical > section anymore. The lock could well be a spinlock. The reader doesn't hold the lock over the critical section with the current= =20 implementation either. The write holds the lock over the critical section,= =20 which was a conscious decision. For some cases it does make sense to get r= id=20 of this, however. See inline comments why this doesn't work for pfil. > cheers > luigi > > /* > * Implementation of multiple reader-single writer that prevents > starvation. * Luigi Rizzo 2005.08.19 > * > * The mutex m only protects accesses to the struct rwlock. > * We can have the following states: > * IDLE: readers =3D 0, writers =3D 0, br =3D 0; > * any request will be granted immediately. > * > * READ: readers > 0, writers =3D 0, br =3D 0. Read in progress. > * Grant read requests immediately, queue write requests and > * move to READ1. > * When last reader terminates, move to IDLE. > * > * READ1: readers > 0, writers > 0, br >=3D 0. > * Read in progress, but writers are queued. > * Queue read and write requests to qr and wr, respectively. > * When the last reader terminates, wakeup the next queued writer > * and move to WRITE > * > * WRITE: readers =3D 0, writers > 0, br >=3D 0. > * Write in progress, possibly queued readers/writers. > * Queue read and write requests to qr and wr, respectively. > * When the writer terminates, wake up all readers if any, > * otherwise wake up the next writer if any. > * Move to READ, READ1, IDLE accordingly. > */ > > struct rwlock { > mtx m; /* protects access to the rwlock */ > int readers; /* active readers */ > int br; /* blocked readers */ > int writers; /* active + blocked writers */ > cv qr; /* queued readers */ > cv qw; /* queued writers */ > } > > int > RLOCK(struct rwlock *rwl, int try) > { > if (!try) > mtx_lock(&rwl->m); > else if (!mtx_trylock(&rwl->m)) > return EBUSY; > if (rwl->writers =3D=3D 0) /* no writer, pass */ > rwl->readers++; > else { > rwl->br++; > cv_wait(&rwl->qr, &rwl->m); ^^^^^^^ That we can't do. That's exactly the thing the existing sx(9) implementati= on=20 does and where it breaks. The problem is that cv_wait() is an implicit sle= ep=20 which breaks when we try to RLOCK() with other mutex already acquired. =20 Moreover will this break for recursive reads e.g.: Thread 1: RLOCK() ... RLOCK() -> cv_wait ... Thread 2: WLOCK() -> cv_wait ... This is exactly what pfil_hooks must be able to do as the packet filter may= =20 want to call back to the stack in order to send rejects etc. It would be an idea to use cv_wait() depending on the value in the try=20 argument and return EBUSY for that as well. This is a textbook implementation, that sure will work (given the above=20 change). The problem is, that we still have to invest 4 mutex operations f= or=20 every access. The current implementation has the same basic problem (thoug= h=20 it only uses 2 mutex operations for the WLOCK/UNLOCK). Ideally the RLOCK/= =20 UNLOCK should be free unless there is a writer waiting for the lock. In=20 order to do this I am thinking of a "spl-like" value in the PCPU section. = A=20 write would IPI other CPUs with a request to "raise" the bar once all have= =20 confirmed it would do the write and IPI again. This gives a serious=20 disadvantage to writes, but would allow us to block on failed read requests= =20 instead of erroring out or sleeping. Also, an uncongested read would be fr= ee=20 when compared to the existing solution. I have to read some more to see if this actually works. Comments appreciat= ed! > } > mtx_unlock(&rwl->m); > return 0; > } > > int > WLOCK(struct rwlock *rwl, int try) > { > if (!try) > mtx_lock(&rwl->m); > else if (!mtx_trylock(&rwl->m)) > return EBUSY; > rwl->writers++; > if (rwl->readers > 0) /* have readers, must wait */ > cv_wait(&rwl->qw, &rwl->m); > mtx_unlock(&rwl->m); > return 0; > } > > void > RUNLOCK(struct rwlock *rwl) > { > mtx_lock(&rwl->m); > rwl->readers--; > if (rwl->readers =3D=3D 0 && rwl->writers > 0) > cv_signal(&rwl->qw); > mtx_unlock(&rwl->m); > } > > void > WUNLOCK(struct rwlock *rwl) > { > mtx_lock(&rwl->m); > rwl->writers--; > if (rwl->br > 0) { /* priority to readers */ > rwl->readers =3D rwl->br; > rwl->br =3D 0; > cv_broadcast(&rwl->qr); > } else if (rwl->writers > 0) > cv_signal(&rwl->qw); > mtx_unlock(&rwl->m); > } =2D-=20 /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News --nextPart1138726.tHd4SFx7Ar Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQBDA+UyXyyEoT62BG0RAsJbAJ9deDV6DgsWcpWRrnkqAX3v/CDsVACfZ/fw 35bCcPJKPDG91LbEDylDFPw= =7C2d -----END PGP SIGNATURE----- --nextPart1138726.tHd4SFx7Ar--