Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 18 Aug 2005 03:32:19 +0200
From:      Max Laier <max@love2party.net>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        arch@freebsd.org, net@freebsd.org
Subject:   Re: duplicate  read/write locks in net/pfil.c and netinet/ip_fw2.c
Message-ID:  <200508180332.34895.max@love2party.net>
In-Reply-To: <20050817170248.A70991@xorpc.icir.org>
References:  <20050816170519.A74422@xorpc.icir.org> <200508170435.34688.max@love2party.net> <20050817170248.A70991@xorpc.icir.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--nextPart1138726.tHd4SFx7Ar
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Thursday 18 August 2005 02:02, Luigi Rizzo wrote:
> On Wed, Aug 17, 2005 at 04:35:19AM +0200, Max Laier wrote:
> > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote:
> > > [apologies for the cross post but it belongs both to arch and net.]
> > >
> > > I notice that net/pfil.c and netinet/ip_fw2.c have two copies of
> > > aisimilar but slightly different implementation of
> > > multiple-reader/single-writer locks, which brings up the question(s):
> > >
> > > 1. should we rather put this code in the generic kernel code so that
> > > other subsystems could make use of it ? E.g. the routing table is
> > > certainly a candidate,
> >
> > I have asked this several time on -arch and IRC, but never found anyone
> > willing to pursue it.  However, the problem is ...
> >
> > > and especially
> > >
> > > 2. should we implement it right ?
> > >
> > >    Both implementations are subject to starvation for the writers
> > >    (which is indeed a problem here, because we might want to modify
> > >    a ruleset and be prevented from doing it because of incoming traff=
ic
> > >    that keeps readers active).
> > >    Also the PFIL_TRY_WLOCK will in fact be blocking if a writer
> > >    is already in - i have no idea how problematic is this in the
> > >    way it is actually used.
> >
> >... really this.  I didn't find a clean way out of the starvation issue.=
=20
> > What I do for pfil is that I set a flag and simply stop serving[2] shar=
ed
> > requests once a writer waits for the lock.  If a writer can't sleep[1]
> > then we return EBUSY and don't.  However, for pfil it's almost ever safe
> > to assume that a write may sleep (as it is for most instances of this
> > kind of sx-lock where you have BIGNUMxreads:1xwrite).
>
> could you guys look at the following code and see if it makes sense,
> or tell me where i am wrong ?
>
> It should solve the starvation and blocking trylock problems,
> because the active reader does not hold the mutex in the critical
> section anymore. The lock could well be a spinlock.

The reader doesn't hold the lock over the critical section with the current=
=20
implementation either.  The write holds the lock over the critical section,=
=20
which was a conscious decision.  For some cases it does make sense to get r=
id=20
of this, however.  See inline comments why this doesn't work for pfil.

> 	cheers
> 	luigi
>
> /*
>  * Implementation of multiple reader-single writer that prevents
> starvation. * Luigi Rizzo 2005.08.19
>  *
>  * The mutex m only protects accesses to the struct rwlock.
>  * We can have the following states:
>  * IDLE: readers =3D 0, writers =3D 0, br =3D 0;
>  *      any request will be granted immediately.
>  *
>  * READ: readers > 0, writers =3D 0, br =3D 0. Read in progress.
>  *      Grant read requests immediately, queue write requests and
>  *      move to READ1.
>  *      When last reader terminates, move to IDLE.
>  *
>  * READ1: readers > 0, writers > 0, br >=3D 0.
>  *      Read in progress, but writers are queued.
>  *      Queue read and write requests to qr and wr, respectively.
>  *      When the last reader terminates, wakeup the next queued writer
>  *      and move to WRITE
>  *
>  * WRITE: readers =3D 0, writers > 0, br >=3D 0.
>  *      Write in progress, possibly queued readers/writers.
>  *      Queue read and write requests to qr and wr, respectively.
>  *      When the writer terminates, wake up all readers if any,
>  *      otherwise wake up the next writer if any.
>  *      Move to READ, READ1, IDLE accordingly.
>  */
>
> struct rwlock {
>         mtx     m;      /* protects access to the rwlock */
>         int     readers;        /* active readers */
>         int     br;     /* blocked readers */
>         int     writers;        /* active + blocked writers */
>         cv      qr;     /* queued readers */
>         cv      qw;     /* queued writers */
> }
>
> int
> RLOCK(struct rwlock *rwl, int try)
> {
>         if (!try)
>                 mtx_lock(&rwl->m);
>         else if (!mtx_trylock(&rwl->m))
>                 return EBUSY;
>         if (rwl->writers =3D=3D 0)  /* no writer, pass */
>                 rwl->readers++;
>         else {
>                 rwl->br++;
>                 cv_wait(&rwl->qr, &rwl->m);
                  ^^^^^^^

That we can't do.  That's exactly the thing the existing sx(9) implementati=
on=20
does and where it breaks.  The problem is that cv_wait() is an implicit sle=
ep=20
which breaks when we try to RLOCK() with other mutex already acquired. =20
Moreover will this break for recursive reads e.g.:

Thread 1: RLOCK() ...      RLOCK() -> cv_wait ...
Thread 2:          WLOCK() -> cv_wait ...

This is exactly what pfil_hooks must be able to do as the packet filter may=
=20
want to call back to the stack in order to send rejects etc.

It would be an idea to use cv_wait() depending on the value in the try=20
argument and return EBUSY for that as well.

This is a textbook implementation, that sure will work (given the above=20
change).  The problem is, that we still have to invest 4 mutex operations f=
or=20
every access.  The current implementation has the same basic problem (thoug=
h=20
it only uses 2 mutex operations for the WLOCK/UNLOCK).  Ideally the RLOCK/=
=20
UNLOCK should be free unless there is a writer waiting for the lock.  In=20
order to do this I am thinking of a "spl-like" value in the PCPU section.  =
A=20
write would IPI other CPUs with a request to "raise" the bar once all have=
=20
confirmed it would do the write and IPI again.  This gives a serious=20
disadvantage to writes, but would allow us to block on failed read requests=
=20
instead of erroring out or sleeping.  Also, an uncongested read would be fr=
ee=20
when compared to the existing solution.

I have to read some more to see if this actually works.  Comments appreciat=
ed!

>         }
>         mtx_unlock(&rwl->m);
>         return 0;
> }
>
> int
> WLOCK(struct rwlock *rwl, int try)
> {
>         if (!try)
>                 mtx_lock(&rwl->m);
>         else if (!mtx_trylock(&rwl->m))
>                 return EBUSY;
>         rwl->writers++;
>         if (rwl->readers > 0)   /* have readers, must wait */
>                 cv_wait(&rwl->qw, &rwl->m);
>         mtx_unlock(&rwl->m);
>         return 0;
> }
>
> void
> RUNLOCK(struct rwlock *rwl)
> {
>         mtx_lock(&rwl->m);
>         rwl->readers--;
>         if (rwl->readers =3D=3D 0 && rwl->writers > 0)
>                 cv_signal(&rwl->qw);
>         mtx_unlock(&rwl->m);
> }
>
> void
> WUNLOCK(struct rwlock *rwl)
> {
>         mtx_lock(&rwl->m);
>         rwl->writers--;
>         if (rwl->br > 0) {      /* priority to readers */
>                 rwl->readers =3D rwl->br;
>                 rwl->br =3D 0;
>                 cv_broadcast(&rwl->qr);
>         } else if (rwl->writers > 0)
>                 cv_signal(&rwl->qw);
>         mtx_unlock(&rwl->m);
> }

=2D-=20
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

--nextPart1138726.tHd4SFx7Ar
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQBDA+UyXyyEoT62BG0RAsJbAJ9deDV6DgsWcpWRrnkqAX3v/CDsVACfZ/fw
35bCcPJKPDG91LbEDylDFPw=
=7C2d
-----END PGP SIGNATURE-----

--nextPart1138726.tHd4SFx7Ar--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200508180332.34895.max>