Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 29 Jul 2003 16:03:16 -0400
From:      Mike Makonnen <mtm@identd.net>
To:        "Jacques A. Vidrine" <nectar@FreeBSD.org>
Cc:        kris@obsecurity.org
Subject:   Re: making nsswitch(3) thread-safe (was Re: Removal of bogus gethostbyaddr_r())
Message-ID:  <20030729200315.GA53927@kokeb.ambesa.net>
In-Reply-To: <20030623150627.GD82167@madman.celabo.org>
References:  <20030618210812.GB21622@rot13.obsecurity.org> <Pine.GSO.4.10.10306182206300.3647-100000@pcnet5.pcnet.com> <20030619055933.FNBD4805.out003.verizon.net@kokeb.ambesa.net> <20030623150627.GD82167@madman.celabo.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Ok Jacques,

I have abandoned the method I was thinking of doing it in.
Changes to third-party nss modules is definitely
a show-stopper IMO. So, this email is essentially to let you know
I have decided to use the modifications from PR: bin/29581 (as posted by
sobomax a while back) as a starting point since it generally got a
good response. For those who don't remember, it's the one that
uses the POSIX thread-local storage primitives.

BTW, in case it's not clear what I'm talking about:
I won't be touching the nsswitch stuff, I'll just be making the
the gethostbyXXX functions thread-safe. The essence of the patch
won't change, I'll just be modifying the backend to make some things
less static, and also hiding the whole thing behind __isthreaded.
I should be done in a few days and will post it here after
some testing.


On Mon, Jun 23, 2003 at 10:06:27AM -0500, Jacques A. Vidrine wrote:
> [Sorry for the late reply.  As you may or may not have noticed :-)
>  my FreeBSD hiatus continues :-(  I'll be back, just not sure when
>  yet.]
> 
> On Thu, Jun 19, 2003 at 01:59:32AM -0400, Mike Makonnen wrote:
> > 
> > [ Jacques, I'm CC'ing you since this affects nss ]
> 
> Thanks, Mike! -- for CC'ing me and for looking at this in the first
> place.
> 
> > I just took a look into what it would take to make the gethostby*_r functions
> > thread safe, and it isn't pretty. The "thread-unsafeness" goes all the way
> > down into the individual methods for the various nsswitch(3) databases (dns,
> > files, etc). So, the most expedient and least invasive way to go about it would
> > be to put a mutex in the gethostby* functions that is dependent on __isthreaded
> > being true. This, offcourse, means that the entire call-chain (from top to
> > bottom) is locked for the duration of a call to one of these functions, which
> > can be a considerable amount of time if it has to timeout. During this time no
> > other thread in the application can resolve a host because it will be blocked on
> > said mutex.
> 
> You couldn't take this approach even if you wanted.  No part of libc
> should hold a lock while making calls through nsdispatch.  NSS modules
> are free to call back into libc.  Imagine holding a lock in
> gethostbyname_r, calling through nsdispatch and getting to, say, the
> LDAP backend.  The LDAP backend might itself invoke gethostbyname_r
> or some other nsdispatch consumer resulting in recursive locking or
> lock order reversals.
>  
> > To my mind the way to fix this properly is to make the nsdispatch(3)
> > call-chain in libc thread-safe from the ground-up. This is a huge task in and of
> > itself, but is complicated even further by the fact that whatever we do can't
> > change the semantics of the existing user-visible functions.
> 
> Yes, this is the approach that must be taken.  The nsdispatch engine
> is itself thread-safe.  The getpw*_r, getgr*_r functions are
> thread-safe.  I believe that with the exception of the `dns' backend,
> the existing backends are thread-safe.
> 
> > I think there are three points to keep in mind:
> > 
> > 1. The nsswitch(3) database functions should be made thread-safe, but at the
> >     same time not change the thread-unsafe semantics that third-party apps
> >     might expect (through nsdispatch(3)).
> > 2. The thread-unsafeness starts at the bottom (low-level functions) of the
> >     call-chain, in the nsswitch(3) database functions.
> > 3. Because so many databases (hosts, passwd, group, etc) will be affected by
> >     this it would be too huge a task to do it all at once. The safest approach
> >     is probably to introduce the changes separately a few at a time.
> 
> What do you mean by the `nsswitch(3) database functions' ?  Do you
> mean the backends?  If so, yes, the backends must ultimately be
> thread-safe, and it is best to manage them one database at a time
> i.e., all built-in backends --- files, nis, dns --- for a database
> must be done at once.
> 
> > We can achieve this by
> > Introducing a variable that gets passed down the call-chain telling the
> > nsswitch(3) database functions whether the "destination address" (return
> > value) will by allocated by the caller or not. If this variable is false then
> > the routines can use a static buffer, otherwise, they will assume the caller has
> > allocated the necessary space.
> >
> > Most of the functions will need a lot more work than this to make them
> > fully thread-safe. The advantage of doing it this way is that it maintains
> > the status-quo while allowing us to make the individual subsystems
> > thread-safe separately and as time and resources permit.
> > 
> > The following patch shows what I have in mind. It won't
> > compile just yet, but it might make what I'm saying a little clearer :-)
> > This initial phase is really just a mechanical change of argument lists. It
> > doesn't introduce any thread-safeness on its own. It just makes it easier to
> > introduce such changes safely on a subsytem by subsystem basis.
> > 
> > Basically, the _nsdispatch() functionality is moved into nsdispatch_common(),
> > which takes a va_list as its last argument instead of a variable number of
> > arguments. It also takes one additional argument(int is_r) so callers can tell
> > it what kind of semantics they want. Applications calling nsdispatch() get a
> > wrapper that passes a false (0) value in the is_r argument. Callers from within
> > libc will continue calling_nsdispatch(), which also grows an additional argument
> > (int is_r) that callers can pass down to the nsswitch(3) database functions
> > through nsdispatch_common().
> > 
> > What do you think? Should I continue with this or do you think it's bogus?
> >
> > The important point to keep in mind is that we have to keep the dual semantics
> > (thread-safe and unsafe) all the way down the call-chain because only the
> > top-level and low-level functions know the actual memory type and size that
> > needs to be allocated for the return value to the library users. So, we can't
> > make the low-level nsswitch(2) database functions unconditionally thread-safe
> > and restrict the dual semantics to the top-level functions.
> 
> I think this is bogus :-)  I must say I don't really understand what
> you are trying to accomplish ... maybe we should get on IRC and chat.
> Also, please look at getpwent.c to see how thread-safe versus
> non-thread-safe entry points to getpw* are handled if you have not
> already.  I can't see any need to add this `is_r' argument.  I don't
> see the purpose of it.
> 
> The hardest part of making the thread-safe gethost*_r functions, IMHO
> (after having done some work in this area) is untangling the
> KAME-contributed code.  There is a lot of incest between the
> implementations of the various high-level resolver routines
> (gethostbyname, getaddrinfo, getipnodebyname, etc.) that complicates
> things.   But the approach is conceptually simple:
> 
>    1. Write gethostbyname_r (thread-safe), which invokes
>       nsdispatch("gethostbyname_r", ...).
> 
>    2. Make each back-end thread-safe.  Starting with a global lock is
>       reasonable.  I don't think we can ever do anything efficient
>       with the BIND 4 resolver, anyway.
> 
> Only I think you want to do the whole family of functions in one blow,
> not one-at-a-time.  (But maybe I'm wrong here, 'cause that has
> certainly kept me from committing any partial work in this area.)
> 
> Note also that the libc<-->NSS modules interface is constrained.
> Nominally we use the GNU libc interface, which has different entry
> points for e.g. gethostbyname and gethostbyname_r, with a set number
> and type of parameters.  We don't want to change these interface lest
> we make porting NSS modules to FreeBSD harder.  We can add some
> interfaces, though, where needed (i.e. getaddrinfo).
> 
> You can contact me in a number of ways to discuss:
> 
>   - EFnet nickname `nectar' (or whatever nick shows up for 
>     `nectar@free.bsd')
>   - AIM `nectar5'
>   - ICQ `17783107'
>   - Yahoo! `nectar'
>   - MSN `nectar523@hotmail.com'
>   - Voice 612-743-3364
> 
> It would be best if we could `schedule' something.  I apologize, but
> my freetime is very slim lately.
> 
> Cheers,
> -- 
> Jacques Vidrine   . NTT/Verio SME      . FreeBSD UNIX       . Heimdal
> nectar@celabo.org . jvidrine@verio.net . nectar@freebsd.org . nectar@kth.se

-- 
Mike Makonnen  | GPG-KEY: http://www.identd.net/~mtm/mtm.asc
mtm@identd.net | D228 1A6F C64E 120A A1C9  A3AA DAE1 E2AF DBCC 68B9
mtm@FreeBSD.Org| FreeBSD - Unleash the Daemon!



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030729200315.GA53927>