Date:      Sat, 8 Mar 2008 13:29:17 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        Garrett Wollman <wollman@bimajority.org>, Tim Kientzle <kientzle@freebsd.org>, Jason Evans <jasone@freebsd.org>, Bruce Evans <brde@optusnet.com.au>, current@freebsd.org
Subject:   Re: Breaking the crt1.o -> atexit() -> malloc() dependency
Message-ID:  <20080308120038.D26157@besplex.bde.org>
In-Reply-To: <20080306094810.GM57756@deviant.kiev.zoral.com.ua>
References:  <200802280409.m1S498YJ062561@repoman.freebsd.org> <20080228231522.F57564@delplex.bde.org> <alpine.BSF.1.00.0802281109320.27124@thor.farley.org> <20080229141527.N59899@delplex.bde.org> <18375.43955.908262.696223@hergotha.csail.mit.edu> <47C8D0AB.20506@freebsd.org> <20080302062610.V66431@delplex.bde.org> <47CA2192.8020802@FreeBSD.org> <20080303065527.K69705@delplex.bde.org> <47CF4500.2050509@freebsd.org> <20080306094810.GM57756@deviant.kiev.zoral.com.ua>

On Thu, 6 Mar 2008, Kostik Belousov wrote:

> On Wed, Mar 05, 2008 at 05:12:32PM -0800, Tim Kientzle wrote:
>> There was some recent discussion on the commit mailing
>> list about how to disentangle crt1.o from malloc().
>>
>> Here's a design that I think addresses all of the
>> issues people raised, including the POSIX requirement
>> that atexit() always be able to support 32 registrations.
>> It does it without using sbrk() or mmap(), either.
>>
>> The basic idea is to lift the malloc() call up into
>> atexit() and have atexit_register() use statically-allocated
>> storage if atexit() didn't provide dynamically-allocated
>> storage.
>> ...
>> /* 32 required by POSIX plus a few for crt1.o */
>> static struct atexit pool[40];
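A minimal sketch of the split Tim describes, offered only as an illustration. The names sketch_atexit(), sketch_atexit_register(), struct exit_entry and the one-entry-per-registration layout are hypothetical (real libc batches entries in struct atexit). The low-level registrar never calls malloc(): it uses caller-supplied storage, or a slot from the static pool when given NULL, so a crt1.o-style caller passing NULL needs no allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One registration; the real libc batches several per struct atexit. */
struct exit_entry {
	void (*fn)(void);
	struct exit_entry *next;
};

#define POOL_SIZE 40	/* 32 required by POSIX plus a few for crt1.o */

static struct exit_entry pool[POOL_SIZE];
static size_t pool_used;
static struct exit_entry *head;	/* newest registration first */

/*
 * Low-level registrar (hypothetical name).  Never calls malloc():
 * it uses caller-provided storage, else the next static pool slot.
 */
int
sketch_atexit_register(void (*fn)(void), struct exit_entry *storage)
{
	if (storage == NULL) {
		if (pool_used == POOL_SIZE)
			return (-1);
		storage = &pool[pool_used++];
	}
	storage->fn = fn;
	storage->next = head;
	head = storage;
	return (0);
}

/*
 * High-level entry point (hypothetical name): the only code that
 * depends on malloc().  A NULL result falls back to the pool.
 */
int
sketch_atexit(void (*fn)(void))
{
	return (sketch_atexit_register(fn, malloc(sizeof(struct exit_entry))));
}

/* Run handlers newest first, as exit() does; storage is not freed. */
void
sketch_run_exit_handlers(void)
{
	while (head != NULL) {
		struct exit_entry *e = head;

		head = e->next;
		e->fn();
	}
}

/* Example handlers that log the order in which they run. */
static char run_log[8];
static size_t run_len;

static void
record(char c)
{
	assert(run_len < sizeof(run_log));
	run_log[run_len++] = c;
}

static void handler_a(void) { record('a'); }
static void handler_b(void) { record('b'); }
```

In this sketch crt1.o would call sketch_atexit_register(f, NULL) directly. For the linker to actually leave malloc() out, the two functions would have to live in separate object files.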

Could it use a few for crt1 only, with dynamic allocation for everything
except crt1 and maybe stdio?  This might simplify the frees.

I don't agree with the argument that static allocation is needed or useful
for satisfying the requirement for 32 atexits to succeed.  malloc() can't
fail :-), and if it does then you have worse problems than atexit failures
to handle.

>> Avoiding free() from the low-level code is a little trickier
>> but I think it can be done by having the low-level code
>> put (dynamically-allocated) blocks back onto a free list
>> and having the higher-level atexit() release that list
>> on the next registration.  This should handle the case
>> of a dynamic library being repeatedly loaded and unloaded.
>> Of course, it's unnecessary to release the atexit storage
>> on program exit.
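A rough sketch of the deferred-free idea, under the assumption (hypothetical, not from the thread) that each block records whether it came from malloc() or from the static pool. The low-level retire path only parks heap blocks on a list; the malloc()-dependent registration path frees them the next time it runs. The names retire_block(), sketch_atexit_alloc() and struct exit_block are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct exit_block {
	void (*fn)(void);
	bool from_heap;			/* false for static-pool blocks */
	struct exit_block *next;
};

static struct exit_block *free_list;	/* retired heap blocks, not yet freed */
static size_t nfreed;			/* counts free() calls, for illustration */

/*
 * Low-level retire path (hypothetical), e.g. after an unloaded
 * library's handlers have run.  It must not call free(), so heap
 * blocks are parked on free_list; static-pool blocks are dropped.
 */
void
retire_block(struct exit_block *b)
{
	assert(b != NULL);
	if (b->from_heap) {
		b->next = free_list;
		free_list = b;
	}
}

/* Only reachable from the malloc()-dependent atexit() path. */
static void
drain_free_list(void)
{
	while (free_list != NULL) {
		struct exit_block *b = free_list;

		free_list = b->next;
		free(b);
		nfreed++;
	}
}

/*
 * Allocation step of a hypothetical high-level atexit(): release
 * anything retired since the last registration, then allocate.
 */
struct exit_block *
sketch_atexit_alloc(void (*fn)(void))
{
	struct exit_block *b;

	drain_free_list();
	b = malloc(sizeof(*b));
	if (b != NULL) {
		b->fn = fn;
		b->from_heap = true;
		b->next = NULL;
	}
	return (b);
}
```

Because free() is only reached through the same path that already calls malloc(), code that never calls the high-level atexit() never drags in either.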

With separate storage for crt1, everything for crt1 except the
calls to the registered functions could be independent of atexit()
- just call the entries in the separate storage last at exit time.
stdio's rotting __cleanup hook works like this.  __cleanup's
reason for existence is to provide an atexit-like hook for stdio
without the full bloat of atexit, but this is defeated by always
calling atexit() from crt1.  This hook costs one pointer and one
statement in exit() when it is not used.  exit() still calls
__cleanup last (iff __cleanup is not null).  Thus __cleanup
effectively extends the static atexit table by 1 entry (the
first one).
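To make the ordering concrete, here is a small sketch of the shape described above: one function pointer that stays NULL unless stdio sets it, tested by a single statement after the atexit table has been run. cleanup_hook stands in for __cleanup, and sketch_register(), sketch_exit_handlers() and stdio_flush() are hypothetical stand-ins, not libc code.

```c
#include <assert.h>
#include <stddef.h>

#define NHANDLERS 32

/* Stand-in for the atexit table. */
static void (*handlers[NHANDLERS])(void);
static size_t nhandlers;

/* Stand-in for __cleanup: one pointer, NULL unless stdio sets it. */
static void (*cleanup_hook)(void);

int
sketch_register(void (*fn)(void))
{
	if (nhandlers == NHANDLERS)
		return (-1);
	handlers[nhandlers++] = fn;
	return (0);
}

/* Shape of exit(): atexit handlers newest first, then the hook. */
void
sketch_exit_handlers(void)
{
	while (nhandlers > 0)
		handlers[--nhandlers]();
	if (cleanup_hook != NULL)	/* the one extra statement */
		cleanup_hook();
}

/* Example handlers that log the order in which they run. */
static char trace[8];
static size_t ntrace;

static void
record(char c)
{
	assert(ntrace < sizeof(trace));
	trace[ntrace++] = c;
}

static void user_handler(void) { record('u'); }
static void stdio_flush(void) { record('s'); }	/* hypothetical */
```

Since the hook runs after every table entry, it behaves like an extra table slot that was registered before all others.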

>> In particular, crt1.o can then call atexit_register(f, NULL)
>> to register its exit functions without creating a dependency on
>> malloc.

Or it could do __cleanupN = functionN for a few small values of N
like stdio does for __cleanup.  Then it wouldn't have any dependency
on atexit either, but the ugliness in exit.c for __cleanup would need
to be duplicated for each __cleanupN.  At most 3 values of N need to
be supported (same for all arches I think):

 	for function cleanup = get_rtld_cleanup();	/* dynamic only */
 	for function _mcleanup				/* profiling only */
 	for function _fini				/* always */

Better, make all these atexit calls implicit.  The conditions for them
don't depend on the startup code, so __cxa_finalize() can call them
directly (except it needs a pointer for get_rtld_cleanup()).  __cxa_finalize
can also handle __cleanup (move the call through __cleanup from exit.c
to atexit.c).
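One way the "implicit" variant might look, as a hedged sketch only: the startup code fills fixed slots instead of calling atexit(), and a finalize routine calls them directly after the ordinary table. The slot names and sketch_finalize() are hypothetical. The order assumes crt1 currently registers the rtld cleanup, then _mcleanup, then _fini before main(), so today they run in reverse (_fini first), with __cleanup last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slots filled by startup code instead of atexit() calls. */
static void (*rtld_cleanup_ptr)(void);	/* dynamic only (get_rtld_cleanup()) */
static void (*mcleanup_ptr)(void);	/* profiling only (_mcleanup) */
static void (*fini_ptr)(void);		/* always (_fini) */
static void (*stdio_cleanup_ptr)(void);	/* stands in for __cleanup */

static void (*table[32])(void);		/* ordinary atexit() registrations */
static size_t ntable;

/*
 * Finalize with crt1's cleanups made implicit: user handlers first
 * (newest first), then _fini, _mcleanup, the rtld cleanup, and
 * finally the stdio hook.  NULL slots are skipped.
 */
void
sketch_finalize(void)
{
	void (*implicit[])(void) = {
		fini_ptr, mcleanup_ptr, rtld_cleanup_ptr, stdio_cleanup_ptr
	};
	size_t i;

	while (ntable > 0)
		table[--ntable]();
	for (i = 0; i < sizeof(implicit) / sizeof(implicit[0]); i++)
		if (implicit[i] != NULL)
			implicit[i]();
}

/* Example handlers that log the order in which they run. */
static char trace[8];
static size_t ntrace;

static void
record(char c)
{
	assert(ntrace < sizeof(trace));
	trace[ntrace++] = c;
}

static void user_fn(void) { record('u'); }
static void fini_fn(void) { record('f'); }
static void rtld_fn(void) { record('r'); }
static void stdio_fn(void) { record('s'); }
```

The conditions (dynamic, profiling) become "is the slot non-NULL", so nothing here needs atexit() or malloc() at startup.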

I think this works so simply and machine-independently mainly because
most of the details are in _fini.  _fini calls __do_global_dtors_aux
on at least i386.  Any number of magically ordered cleanups can be hidden
there.

>> This does require that atexit() and atexit_register() be in
>> separate source files, but I think it addresses all of the other
>> concerns people have raised.
>
> I mostly agree with proposal, but there is also __cxa_atexit().

More bloat to remove :-).  It seems to be only for C++, but all
executables have it.  Before it existed, exit() looped over the atexit
table where it now calls __cxa_finalize(), and the order of the atexit
finalizations relative to __cleanup was clearer.
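For reference, the extra machinery is mostly per-DSO bookkeeping. As the Itanium C++ ABI describes it, __cxa_atexit(func, arg, dso_handle) records which shared object registered each entry, and __cxa_finalize(d) runs (newest first, at most once each) only the entries for d, or all remaining entries when d is NULL, which is how exit() calls it. The sketch below models that with hypothetical names and a fixed-size table; it is not FreeBSD's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAXENT 32

/* Each entry remembers which shared object (DSO) registered it. */
struct cxa_entry {
	void (*fn)(void *);
	void *arg;
	void *dso;
	int done;
};

static struct cxa_entry ents[MAXENT];
static size_t nents;

/* Modeled on __cxa_atexit(func, arg, dso_handle); hypothetical name. */
int
sketch_cxa_atexit(void (*fn)(void *), void *arg, void *dso)
{
	if (nents == MAXENT)
		return (-1);
	ents[nents++] = (struct cxa_entry){ fn, arg, dso, 0 };
	return (0);
}

/*
 * Modeled on __cxa_finalize(dso): run, newest first, every not-yet-run
 * entry registered by dso, or every such entry when dso is NULL.
 */
void
sketch_cxa_finalize(void *dso)
{
	size_t i = nents;

	while (i-- > 0) {
		struct cxa_entry *e = &ents[i];

		if (e->done || (dso != NULL && e->dso != dso))
			continue;
		e->done = 1;
		e->fn(e->arg);
	}
}

/* Example handler that logs the character its argument points to. */
static char trace[8];
static size_t ntrace;

static void
record(void *arg)
{
	assert(ntrace < sizeof(trace));
	trace[ntrace++] = *(const char *)arg;
}
```

Note that the done entries are never reclaimed in this sketch, which is the same kind of growth the leak report below is about.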

> And, besides the issue of the size of the static linked executables,
> there is more exposed problem of atexit() memory leaks. See
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040644.html

This problem seems to only affect C++.  But how does the C dlclose() work
without calling __cxa_atexit()?

Bruce


