Date:      Wed, 4 Jan 2012 02:49:32 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Ed Schouten <ed@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r229368 - in head: lib/libc lib/libc/arm/string lib/libc/i386/string lib/libc/mips/string lib/libc/string lib/libstand sys/boot/userboot/libstand
Message-ID:  <20120104013401.S6960@besplex.bde.org>
In-Reply-To: <201201030714.q037E2qq010125@svn.freebsd.org>
References:  <201201030714.q037E2qq010125@svn.freebsd.org>

On Tue, 3 Jan 2012, Ed Schouten wrote:

> Log:
>  Merge index() and strchr() together.
>
>  As I looked through the C library, I noticed the FreeBSD MIPS port has a
>  hand-written version of index(). This is nice, if it weren't for the
>  fact that most applications call strchr() instead.
>
>  Also, on the other architectures index() and strchr() are identical,
>  meaning we have two identical pieces of code in the C library and
>  statically linked applications.

Only in statically linked applications that used both, since they weren't
actually identical -- they were intentionally put in separate object files
to avoid this problem.  (In asm, you don't need symbol magic to declare
strong aliases; you just put 2 .globl labels together.  But this is
usually wrong, since it doesn't keep things separate enough.  Some files
use #include to implement the multiple copies.  For example, amd64 and
i386 don't bother optimizing memcpy() beyond memmove(), but make memcpy()
a copy of memmove() in a separate file.  The i386 index.S and strchr.S
were not so good -- they duplicated the code.)

>  Solve this by naming the actual file strchr.[cS] and let it use
>  __strong_reference()/STRONG_ALIAS() to provide the index() routine. Do
>  the same for rindex()/strrchr().

This breaks the Standard C namespace.  When they are in the same object
file, there is no way to get the standard name without getting the
nonstandard name.  So the following C-standard-conforming C program
now gets a linkage error (multiple definition of `index'), at least with
static linkage:

     #include <string.h>
     int index;
     char *foo(const char *p) { return strchr(p, '1'); }

When they were in separate object files, the nonstandard name just added
to the general pollution in the libc runtime in a way that doesn't seem
to cause any problems in practice, since it is orthogonal to any uses of
the name in a conforming application.

We mostly use weak references in libraries, to avoid problems like
this.  In libc, there were just 2 __strong_reference()s and 111
__weak_reference()s.  One of the oldest weak references is from
__vfscanf to vfscanf.  This is used to implement a bug in C90: C90
doesn't have vfscanf, so it must not be in libc in a way that conflicts
with any application symbol named vfscanf.  libc needs vfscanf's
functionality internally, and doesn't want to duplicate the whole
thing.  So it puts the functionality in __vfscanf, always uses that name
internally, and provides the public name vfscanf solely as a weak
symbol.  The symbol remains weak, and C90 remains sort of supported,
although the bug is fixed in C99 (which has vfscanf).

There are also _many_ (but not nearly all?) POSIX symbols that are
handled as weak references.  Internally, they have names like _open
and weak symbols like `open' (for some reason, nm shows both _open and
`open' as weak).  These are implemented more magically, using
include/*namespace.h and macros in asm files.  I got the count of 111
by grepping for the C macro, which missed all the asm macros.
Grepping for ' W ' in the nm output for libc.a shows 1024 weak
references.  That's almost 30% of all symbols (there are 2195 ' T '
symbols).  nm doesn't seem to provide a way to show what the symbols
are aliases for.  It is worse for strong aliases: nm shows both names
as ' T '.

>
>  This seems to make the C libraries and static binaries slightly smaller,
>  but this reduction in size seems negligible.

Duplication of the object file (except for the global symbols) is best,
and may be required, even for the example of memcpy being identical
to memmove given above.  It is useful to be able to put a breakpoint
at memcpy without having it trigger when memmove is called, and the C
standard might require memcpy and memmove to have different addresses.
Similarly for profiling.  You want logically different functions to
have different addresses.  I wonder if gprof knows enough about symbols
to prefer strchr over index if they are strong aliases for each other.

Bruce


