Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 4 Jan 2012 23:55:05 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Ed Schouten <ed@80386.nl>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r229368 - in head: lib/libc lib/libc/arm/string lib/libc/i386/string lib/libc/mips/string lib/libc/string lib/libstand sys/boot/userboot/libstand
Message-ID:  <20120104214707.Q1935@besplex.bde.org>
In-Reply-To: <20120103182020.GH1895@hoeg.nl>
References:  <201201030714.q037E2qq010125@svn.freebsd.org> <20120104013401.S6960@besplex.bde.org> <20120103182020.GH1895@hoeg.nl>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 3 Jan 2012, Ed Schouten wrote:

> * Bruce Evans <brde@optusnet.com.au>, 20120103 16:49:
>> This breaks the Standard C namespace.  When they are in the same object
>> file, there is no way to get the standard name without getting the
>> nonstandard name.  So the following C-standard-conforming C program
>> now gets a linkage error (multiple definition of `index'), at least with
>> static linkage:
>>
>>     #include <string.h>
>>     int index;
>>     void foo(const char *p) { return strchr(p, '1'); }
>
> Though I sympathize, this problem is not just limited to strchr(). There
> are other portions of the C library that use index() as well. For
> example, if you use <ttyent.h>, ttyslot(), one of the exec*() functions
> or the NIS functions, you already get index() linked into your binary.

These are standard C or very often used parts of libc though.  Your
change gives the bizarre behaviour that any use of strchr() by an
application prevents all uses of index by the application (except
calling the library index()), while the only point of of cleaning up
libraries to not use index() is to allow applications to use it for
whatever they want.

>> [...] and the C
>> standard might require memcpy and memmove to have different addresses.
>
> I just skimmed through the standard, and if I haven't overlooked
> anything, no such requirement is made. Also, I can imagine a compiler
> with good support for link-time optimisation can already merge equal
> pieces of code together, making it even harder to reason about
> inequality of function addresses.
>
> Still, I am willing to address the issues you raised. index() and
> rindex() aren't that important nowadays and I have a patchset ready in
> my home directory that converts almost all apps in the base system to
> use strchr() anyway.
>
> As I don't feel like polluting the MI strchr() implementations with
> index()/rindex() support, would it be okay if I implement index() and
> rindex() as simple C functions that call into strchr() and strrchr()?

Just use a weak symbol.  This is good enough for hundreds or thousands
or existing aliases.

I did a quick check what gprof does with the alises.  It seems to be
random.  gprof reports that the symbol for open(2) is `open', irrespective
of whether I spell it `open' or _open in the source code.  OTOH, gprof
reports that the symbol for vfscanf(3) is __vfscanf, irrespective of
whether I spell it vfscanf or __vfscanf in the source code.  The object
code contains both symbols of course (after linking to libc).  I think
gprof just picks one of the symbols semi-randomly (for these 2 pairs,
it uses the last symbol in symbol table order).  To do the right thing,
the object linked object code would have to contain a hint about which
symbol the application used (which is only possible if the application
only used 1 of the aliases), and gprof would have to understand the
hint.  I think there is no such hint.  To do the next best thing, gprof
would have to prefer the non-weak symbol.  Is this possible?  nm output
doesn't distinguish the symbols.

I tried doing something using chained aliases with a weak link, and only
found lots of bugs.  E.g.:

% #include <sys/cdefs.h>
% 
% /*
%  * XXX __strong_alias() is unusable since it acts at the C level, but our
%  * intermediate symbol only exists at the asm level.
%  */
% #undef __strong_reference
% #define	__strong_reference(sym, alias)	\
% 	__asm(".global " #alias);	\
% 	__asm(#alias " = " #sym)
% 
% void xstrchr(void) {}
% 
% __weak_reference(xstrchr, __xindex);
% __strong_reference(__xindex, xindex);

This reimplementation of __strong_reference() worked like I wanted (on
i386), but the chain didn't do anything interesting or fix gprof.

I copied it from the STRONG_ALIAS() that you added in i386/include/asm.h,
except STRONG_ALIAS() has the parameters reversed relative to all the macros
in cdefs.h.  STRONG_ALIAS() only exists on mips and now i386.

The section that defines symbol macros in cdefs.h has lots of bugs:
- wrong conditional.  Old versions of gcc don't support the attribute that
   it uses, or __attribute__(()) at all.
- __strong_reference() is only defined for !__INTEL_COMPILER.  It is now
   used unconditionally in libc (in ptsname.c alone).  Thus, icc cannot
   possibly compile libc.  The above reemplmentation of __strong_reference()
   may be less unportable.  All of these symbol definitions are unportable
   and don't belong in MI cdefs.h.  They assume a particular dialect of asm
   and much more.  But they are portable enough in practice since we only
   use the gas dialect.
The rest are just style bugs:
- no space after commas in parameter lists.  The !STDC or the !ELF case
   may need this.  Otherwise, it is just a style bug (copied from the
   older !ELF code).  The !STDC case has probably never been used in
   FreeBSD.  Both subcases of the ELF case were added in 1998 just
   before K&R support started being dismantled.  Some !STDC support
   remains.  But ELF support was removed soon after.  I think it is only
   the removed !ELF && !STDC case that needs to misformat the parameter
   lists, since IIRC spaces in the list get into parameters for STDC, and
   this matters for !ELF since we want to add a prefix of `_' to symbol
   names.
- __asm is misspelled __asm__
- there is a tab instead of a space before the comments on #endif's.

More on the above reimplementation of __strong_reference():  here it is
again:

% #define	__strong_reference(sym, alias)	\
% 	__asm(".global " #alias);	\
% 	__asm(#alias " = " #sym)

I remembered that I can never remember the difference between "=" and
"equ".  In fact, there seems to be no difference, and "=" is just an
alternative spelling with a difference syntax for the ".equ" directive.
Using ".equ" is safer because "=" might be an arithmetic operator.
And in cdefs.h, using "=" is a style bug because all other places (just
the 2 implementations of __weak_reference()) use ".equ".  Changing the
above to use ".equ" as in __weak_reference gives:

#define	__strong_reference(sym, alias)	\
 	__asm(".global " #alias);	\
 	__asm(".equ " #alias ", " #sym)

Now the only difference between __strong_reference() and the STDC
__weak_reference() are:
- one declares #alias as .global and the other declares it as .weak.
     Oops, I forgot to fix the misspelling of .globl as .global.  It
     is spelled traditionally (to satisfy the limit of 6 characters in
     identifiers) in all instances in amd64 and mips, and in all places
     in i386 except in STRONG_REFERENCE().
The rest are just style differences:
- I didn't duplicate the fancy formatting that puts extra spaces in the
   source code to line up the '#alias's.  These should really be tabs
   to line up the symbols in the asm output, but tabs in strings in source
   code are even harder to format well (they should be spelled \t, but the
   macros are hard enough to read already.
- I didn't duplicate the misspelling of __asm as __asm__.

So it seems that __strong_reference() can be implemented in a way that
is identically unportable to __weak_reference().

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120104214707.Q1935>