Date:      Sat, 23 May 1998 19:51:36 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        syssgm@dtir.qld.gov.au (Stephen McKay)
Cc:        tlambert@primenet.com, freebsd-current@FreeBSD.ORG, syssgm@dtir.qld.gov.au
Subject:   Re: Fix for undefined "__error" and discussion of shared object versioning
Message-ID:  <199805231951.MAA10260@usr07.primenet.com>
In-Reply-To: <199805231040.UAA02235@troll.dtir.qld.gov.au> from "Stephen McKay" at May 23, 98 08:40:06 pm

> I'm sure you have misread my message.  Here is a diff from the test code
> you sent on 20 May 1998 03:12:25 +0000 to the test code I sent back on
> 20 May 1998 17:47:10 +1000:

[ ... ]

> All I did was count the number of times ___error() is called.  I didn't
> rename the symbol.  Since ___error() is called even when linked with -lc_r
> I conclude that __error() in libc_r is not overriding the weak __error
> supplied by your modified errno.h.  Thus, threaded applications would
> all share the same errno instead of getting one each, which led to my
> claim that a more extensive multithreaded test case is required.

I see what you are attempting.

The weak symbol is apparently being screwed over by our linker *before*
the libraries are examined for identical non-weak symbols.

Specifically, in pass 1, in ld.c, there is code:

                /*                      
                 * If this symbol has acquired final definition, we're done.
                 * Commons must be allowed to bind to shared object data
                 * definitions. 
                 */
                if (sp->defined &&
                    (sp->common_size == 0 ||
                     relocatable_output || building_shared_object)) {
                        if ((sp->defined & N_TYPE) == N_SETV)
                                /* Allocate zero entry in set vector */
                                setv_fill_count++;
                        /*      
                         * At this stage, we do not know whether an alias
                         * is going to be defined for real here, or whether
                         * it refers to a shared object symbol. The decision
                         * is deferred until digest_pass2().
                         */
                        if (!sp->alias) 
                                defined_global_sym_count++;
                        continue;
                }

This causes the symbol to be bound to the weak value, even though
there is a shared library definition.

This is *WRONG*.  The ld program is *BROKEN*.


So the problem you are seeing is specifically because the *PROGRAM*
object has the weak definition.

This will never be the case for the legacy code you are dealing with.

In the shared library case, the loading of shared objects and the
resolution of weak symbols is, in fact, correct.

Practically, this means that the weak __error definition to ___error
*WILL* work, but *ONLY* if it occurs in shared objects, and *NOT*
in the main program.

This was the point of the _ERRNO_ protection of the static function
and weak symbol definition, in my last posting.  You don't have it
defined when you compile normal programs.

It would be better to tag this off of "SHARABLE" or whatever the
compiler likes to define when you are compiling PIC code for a
shared library.
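
As a rough illustration of the guarded weak-alias arrangement described
above, here is a minimal C sketch.  All demo_* names are hypothetical,
chosen so they cannot collide with the real __error/___error.  Using
__PIC__ as the guard is an assumption (GCC-compatible compilers predefine
it for -fpic/-fPIC, and also for PIE builds); it is not something the
current errno.h does.  The alias is emitted unconditionally here only so
that the file links on its own; in the proposal it would be emitted only
when the guard is set.

```c
#include <assert.h>

/* Assumption: __PIC__ is predefined when compiling position-independent
 * code, so it can stand in for "this is a shared library compilation". */
#if defined(__PIC__) || defined(__pic__)
#define DEMO_SHARED_BUILD 1
#else
#define DEMO_SHARED_BUILD 0
#endif

static int demo_errno_storage;   /* the single, unthreaded errno */
static int demo_standin_calls;   /* how many times the stand-in ran */

/* Stand-in playing the role of ___error() (hypothetical name). */
int *demo_error_standin(void)
{
    demo_standin_calls++;
    return &demo_errno_storage;
}

/* Weak alias playing the role of __error: a strong demo_error supplied
 * by another object or library is expected to take precedence over it. */
int *demo_error(void) __attribute__((weak, alias("demo_error_standin")));

/* errno-style accessor macro: every use calls demo_error() once. */
#define demo_errno (*demo_error())

int demo_standin_call_count(void) { return demo_standin_calls; }

int demo_is_shared_build(void) { return DEMO_SHARED_BUILD; }

/* Store a value through the macro and read it back (two calls). */
int demo_set_and_read(int value)
{
    demo_errno = value;
    int seen = demo_errno;
    assert(seen == value);
    return seen;
}
```

With nothing else defining demo_error, each use of demo_errno goes through
the stand-in, so the call counter rises by one per access.  This is the
same kind of counting used in the test program quoted earlier.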




> In other words, there is now just one errno because ___error from errno.h
> is used in preference to __error in libc_r.

That's because you are defining it in your objects that you are linking
against, as well as your stub shared library.


> Yes, I've looked at these.  That's why I'm so disappointed that the
> technique doesn't work.  Having played with it a bit, I'm now convinced
> that no tweaking with errno.h can ever fix the problem.

About 8 hours of work on ld could fix it.  I hacked together a stupid
instrumented ld that almost works in about an hour.  It still doesn't
do the right thing, quite, since it doesn't put the correct shared
library offset into the symbol definition; it does, however, put
the expected ___errno string in the relocation symbol external
name references, so it's a matter of clobbering one value (or more
accurately, pulling the weak value off of nzlist in favor of the
shared object value).


> >For programs *already* linked against libc_r instead of libc, or
> >linked against the new libc, I *EXPECT* the standin to *NEVER* be
> >called.
> 
> Yes!  This is where I claim the experimental evidence is against you.

You are still running the wrong experiment, I think.  The errno.h
static function and weak symbol declaration should *ONLY* occur in
shared library compilations, and *not* when you include errno.h
in the objects for the program you are linking against the shared
libraries.


> Spurred by your description of load ordering, I built a small library
> (lib__error.so) containing just /usr/src/lib/libc/sys/__errno.c with
> an execution counter in __error_unthreaded.  I linked this to a small
> test program, using -lc_r as well.

[ ... ]

> Output of ldd:
> 
> foo:
> 	-l__error.0 => /syshome/syssgm/lib/lib__error.so.0.0 (0x20014000)
> 	-lc_r.3 => /usr/lib/libc_r.so.3.0 (0x20019000)
> 	-lc.3 => /usr/lib/libc.so.3.1 (0x2009b000)
> -----------------------------------------------------------------------------
> Output of foo:
> -----------------------------------------------------------------------------
> errno is 0
> count is 1
> errno is 21
> count is 3
> -----------------------------------------------------------------------------

The problem here is that foo is getting the __error = ___error from
foo.o, not from the shared library.

I would expect the strong __error in /usr/lib/libc_r.so.3.0 to override
the weak __error = ___error in /syshome/syssgm/lib/lib__error.so.0.0

But it's *NOT* going to override the __error = ___error definition that
occurs in foo.o because of the ld bug (see above) which prevents strong
references in shared libraries from overriding weak references in the
user's code.

This may, in fact, be broken for certain shared library data definitions
as well (I haven't looked closely, only close enough to see that it
seems wrong, even in the case where common_size != 0).
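
The difference between the observed binding and the expected one can be
modeled with a toy resolver.  This is a sketch only: the demo_* types and
functions are hypothetical and bear no relation to ld's real data
structures (sp, nzlist, and so on).  The three table entries simply mirror
the foo.o, lib__error.so and libc_r.so case discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_binding { DEMO_WEAK, DEMO_STRONG };

struct demo_def {
    const char *name;           /* symbol name */
    const char *origin;         /* object or library supplying it */
    enum demo_binding binding;  /* weak or strong definition */
};

/* Models the observed behavior: the first definition seen is final,
 * so a weak definition in the program object wins. */
const struct demo_def *
demo_resolve_first(const struct demo_def *defs, size_t n, const char *name)
{
    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(defs[i].name, name) == 0)
            return &defs[i];
    return NULL;
}

/* Models the expected behavior: a strong definition anywhere in the
 * link order beats a weak one; otherwise the earliest weak one is used. */
const struct demo_def *
demo_resolve_strong_first(const struct demo_def *defs, size_t n,
                          const char *name)
{
    const struct demo_def *best = NULL;

    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (defs[i].binding == DEMO_STRONG)
            return &defs[i];
        if (best == NULL)
            best = &defs[i];
    }
    return best;
}

/* Link order for the test case: program object first, then libraries. */
const struct demo_def demo_link_order[] = {
    { "__error", "foo.o",         DEMO_WEAK   },
    { "__error", "lib__error.so", DEMO_WEAK   },
    { "__error", "libc_r.so",     DEMO_STRONG },
};
#define DEMO_LINK_ORDER_LEN \
    (sizeof demo_link_order / sizeof demo_link_order[0])
```

Under the first rule the weak definition from foo.o is chosen; under the
second, the strong definition from libc_r.so is chosen, which is the
outcome a threaded program needs.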



> Now on to a hack that actually works:

[ ... hack to ld.so ... ]

I'm anxious about this hack because what you are doing is covering a
bug in ld that is interfering with your test case.  I think this can
be adequately dispensed with by doing the right thing in errno.h
and bsd.lib.mk.


> So, for the folks that really care about this, we now have 3 possible
> options:
> 
> 1) back out the errno change, and possibly put it back after ELF.
> 
> 2) hack ld.so (prototype works fine)
> 
> 3) bump ALL library major numbers
> 
> Which will it be?

There are two more:

  4) hack errno.h to define the weak symbol mechanism I proposed, and
     fix ld so that errno.h doesn't have to know that a shared library
     compilation unit is including it.

  5) hack errno.h to define the weak symbol mechanism I proposed, and
     hack bsd.lib.mk so that errno.h knows that a shared library
     compilation unit is including it.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message


