Date:      Fri, 23 Oct 1998 21:46:50 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        jin@george.lbl.gov (Jin Guojun)
Cc:        mike@smith.net.au, hackers@FreeBSD.ORG, jdp@FreeBSD.ORG
Subject:   Re: ld for loading dynamic library changed in 3.0-RELEASE?
Message-ID:  <199810232146.OAA06412@usr07.primenet.com>
In-Reply-To: <199810222351.QAA08906@george.lbl.gov> from "Jin Guojun" at Oct 22, 98 04:51:14 pm

> test.lbl.gov: cc -o test test.c -L/usr/local/lib -ltest
> /usr/local/lib/libtest.so: undefined reference to `b_printf'
> /usr/local/lib/libtest.so: undefined reference to `c_printf'

This behaviour is undesirable, yet it is desirable.

This is because a "NULL" path argument to dlopen() is used to access
the symbol space of the program's own code sections.
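
As a minimal sketch of what that NULL argument buys you, the helper
below asks the runtime linker for a handle on the running program and
looks a name up in it. lookup_in_program() is a hypothetical name
used only for illustration. Whether symbols defined in the executable
itself are visible this way depends on how it was linked (with GNU ld,
for example, that usually means -rdynamic or --export-dynamic), so
treat a NULL return as "not exported", not as "does not exist".

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Look up `name` in the global symbol space of the running program
 * (the executable plus the shared objects loaded with it).
 * Returns the symbol's address, or NULL if it is not visible.
 * lookup_in_program() is a hypothetical helper, not part of any API.
 */
static void *
lookup_in_program(const char *name)
{
	void *self, *addr;

	/* A NULL path asks for a handle on the main program itself. */
	self = dlopen(NULL, RTLD_LAZY);
	if (self == NULL)
		return (NULL);

	addr = dlsym(self, name);

	/* Dropping the handle does not unload the program image. */
	dlclose(self);
	return (addr);
}
```

A script interpreter of the kind described next could call
lookup_in_program("some_native_func") and cast the result to the
appropriate function pointer type before calling through it. On some
systems the program must also be linked with -ldl.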


Consider a program that implements a scripting language that
allows access to locally defined symbols for the purposes of
allowing calls to be proxied to native code implementations of
script components.

You could see this in a Java environment that provided a JIT or
JNI capability.  You could also see this in an incremental
compilation environment, such as LISP, FORTH, some BASIC
interpreters, and UCSD p-code.

In general, the exposure of unused symbols should have to be
enabled by a link flag similar to that necessary for local
symbol exposure.  This behaviour appears to be the default
because of ELF, and your argument is that the default is "wrong".


Next case:

> test.lbl.gov: setenv OBJFORMAT aout
> test.lbl.gov: make
> ...
> test.lbl.gov: !fil
> file *o
> a.o:  FreeBSD/i386 object not stripped
> a.so: FreeBSD/i386 PIC object not stripped
> b.o:  FreeBSD/i386 object not stripped
> b.so: FreeBSD/i386 PIC object not stripped
> c.o:  FreeBSD/i386 object not stripped
> c.so: FreeBSD/i386 PIC object not stripped
> test.lbl.gov:  cc -o test test.c -ltest
> test.lbl.gov:

This behaviour is desirable, yet it is undesirable.

Consider the case of a program that depends on a library which
itself depends on another library, i.e.:


prog:
	main()
	{
		lib_func();
	}

first_order_lib:
	lib_func()
	{
		other_lib_func();
	}

second_order_lib:
	other_lib_func()
	{
		...
	}


In the a.out linkage case, due to a symbol stacking bug that is
not easily resolved (I estimate about 40 hours of work would be
required), the a.out linker acts as if the image linking process
were engaging in dlopen(3) RTLD_LAZY style symbol resolution, not
RTLD_NOW symbol resolution.

As a result, it's possible to build prog, link it against the
first_order_lib library, and then fail at runtime due to the lack
of second_order_lib.

This particular problem poses a great obstacle to using FreeBSD
as a product development environment, since even if you test,
you are unlikely to achieve full code coverage, and all it takes
is a single unexercised code path to core your program.


This is arguably a problem with the programmer who wrote the code
not doing a correct functional abstraction of their interfaces,
but it's not fatal to be an idiot (like it should be).  However,
regardless of how the promiscuous use of one library by another
is "hidden" from the linker, the act of hiding the unresolved
*explicitly referenced* dependency at compile time is *wrong*.

So in the case described above, the ELF linker is "right", and the
a.out linker is "wrong": precisely the opposite of the condition
you are complaining about.


Note: if first_order_lib and second_order_lib are static instead
of dynamic, the missing hidden dependency is correctly reported as
an error.


Or is it?

There is a third case, where the dlopen'ed object's behaviour is
*intended* to be variant based on invocation:


serial_terminal_emulator_prog:

	main()
	{
		dlopen( "vt100.so");
	}

	get_port_char()
	{
		...	/* read a character from the serial port*/
	}


network_terminal_emulator_prog:

	main()
	{
		dlopen( "vt100.so");
	}

	get_port_char()
	{
		...	/* read a character from the remote host*/
	}


vt100.so:
	process_input_from_port()
	{
		int ch;
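	#ifdef BOGUS_GLUE
		void	*self;
		int	(*get_port_char)();

		/* NULL path: handle on the program that loaded us*/
		self = dlopen( NULL, RTLD_LAZY);
		get_port_char = (int (*)())dlsym( self, "get_port_char");
	#else	/* !BOGUS_GLUE*/
		extern int get_port_char();
	#endif	/* !BOGUS_GLUE*/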

		ch = get_port_char();
		...
	}


In this case, the binding to symbols required by the shared object
is supposed to occur later, such that it imports the symbols from
the image that it is linked against.

One could argue that both the ELF and the a.out linkers are broken
from the perspective that you should be able to directly reference
the get_port_char() function, not by proxy symbol via dlopen+dlsym,
and that the symbol should be resolved from the program's symbol
space as necessary.


Now we "weenie out" and put the async I/O and network I/O code into
shared libraries and/or have them separately opened by a generic
emulator object:

generic_terminal_emulator_prog:

	main()
	{
		dlopen( "network_io.so");
		dlopen( "vt100.so");
	}

And so on.


The argument for this case is that the second dlopen() call should
fail if the symbols it needs are not available (you would use
RTLD_NOW in order to force this behaviour).
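
A minimal sketch of that fail-early loading order follows. The helper
names load_module() and load_emulator() are hypothetical, and the use
of RTLD_GLOBAL on the first object is my assumption about how the I/O
provider's symbols would be made available to the emulator object; the
original example does not say.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load a shared object and report failure immediately.
 * `mode` is RTLD_NOW or RTLD_LAZY, optionally or'ed with RTLD_GLOBAL.
 * Returns the handle, or NULL after printing the dlerror() text.
 * load_module() is a hypothetical helper.
 */
static void *
load_module(const char *path, int mode)
{
	void *handle;
	const char *why;

	handle = dlopen(path, mode);
	if (handle == NULL) {
		why = dlerror();
		fprintf(stderr, "%s: %s\n", path,
		    why != NULL ? why : "unknown error");
	}
	return (handle);
}

/*
 * Load the I/O provider first, then the emulator that depends on it.
 * Returns 0 on success, -1 if either object could not be loaded.
 */
static int
load_emulator(const char *io_path, const char *emu_path)
{
	void *io, *emu;

	/* RTLD_GLOBAL: let later loads resolve against this object. */
	io = load_module(io_path, RTLD_NOW | RTLD_GLOBAL);
	if (io == NULL)
		return (-1);

	/* RTLD_NOW: fail here, not on first call, if a symbol is missing. */
	emu = load_module(emu_path, RTLD_NOW);
	if (emu == NULL) {
		dlclose(io);
		return (-1);
	}
	return (0);
}
```

With RTLD_LAZY in place of RTLD_NOW, the second load could succeed
and a missing function would only be noticed when the first call
through it is made, which is the runtime failure described for the
a.out case above.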


Which leaves us looking at how to hide unreferenced shared library
symbols in ELF shared libraries not specifically linked to expose all
symbols, or, because it's likely that libc.so symbols should *all*
be exposed, by default, how to explicitly hide the symbols in your
example.
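
One way this can be expressed with GCC- or Clang-compatible compilers
is a per-symbol visibility attribute. This is a compiler extension
that postdates this message, so take it as a sketch of one approach
under that assumption and not as what the FreeBSD toolchain discussed
here provides. The macro and function names are hypothetical.

```c
/*
 * Assumes a GCC- or Clang-compatible compiler; the attribute below is
 * a compiler extension, not standard C.  All names are hypothetical.
 */
#define LIB_EXPORT __attribute__((visibility("default")))
#define LIB_HIDDEN __attribute__((visibility("hidden")))

/*
 * Internal helper: callable from inside the library, but left out of
 * the shared object's dynamic symbol table.
 */
LIB_HIDDEN int
c_helper(int x)
{
	return (x * 2);
}

/* Public entry point: stays visible to programs linking the library. */
LIB_EXPORT int
lib_func(int x)
{
	return (c_helper(x) + 1);
}
```

Built as a shared object (for example with cc -shared -fPIC), only
lib_func should then appear in the output of nm -D; c_helper remains
usable by the library's own code.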


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



