From: Daniel Eischen
To: John Polstra
Cc: current@freebsd.org, sobomax@freebsd.org, obrien@freebsd.org, deischen@freebsd.org
Date: Wed, 1 Nov 2000 14:00:35 -0500 (EST)
Subject: Re: ABI is broken??
In-Reply-To: <200011011835.eA1IZl207585@vashon.polstra.com>

On Wed, 1 Nov 2000, John Polstra wrote:

> In article <3A005026.47B9978C@FreeBSD.org>,
> Maxim Sobolev wrote:
> >
> > I'm not sure what exactly caused this behaviour (I can guess two potential
> > victims: O'Brien's changes in the crt stuff and Polstra's recent changes in
> > libgcc_r), but it seems that some programs built on the previous -current from
> > 27 October immediately segfault when I try to run them on a system installed
> > from today's sources. The segfault disappeared when I recompiled the affected
> > program. With this message I'm attaching a short backtrace.
> [...]
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> > (gdb) bt
> > #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> > #1 0x806e782 in __register_frame_info ()
> > #2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
> > #3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
> > #4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1
>
> Here are all the random facts which, when put together, explain what
> is going on.
>
> Your old application was (like all -pthread programs) linked
> with "/usr/lib/libgcc_r.a".
> That library contains a function
> "__register_frame_info" which uses some of the facilities of the
> pthreads library "libc_r".
>
> The pthreads library has to be initialized before it can be used, by
> a call to _thread_init. If some functions such as pthread_mutex_lock
> are called before the library has been initialized, a segmentation
> violation results.
>
> _thread_init is called automatically from libc_r's _init function
> when the dynamic linker loads the library. Unfortunately, that
> isn't early enough. libgcc_r is the first thing to be initialized,
> and it calls pthread_mutex_lock before _thread_init has been called.
> Or rather I should say that OLD versions of libgcc_r did that --
> because they were buggy.
>
> In other words, your old application was linked with a buggy version
> of libgcc_r, but it didn't become apparent until now.
>
> It didn't become apparent until now because our crtbegin.o and
> crtend.o were also buggy. They failed to call __register_frame_info.
> This was a problem for C++ programs using exceptions, especially when
> the gcc port was used and DWARF2 exception handling was selected.
>
> Now we have fixed crtbegin.o and crtend.o, and we have fixed
> libgcc_r.a. But it causes problems for your old application because
> the new crtbegin.o and crtend.o (linked into the new shared libraries
> such as libc_r) call __register_frame_info in your old, buggy,
> statically linked libgcc_r.a.
>
> Are you dizzy yet?

Yes ;-)

> To sum up, your old executable contains the bug but
> it wasn't triggered until the recent changes.
>
> Now, what can or should we do about this? Arguably we should simply
> say in the release notes, "Relink your old multithreaded applications.
> They had a bug which is now fixed." But if there are binary-only
> commercial apps which exhibit the problem, this solution is useless.
> I don't know whether there are any such apps, but I doubt it.
> N.B., Linux apps don't count because they were never linked with our
> libgcc_r in the first place.
>
> Or we can try to work around it, but there aren't any perfectly nice
> ways to do so. Here are some possibilities:
>
>   - Put a hack in the threads library so that whenever
>     pthread_mutex_lock is called it checks to make sure that the
>     threads library has been initialized, and if not, it calls
>     _thread_init. This is a poor solution because it adds overhead to
>     a rather performance-critical function -- though admittedly the
>     overhead is very small. Another potential problem is that there
>     could be a race condition if several threads all called
>     pthread_mutex_lock at once before the threads library had been
>     initialized. I don't think the race condition would materialize,
>     though, since the first call would come from libgcc_r, well before
>     the application had gotten control.
>
>   - Put a hack into the dynamic linker to call _thread_init very early
>     if that symbol was defined. I like this solution even less,
>     because it's too hackish. The dynamic linker isn't the place for
>     special hooks like that.
>
>   - Put a hack into crtbegin.o or crtend.o. But we are using the
>     standard GNU versions of these, and I really really don't want to
>     change that. In any case, it's the wrong place for the
>     work-around.
>
> Overall I would lean toward putting the hack into pthread_mutex_lock.
> Comments?

If that's the lesser evil, then I guess it's OK with me.

-- 
Dan Eischen
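For readers who want to see the first work-around concretely, here is a minimal C sketch of a lazy-initialization check inside a lock function. It is not the libc_r implementation, and every name in it is hypothetical: toy_thread_init() stands in for libc_r's _thread_init, toy_mutex_lock() for pthread_mutex_lock, and a GCC/Clang constructor function (a compiler extension, assumed available) for the old libgcc_r code that takes a lock before the threads library's own _init has run.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins only: toy_thread_init() plays the role of
 * _thread_init and toy_mutex_lock() the role of pthread_mutex_lock.
 * Neither mirrors the real FreeBSD libc_r code.
 */
struct toy_thread_state {
    int lock_count;   /* successful toy_mutex_lock() calls so far */
};

static struct toy_thread_state *toy_state = NULL;  /* NULL until initialized */
static int toy_init_calls = 0;                      /* how often init has run */

/* Stand-in for _thread_init: sets up the library's global state. */
static void toy_thread_init(void)
{
    static struct toy_thread_state storage;

    toy_state = &storage;
    toy_init_calls++;
}

/*
 * Stand-in for pthread_mutex_lock with the proposed check.  Without the
 * "if", an early caller would dereference a NULL toy_state, the same
 * kind of SIGSEGV as in the backtrace.  The fast path costs one compare
 * and branch.  The check is NOT thread-safe: like the proposal, it
 * assumes the first early call happens while only one thread exists.
 */
static int toy_mutex_lock(void)
{
    if (toy_state == NULL)
        toy_thread_init();
    toy_state->lock_count++;
    return 0;
}

/*
 * Stand-in for the old libgcc_r code: a GCC/Clang constructor that
 * locks before main() runs and before anything has called
 * toy_thread_init() explicitly.
 */
static void __attribute__((constructor)) toy_early_caller(void)
{
    (void)toy_mutex_lock();
}
```

By the time main() runs, the constructor has already forced initialization through the check, so later lock calls leave toy_init_calls at 1 and only bump lock_count. The race John mentions is visible here too: two threads could both observe toy_state == NULL, and the sketch avoids that only by assuming startup is single-threaded.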