Date: Wed, 1 Nov 2000 10:35:47 -0800 (PST)
From: John Polstra <jdp@polstra.com>
To: current@freebsd.org
Cc: sobomax@freebsd.org, obrien@freebsd.org, deischen@freebsd.org
Subject: Re: ABI is broken??
Message-ID: <200011011835.eA1IZl207585@vashon.polstra.com>
In-Reply-To: <3A005026.47B9978C@FreeBSD.org>
References: <3A005026.47B9978C@FreeBSD.org>

In article <3A005026.47B9978C@FreeBSD.org>,
Maxim Sobolev <sobomax@FreeBSD.ORG> wrote:
>
> I'm not sure what exactly caused this behaviour (I can guess two potential
> victims: O'Brien's changes in crt stuff and recent Polstra's changes in
> libgcc_r), but it seems that some programs built on the previous -current from
> 27 October immediately segfault when I'm trying to run then on system installed
> from today's sources. The segfault disappeared when I recompiled affected
> program. With this message I'm attaching short backtrace.
[...]
> Program received signal SIGSEGV, Segmentation fault.
> 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> (gdb) bt
> #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> #1 0x806e782 in __register_frame_info ()
> #2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
> #3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
> #4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1

Here are all the random facts which, when put together, explain what
is going on.

Your old application was (like all -pthread programs) linked with
"/usr/lib/libgcc_r.a". That library contains a function
"__register_frame_info" which uses some of the facilities of the
pthreads library "libc_r".

The pthreads library has to be initialized before it can be used, by
a call to _thread_init. If some functions such as pthread_mutex_lock
are called before the library has been initialized, a segmentation
violation results.

_thread_init is called automatically from libc_r's _init function
when the dynamic linker loads the library. Unfortunately, that isn't
early enough. libgcc_r is the first thing to be initialized, and it
calls pthread_mutex_lock before _thread_init has been called. Or
rather I should say that OLD versions of libgcc_r did that -- because
they were buggy.

In other words, your old application was linked with a buggy version
of libgcc_r, but it didn't become apparent until now.

It didn't become apparent until now because our crtbegin.o and
crtend.o were also buggy. They failed to call __register_frame_info.
This was a problem for C++ programs using exceptions, especially when
the gcc port was used and DWARF2 exception handling was selected.

Now we have fixed crtbegin.o and crtend.o, and we have fixed
libgcc_r.a. But it causes problems for your old application because
the new crtbegin.o and crtend.o (linked into the new shared libraries
such as libc_r) call __register_frame_info in your old, buggy,
statically linked libgcc_r.a.

Are you dizzy yet?

To sum up, your old executable contains the bug but it wasn't
triggered until the recent changes.

Now, what can or should we do about this? Arguably we should simply
say in the release notes, "Relink your old multithreaded applications.
They had a bug which is now fixed." But if there are binary-only
commercial apps which exhibit the problem, this solution is useless.
I don't know whether there are any such apps, but I doubt it. N.B.,
Linux apps don't count because they were never linked with our
libgcc_r in the first place.

Or we can try to work around it, but there aren't any perfectly nice
ways to do so. Here are some possibilities:

- Put a hack in the threads library so that whenever
  pthread_mutex_lock is called it checks to make sure that the threads
  library has been initialized, and if not, it calls _thread_init.

  This is a poor solution because it adds overhead to a rather
  performance-critical function -- though admittedly the overhead is
  very small. Another potential problem is that there could be a race
  condition if several threads all called pthread_mutex_lock at once
  before the threads library had been initialized. I don't think the
  race condition would materialize, though, since the first call would
  come from libgcc_r, well before the application had gotten control.

- Put a hack into the dynamic linker to call _thread_init very early
  if that symbol was defined.
  I like this solution even less, because it's too hackish. The
  dynamic linker isn't the place for special hooks like that.

- Put a hack into crtbegin.o or crtend.o. But we are using the
  standard GNU versions of these, and I really really don't want to
  change that. In any case, it's the wrong place for the work-around.

Overall I would lean toward putting the hack into pthread_mutex_lock.
Comments?

John
--
  John Polstra                                   jdp@polstra.com
  John D. Polstra & Co., Inc.            Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."
                                              -- Chögyam Trungpa

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
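
For readers who want to see the shape of the first work-around (the
check inside pthread_mutex_lock), here is a minimal sketch in C. It is
not the libc_r code and not necessarily what was committed. Every name
beginning with "sketch_" is hypothetical, the mutex is a plain flag
that never blocks, and the guard is deliberately not thread-safe, on
the assumption stated in the message that the first call arrives
during startup, before the application has any threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the libc_r sources. */
static bool sketch_initialized = false; /* true once init has run */
static int  sketch_init_calls  = 0;     /* times init did real work */

struct sketch_mutex {
    int locked;                         /* 0 = free, 1 = held */
};

/* Plays the role of _thread_init; a repeated call is a no-op. */
static void sketch_thread_init(void)
{
    if (sketch_initialized)
        return;
    sketch_initialized = true;
    sketch_init_calls++;
}

/* Plays the role of pthread_mutex_lock with the proposed guard. */
static int sketch_mutex_lock(struct sketch_mutex *m)
{
    if (!sketch_initialized)            /* the added cost: one test */
        sketch_thread_init();
    if (m == NULL || m->locked)
        return -1;                      /* a real lock would block */
    m->locked = 1;
    return 0;
}

static int sketch_mutex_unlock(struct sketch_mutex *m)
{
    if (m == NULL || !m->locked)
        return -1;
    m->locked = 0;
    return 0;
}

/* Plays the role of the old __register_frame_info, which takes a lock
 * during startup, before the normal init hook has had a chance to run. */
static struct sketch_mutex sketch_frame_lock = { 0 };

static int sketch_register_frame_info(void)
{
    if (sketch_mutex_lock(&sketch_frame_lock) != 0)
        return -1;
    /* ... frame registration would happen here ... */
    return sketch_mutex_unlock(&sketch_frame_lock);
}
```

Calling sketch_register_frame_info() first, as the old startup order
does, now triggers initialization on demand instead of touching
uninitialized state. A later call to sketch_thread_init() from the
usual init path finds the flag set and does nothing. The price is the
one extra test on every lock operation that the message describes.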