Date: Tue, 6 Aug 2002 10:15:53 +0200 (CEST)
From: Martin Blapp <mb@imp.ch>
To: Alexander Kabaev <ak03@gte.com>
Cc: <openoffice@FreeBSD.ORG>, <jdp@FreeBSD.ORG>, <hackers@FreeBSD.ORG>
Subject: Help needed. Deadlock in rtld makes openoffice build hang again
Message-ID: <20020806095745.M58571-100000@levais.imp.ch>
In-Reply-To: <20020805110611.4292e3d5.ak03@gte.com>
Hi,
Out of 10 builds, about 6 hang, and I need to restart them.
This is not a usable solution for a package building cluster.
I end up with a process that consumes all CPU resources while
waiting for a lock to be released, which never happens.
The problem is exit(). Replacing exit() with _exit() did not help.
[Switching to Process 4968, Thread 1]
0x28050784 in sigprocmask () from /usr/libexec/ld-elf.so.1
(gdb) bt
#0 0x28050784 in sigprocmask () from /usr/libexec/ld-elf.so.1
#1 0x2804f2d1 in xprintf () from /usr/libexec/ld-elf.so.1
#2 0x2804df78 in find_symdef () from /usr/libexec/ld-elf.so.1
#3 0x2838dbd8 in exit () from /usr/lib/libc_r.so.4
#4 0x08048c77 in _start ()
I tried adding the following lines, as proposed by Alexander Kabaev,
to libexec/rtld-elf/i386/lockdflt.c:
> Martin, try to add the loop below to the wlock_acquire function
> to make it look more like lock80386_acquire:
> while (l->lock != 0)
> ; /* Spin */
Now it hangs there ...
[Switching to Process 93059, Thread 1]
0x28050923 in wlock_acquire (lock=0x28067000)
at /usr/src/libexec/rtld-elf/i386/lockdflt.c:188
188 while (l->lock != 0)
(gdb) bt
#0 0x28050923 in wlock_acquire (lock=0x28067000)
at /usr/src/libexec/rtld-elf/i386/lockdflt.c:188
#1 0x280505ee in wlock_acquire () at /usr/src/libexec/rtld-elf/rtld.c:202
#2 0x2804ee60 in rtld_exit () at /usr/src/libexec/rtld-elf/rtld.c:1428
#3 0x28390bd8 in exit () from /usr/lib/libc_r.so.4
#4 0x08048c77 in _start ()
(gdb) p l->lock
$2 = 2
(gdb) p tmp_oldsigmask
$3 = {__bits = {0, 0, 0, 0}}
(gdb) p fullsigmask
$4 = {__bits = {4294963463, 4294967295, 4294967295, 4294967295}}
I tried to do this:
(gdb) set l->lock=0
(gdb) c
And got this ...
/usr/libexec/ld-elf.so.1: Application locking error: 1 readers and 1 writers in dynamic linker.  See DLLOCKINIT(3) in manual pages.
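To help read the lock value printed by gdb above, here is a minimal sketch that decodes the lock word. It assumes the usual lockdflt.c encoding, where bit 0 (WAFLAG) marks a writer and every reader adds RC_INCR = 2 to the word. Those constants are not shown in this message, so treat them as an assumption. decode_lock() and struct lock_state are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed encoding of the rtld lock word (not shown in this message). */
#define WAFLAG  0x1   /* set while a writer holds the lock */
#define RC_INCR 0x2   /* added to the lock word once per reader */

/* Hypothetical helper type: the decoded view of one lock word. */
struct lock_state {
    int writer;    /* 1 if the writer bit is set, else 0 */
    int readers;   /* number of readers encoded in the word */
};

/* Split a raw lock word into its writer bit and reader count. */
struct lock_state
decode_lock(int word)
{
    struct lock_state s;

    s.writer = (word & WAFLAG) != 0;
    s.readers = word / RC_INCR;
    return s;
}

/* Print a decoded lock word in a form similar to the rtld message. */
void
print_lock(int word)
{
    struct lock_state s = decode_lock(word);

    printf("lock=%d: %d readers and %d writers\n",
        word, s.readers, s.writer);
}
```

Under that assumption, l->lock == 2 decodes to one reader and no writer. That would explain why a loop that waits for l->lock to become 0 never exits if the reader never releases the lock.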
I'll now try to change it like this:
 static void
 wlock_acquire(void *lock)
 {
     Lock *l = (Lock *)lock;
     sigset_t tmp_oldsigmask;

     for ( ; ; ) {
         sigprocmask(SIG_BLOCK, &fullsigmask, &tmp_oldsigmask);
         if (cmpxchgl(0, WAFLAG, &l->lock) == 0)
             break;
         sigprocmask(SIG_SETMASK, &tmp_oldsigmask, NULL);
+        while (l->lock & WAFLAG)
+            ;   /* Spin */
     }
     oldsigmask = tmp_oldsigmask;
 }
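For experimenting with this loop outside of rtld, the following is a minimal standalone sketch of the same acquire pattern: block all signals, try a compare-and-swap from 0 to WAFLAG, and restore the signal mask on failure. It swaps rtld's cmpxchgl() for C11 atomic_compare_exchange_strong(), assumes WAFLAG is 0x1, and bounds the number of attempts so that a stuck lock is reported instead of spinning forever. lock_init(), wlock_acquire_bounded() and the max_tries parameter are hypothetical and are not part of lockdflt.c. This is a diagnostic aid, not a fix for the deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

#define WAFLAG 0x1              /* assumed value of the writer bit */

typedef struct {
    atomic_int lock;            /* 0 = free, WAFLAG = writer holds it */
} Lock;

static sigset_t fullsigmask;    /* every signal, built by lock_init() */
static sigset_t oldsigmask;     /* mask to restore on release */

/* Hypothetical helper: reset the lock and build the full signal mask. */
void
lock_init(Lock *l)
{
    atomic_init(&l->lock, 0);
    sigfillset(&fullsigmask);
}

/*
 * Try to take the write lock at most max_tries times.
 * Returns 0 on success (signals stay blocked until release),
 * or -1 if the lock word never became 0.
 */
int
wlock_acquire_bounded(Lock *l, int max_tries)
{
    sigset_t tmp_oldsigmask;
    int tries;

    for (tries = 0; tries < max_tries; tries++) {
        int expected = 0;

        sigprocmask(SIG_BLOCK, &fullsigmask, &tmp_oldsigmask);
        if (atomic_compare_exchange_strong(&l->lock, &expected, WAFLAG)) {
            oldsigmask = tmp_oldsigmask;
            return 0;
        }
        /* Lock is busy: unblock signals again before retrying. */
        sigprocmask(SIG_SETMASK, &tmp_oldsigmask, NULL);
    }
    return -1;
}

/* Clear the writer bit and restore the signal mask saved on acquire. */
void
wlock_release(Lock *l)
{
    atomic_fetch_and(&l->lock, ~WAFLAG);
    sigprocmask(SIG_SETMASK, &oldsigmask, NULL);
}
```

With the lock word preset to 2, as in the gdb session, wlock_acquire_bounded() returns -1 after max_tries attempts. A spin that waits only on the WAFLAG bit cannot help in that state, because the compare-and-swap still requires the whole word to be 0.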
Does anybody have any clue how to fix this issue?
Martin
Martin Blapp, <mb@imp.ch> <mbr@FreeBSD.org>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 061 826 93 00: +41 61 826 93 01
PGP: <finger -l mbr@freebsd.org>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------
