From: Martin Blapp
To: Alexander Kabaev
Date: Tue, 6 Aug 2002 10:15:53 +0200 (CEST)
Subject: Help needed. Deadlock in rtld makes openoffice build hang again
In-Reply-To: <20020805110611.4292e3d5.ak03@gte.com>
Message-ID: <20020806095745.M58571-100000@levais.imp.ch>

Hi,

Out of 10 builds, about 6 hang and have to be restarted. This is not
usable for a package building cluster.

I end up with a process that consumes all CPU while waiting for a lock
that is never released. The problem is in exit(). Replacing exit() with
_exit() did not help.

[Switching to Process 4968, Thread 1]
0x28050784 in sigprocmask () from /usr/libexec/ld-elf.so.1
(gdb) bt
#0  0x28050784 in sigprocmask () from /usr/libexec/ld-elf.so.1
#1  0x2804f2d1 in xprintf () from /usr/libexec/ld-elf.so.1
#2  0x2804df78 in find_symdef () from /usr/libexec/ld-elf.so.1
#3  0x2838dbd8 in exit () from /usr/lib/libc_r.so.4
#4  0x08048c77 in _start ()

I added the following lines to libexec/rtld-elf/i386/lockdflt.c, as
proposed by Alexander Kabaev:

> Martin, try to add the loop below to the wlock_acquire function
> to make it look more like lock80386_acquire:
>     while (l->lock != 0)
>         ;   /* Spin */

Now it hangs there ...

[Switching to Process 93059, Thread 1]
0x28050923 in wlock_acquire (lock=0x28067000)
    at /usr/src/libexec/rtld-elf/i386/lockdflt.c:188
188         while (l->lock != 0)
(gdb) bt
#0  0x28050923 in wlock_acquire (lock=0x28067000)
    at /usr/src/libexec/rtld-elf/i386/lockdflt.c:188
#1  0x280505ee in wlock_acquire () at /usr/src/libexec/rtld-elf/rtld.c:202
#2  0x2804ee60 in rtld_exit () at /usr/src/libexec/rtld-elf/rtld.c:1428
#3  0x28390bd8 in exit () from /usr/lib/libc_r.so.4
#4  0x08048c77 in _start ()
(gdb) p l->lock
$2 = 2
(gdb) p tmp_oldsigmask
$3 = {__bits = {0, 0, 0, 0}}
(gdb) p fullsigmask
$4 = {__bits = {4294963463, 4294967295, 4294967295, 4294967295}}

I then tried this:

(gdb) set l->lock=0
(gdb) c

And got this ...

/usr/libexec/ld-elf.so.1: Application locking error: 1 readers and 1 writers
in dynamic linker.  See DLLOCKINIT(3) in manual pages.

I'll now try to change it like this:

 static void
 wlock_acquire(void *lock)
 {
     Lock *l = (Lock *)lock;
     sigset_t tmp_oldsigmask;

     for ( ; ; ) {
         sigprocmask(SIG_BLOCK, &fullsigmask, &tmp_oldsigmask);
         if (cmpxchgl(0, WAFLAG, &l->lock) == 0)
             break;
         sigprocmask(SIG_SETMASK, &tmp_oldsigmask, NULL);
+        while (l->lock & WAFLAG)
+            ;   /* Spin */
     }
     oldsigmask = tmp_oldsigmask;
 }

Does anybody have a clue how to fix this issue?
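
For anyone who wants to poke at the lock word outside of rtld, here is a
minimal stand-alone sketch of how the observed value l->lock == 2 could be
read. It is not the rtld code. It assumes a lock word where bit 0 is the
writer flag (the WAFLAG tested in the patch above) and each reader adds 2;
the names RC_INCR, lock_readers, lock_writers, reader_enter, reader_leave,
wlock_release and wlock_acquire_bounded are made up for illustration, and
C11 atomics stand in for cmpxchgl. The spin is bounded so the sketch
returns instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout, for illustration only. */
#define WAFLAG  0x1     /* set while a writer holds the lock */
#define RC_INCR 0x2     /* added to the lock word once per reader */

typedef struct {
    atomic_int lock;
} Lock;

/* Decode a lock word under the assumed layout. */
static int lock_readers(int word) { return word / RC_INCR; }
static int lock_writers(int word) { return word & WAFLAG; }

/* Simplified reader side: register and unregister one reader. */
static void reader_enter(Lock *l)
{
    atomic_fetch_add(&l->lock, RC_INCR);
}

static void reader_leave(Lock *l)
{
    int prev = atomic_fetch_sub(&l->lock, RC_INCR);
    assert(prev >= RC_INCR);    /* there must have been a reader to remove */
}

/* Writer side: clear the writer flag. */
static void wlock_release(Lock *l)
{
    atomic_fetch_and(&l->lock, ~WAFLAG);
}

/*
 * Try to take the write lock by swapping 0 -> WAFLAG, giving up after
 * max_spins attempts.  Returns true if the lock was taken.
 */
static bool wlock_acquire_bounded(Lock *l, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&l->lock, &expected, WAFLAG))
            return true;
    }
    return false;
}
```

Under these assumptions the value 2 decodes as one registered reader and no
writer, so neither a spin on l->lock != 0 nor a compare-and-swap from 0 can
make progress until that reader releases the lock.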

Martin

Martin Blapp,
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 061 826 93 00: +41 61 826 93 01
PGP:
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------