Date: Wed, 7 Aug 2002 16:30:52 -0700 (PDT)
Message-Id: <200208072330.g77NUqVW058833@vashon.polstra.com>
To: hackers@freebsd.org
From: John Polstra
Cc: dillon@apollo.backplane.com, mb@imp.ch
Subject: Re: Help needed. Deadlock in rtld makes openoffice build hang ag
In-Reply-To: <200208072156.g77Lulgi000927@apollo.backplane.com>
References: <20020807231439.F58571-100000@levais.imp.ch> <200208072140.g77LeL2b000769@apollo.backplane.com> <200208072146.g77Lkke8058706@vashon.polstra.com> <200208072156.g77Lulgi000927@apollo.backplane.com>
Organization: Polstra & Co., Seattle, WA

In article <200208072156.g77Lulgi000927@apollo.backplane.com>,
Matthew Dillon wrote:

> :Yes, that was the original idea behind the sleeps.  But in practice
> :it doesn't work, because the rtld isn't really linked with the rest
> :of the application.  When the rtld calls nanosleep(), it's getting
> :the real system call rather than the threads package's version.
> :
> :John
>
>     So the only solution may be a callback vector to switch threads that
>     the application can set.

Probably, but there are some real gotchas with that approach too.
It is what I did initially in the early revisions of lockdflt.c.
But some problems came up -- see the log messages for revisions 1.1
thru 1.5 of that file.  The worst problem was that when the rtld
called one of the application-supplied locking functions, that
function might reference a symbol which needed lazy binding.  That
caused it to call back into the rtld, which recursively tried to call
the locking function, ad infinitum.  I tried various ways to make sure
the locking functions were pre-bound, but none were successful in all
cases.

>     Martin's earlier comment in regards to the problem occurring in exit()
>     led me to search for 'atexit' use inside rtld-elf.  I found a
>     comment in rtld_start.S (for i386) but no direct link.  If there is
>     an at-exit function it could be deadlocking against a thread trying
>     to cause the program to exit.  Odd, but possible.

There is an atexit function.  It's set up by crt0 or crt1 -- I forget
which.  It causes "rtld_exit" in rtld.c to be called via the atexit
mechanism.

Martin, it might be worth a try to remove the locking calls from
rtld_exit and see if that fixes the problem.

>     (I'm getting into this conversation late.  I am actually on vacation
>     and will not have email access for about a week starting in about a day).

I'm going to be a bit tied up myself for the next few days,
unfortunately.

John
--
  John Polstra
  John D. Polstra & Co., Inc.                        Seattle, Washington USA
  "Disappointment is a good sign of basic intelligence."  -- Chögyam Trungpa
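
The recursion described above, where an application-supplied locking
function needs lazy binding and so calls back into the rtld, can be
modelled in a few lines of C.  This is only a sketch under stated
assumptions, not the real rtld or lockdflt.c code: every name here
(model_bind, model_app_lock, model_run, MAX_DEPTH) is hypothetical, and
the depth cut-off exists solely so the demonstration terminates instead
of recursing forever.

```c
#include <assert.h>

/*
 * Hypothetical model of the re-entrancy problem.  None of these names
 * exist in rtld-elf; they only illustrate the call pattern.
 */

#define MAX_DEPTH 8             /* artificial cut-off so the demo terminates */

typedef void (*lock_fn)(void);

static lock_fn app_lock;        /* callback vector set by the "application" */
static int helper_bound;        /* nonzero once the lock's helper symbol is bound */
static int depth;               /* current nesting depth of model_bind() */
static int max_depth_seen;      /* deepest nesting observed in this run */

static void model_bind(void);

/* Application-supplied lock function; it depends on a helper symbol. */
static void
model_app_lock(void)
{
    if (!helper_bound)
        model_bind();           /* unbound symbol: lazy binding re-enters the linker */
}

/* Stand-in for the linker's lazy-binding entry point. */
static void
model_bind(void)
{
    depth++;
    if (depth > max_depth_seen)
        max_depth_seen = depth;
    if (depth < MAX_DEPTH)
        app_lock();             /* take the lock before touching linker state */
    depth--;
}

/* Run one binding request; returns how deeply model_bind() nested. */
int
model_run(int prebound)
{
    app_lock = model_app_lock;
    helper_bound = prebound;
    depth = 0;
    max_depth_seen = 0;
    model_bind();
    assert(depth == 0);         /* every entry was matched by an exit */
    return max_depth_seen;
}
```

Calling model_run(0) nests until the cut-off is hit, which stands in for
the "ad infinitum" case; model_run(1) models a lock function whose
symbols are already bound and nests only once.  The hard part in
practice is guaranteeing the second situation in all cases.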