Date: Wed, 4 Jun 2003 23:09:41 -0400 (EDT)
From: Daniel Eischen
To: Thomas Moestl
In-Reply-To: <20030604235607.GA682@crow.dom2ip.de>
cc: Daniel Eischen
cc: sparc64@FreeBSD.org
cc: current@FreeBSD.org
cc: Kris Kennaway
Subject: Re: phoenix crash in libc_r on sparc64
List-Id: Porting FreeBSD to the Sparc

On Thu, 5 Jun 2003, Thomas Moestl wrote:

> On Wed, 2003/06/04 at 00:30:36 -0700, Kris Kennaway wrote:
> > On Mon, Jun 02, 2003 at 04:15:43PM -0700, Kris Kennaway wrote:
> > > phoenix on my sparc64 crashed while idle with the following:
> > >
> > > Fatal error '_waitq_insert: Already in queue' at line 321 in file /usr/src/lib/libc_r/uthread/uthread_priority_queue.c (errno = 2)
> > >
> > > Any ideas?
>
> It should have dropped a core - can you please take a look at it with
> gdb?
>
> > One of the libc_r tests seems to hang:
> >
> > Test static library:
> > --------------------------------------------------------------------------
> > Test                                 c_user   c_system   c_total     chng
> >  passed/FAILED                       h_user   h_system   h_total   % chng
> > --------------------------------------------------------------------------
> > hello_d                                0.00       0.02      0.02
> >  passed
> > --------------------------------------------------------------------------
> > hello_s                                0.00       0.02      0.02
> >  passed
> > --------------------------------------------------------------------------
> > join_leak_d                            0.77       0.18      0.95
> >  passed
> > --------------------------------------------------------------------------
> > mutex_d                                9.08      92.42    101.50
> >  passed
> > --------------------------------------------------------------------------
> > sem_d                                  0.01       0.02      0.02
> >  passed
> > --------------------------------------------------------------------------
> > sigsuspend_d                           0.00       0.02      0.02
> >  passed
> > --------------------------------------------------------------------------
> > sigwait_d                              0.00       0.02      0.02
> >  *** FAILED ***

This one is supposed to kill the process at the end.

> > --------------------------------------------------------------------------
> > guard_s.pl
> >
> > It's been sitting there for hours now.
>
> This is an unfortunate failure mode, which is caused by a fault on the
> stack while all signals are masked (by libc_r internals, I assume);
> the kernel will fail to store the user register windows on the stack,
> and because SIGILL is blocked, it cannot notify (or terminate) the
> process and is stuck trying to copy out the register windows over and
> over.
>
> > P.S. Why do 3 of the tests even fail on i386?
>
> The guard test includes constants which are machine- and
> compiler-specific; this probably broke due to a gcc upgrade.
>
> The sigwait test is killed by its own SIGUSR1, and this behaviour
> actually looks correct to me (but I could easily be wrong, since the
> signal behaviour of pthreads seems to be quite complex).

Right, that is part of the test.  I guess the expect script doesn't
know that though.

> The propagate test failure is due to problems in libc (failing to
> use the underscored versions of functions overridden in libc_r). The
> attached patch should fix that; Daniel, does this look OK to you?

Yes, if those functions are used in libc, then that is what
[un-]namespace.h is for.  Inside libc, calls to any function that
libc_r overrides must go through the single-underscore version, so
that libc_r won't introduce cancellation points in places where there
shouldn't be any, or invoke signal handlers while a library-private
lock is held.

-- 
Dan Eischen