From: Daniel Eischen
Date: Sun, 16 Nov 2003 12:18:33 -0500 (EST)
To: Marcel Moolenaar
Cc: threads@freebsd.org, davidxu@freebsd.org
Subject: Re: KSE/ia64 broken
In-Reply-To: <20031115193039.GA55917@dhcp01.pn.xcllnt.net>
Reply-To: deischen@freebsd.org
List-Id: Threading on FreeBSD

On Sat, 15 Nov 2003, Marcel Moolenaar wrote:

> On Sat, Nov 15, 2003 at 12:36:42PM -0500, Daniel Eischen wrote:
> > On Fri, 14 Nov 2003, Marcel Moolenaar wrote:
> >
> > > Gang,
> > >
> > > The following change broke KSE on ia64:
> > >
> > > --------
> > > revision 1.18
> > > date: 2003/11/08 06:07:04; author: davidxu; state: Exp; lines: +16 -17
> > > Use THR lock instead of KSE lock to avoid scheduler be blocked in spinlock.
> > >
> > > Reviewed by: deischen
> > > --------
> > >
> > > We seem to be clobbering the thread structure instead of writing
> > > to the mailbox. This happens at initialization. Can it be that
> > > the change assumes PER_KSE and doesn't work for PER_THREAD?
> >
> > I _think_ this may be because rtld-elf (at least for ia64) calls
> > malloc with the rtld lock held. I'm not sure how to test this
> > theory.
>
> No worries, I have a way to disprove it :-)
>
> Statically linked binaries are as much broken as dynamically linked
> binaries. So, if we have a rtld problem, it's not the only one:

Are you sure there's not an ia64 kernel bug or ia64 context
restoring bug? If I enable debug messages in thread/thr_kern.c
(uncomment #define DBG_MSG), I get:

  Found completed thread 6000000000014000, name initial thread
  Continuing thread 6000000000014000 in critical region
  Switching out thread 6000000000014000, state 0
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
  ...

repeatedly. The first two lines tell us that the thread blocked
while in a critical region and the kernel thinks it is now
unblocked. The critical region may be the malloc spinlock being
held, and the thread perhaps blocked because of a page fault.

Is it possible that the blocked context is incorrectly marked,
or that it is just not being resumed properly?

-- 
Dan Eischen
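
For anyone who wants to quantify the loop in the DBG_MSG output rather than eyeball it, here is a minimal sketch in C. It is not part of libkse; the function names parse_thread and count_resume_cycles are hypothetical, and it only assumes the two message prefixes that appear in the output quoted above. It counts how often a thread is reported as completed and is then switched out again, which is the pattern that repeats here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from libkse): if `line` starts with `prefix`
 * (ignoring leading blanks), parse the hexadecimal thread address that
 * follows and store it in *tid.  Returns 1 on success, 0 otherwise.
 */
int
parse_thread(const char *line, const char *prefix, unsigned long long *tid)
{
	size_t len;
	char *end;

	assert(line != NULL && prefix != NULL && tid != NULL);
	while (*line == ' ' || *line == '\t')
		line++;
	len = strlen(prefix);
	if (strncmp(line, prefix, len) != 0)
		return (0);
	*tid = strtoull(line + len, &end, 16);
	return (end != line + len);
}

/*
 * Count "Found completed thread X" reports that are followed by
 * "Switching out thread X" for the same X before another completion
 * is reported.  A count that keeps growing for a single thread matches
 * the repeating pattern in the debug output.
 */
unsigned
count_resume_cycles(const char *const *lines, size_t n)
{
	unsigned long long pending = 0, tid;
	int have_pending = 0;
	unsigned cycles = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (parse_thread(lines[i], "Found completed thread ", &tid)) {
			pending = tid;
			have_pending = 1;
		} else if (parse_thread(lines[i], "Switching out thread ",
		    &tid)) {
			if (have_pending && tid == pending)
				cycles++;
			have_pending = 0;
		}
	}
	return (cycles);
}
```

Fed the nine message lines quoted above, count_resume_cycles reports three cycles, all for thread 6000000000014000. The sketch only measures the symptom; it says nothing about whether the cause is in the kernel, the context restore, or the library.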