Date: Sun, 16 Nov 2003 19:54:24 -0500 (EST)
From: Daniel Eischen
Reply-To: deischen@freebsd.org
To: Marcel Moolenaar
Cc: threads@freebsd.org, davidxu@freebsd.org
Subject: Re: KSE/ia64 broken
In-Reply-To: <20031116222200.GA61279@dhcp01.pn.xcllnt.net>

On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
> On Sun, Nov 16, 2003 at 04:55:44PM -0500, Daniel Eischen wrote:
> > On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
> >
> > > > The same thread (main thread) is being resumed over and over again
> > > > which shouldn't happen for this simple program.
> > >
> > > Can it be that the thread is deadlocked? There's no forward progress.
> > > There's only context switching...
> >
> > I don't think so. I think the thread stack/frame is corrupted, either
> > because it is copied out or resumed incorrectly. I'll do some more
> > digging.
>
> I loaded it up in the simulator.
> The thread is continuously being
> resumed because of a page fault that results in an upcall, which
> ends up in the UTS, which selects the same thread, which causes the
> page fault again.

Is it possible the thread is marked for an upcall when the page is
not yet present?

> The page fault is the result of a bogus address
> that in the debugger results in a SIGILL. However, when we don't
> run in a debugger, the SIGILL doesn't get handled. Hence the
> non-forward progress.
>
> The extensive debug information I posted earlier is therefore still
> relevant. Now that I have things running in the simulator I'll see
> if I can figure out where things go wrong. Chances are that we now
> have an upcall where we didn't have one before and that it exposes
> incomplete state (such as a thread pointer that hasn't been set).
> The incomplete state causes the corruption we're seeing.

This is kind of what I was thinking too.

> Anyway: I'll be digging too...

I'm not getting threads@ mail any longer, just the CC. Are you?

--
Dan Eischen
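
For anyone following along, the loop Marcel describes (page fault, upcall,
UTS selects the same thread, page fault again) can be pictured with a toy
model. This is only a sketch and not libkse or kernel code: `Thread`,
`run_thread`, and `uts_loop` are hypothetical names, and the "bogus address"
is reduced to a single boolean flag as an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    addr_valid: bool  # False models the bogus address that keeps faulting
    pc: int = 0       # advances only when the thread makes forward progress


def run_thread(t: Thread) -> bool:
    """Resume t. Return False if it faults immediately (causing an upcall)."""
    if not t.addr_valid:
        return False  # page fault: control goes back to the scheduler, pc unchanged
    t.pc += 1
    return True


def uts_loop(run_queue: list, max_upcalls: int) -> int:
    """Toy user-level scheduler that always picks the head of the run queue.

    Returns how many resumes ended in a fault without any forward progress.
    """
    wasted = 0
    for _ in range(max_upcalls):
        t = run_queue[0]  # the same thread is selected every time
        if not run_thread(t):
            wasted += 1
    return wasted
```

With a single thread whose address never becomes valid, every iteration is
wasted and `pc` stays at 0, which matches the "only context switching, no
forward progress" symptom reported earlier in the thread.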