From owner-freebsd-threads@FreeBSD.ORG Sun Nov 16 17:29:50 2003
Message-ID: <3FB825D9.6050407@viatech.com.cn>
Date: Mon, 17 Nov 2003 09:35:21 +0800
From: David Xu
To: deischen@freebsd.org
cc: threads@freebsd.org
cc: davidxu@freebsd.org
cc: Marcel Moolenaar
Subject: Re: KSE/ia64 broken
List-Id: Threading on FreeBSD

Daniel Eischen wrote:

>On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
>
>>On Sun, Nov 16, 2003 at 04:55:44PM -0500, Daniel Eischen wrote:
>>
>>>On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
>>>
>>>>>The same thread (main thread) is being resumed over and over again
>>>>>which shouldn't happen for this simple program.
>>>>>
>>>>Can it be that the thread is deadlocked? There's no forward progress.
>>>>There's only context switching...
>>>>
>>>I don't think so.
>>>I think the thread stack/frame is corrupted, either
>>>because it is copied out or resumed incorrectly. I'll do some more
>>>digging.
>>>
>>I loaded it up in the simulator. The thread is continuously being
>>resumed because of a page fault that results in an upcall, which
>>ends up in the UTS, which selects the same thread, which causes the
>>page fault again.
>>
>Is it possible the thread is marked for an upcall when the
>page is not yet present?]
>

Currently, on IA64, a page fault never schedules an upcall. I have only
enabled that on i386, and Peter enabled it on AMD64.

>>The page fault is the result of a bogus address
>>that in the debugger results in a SIGILL. However, when we don't
>>run in a debugger, the SIGILL doesn't get handled. Hence the non-
>>forward progress.
>>
>>The extensive debug information I posted earlier is therefore still
>>relevant. Now that I have things running in the simulator I'll see
>>if I can figure out where things go wrong. Chances are that we now
>>have an upcall where we didn't have one before and that it exposes
>>incomplete state (such as a thread pointer that hasn't been set).
>>The incomplete state causes the corruption we're seeing.
>>
>This is kind of what I was thinking too.
>

The memory block returned by malloc() is being used by some unknown
code. I don't know why this happens, but if you waste a memory block by
applying the following patch to thr_alloc(), things work:

Index: thr_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libpthread/thread/thr_kern.c,v
retrieving revision 1.102
diff -u -r1.102 thr_kern.c
--- thr_kern.c	9 Nov 2003 00:37:14 -0000	1.102
+++ thr_kern.c	17 Nov 2003 01:24:59 -0000
@@ -2422,6 +2422,8 @@
 	struct pthread *thread = NULL;
 	int i;
 
+	malloc(sizeof(struct pthread));
+
 	if (curthread != NULL) {
 		if (GC_NEEDED())
 			_thr_gc(curthread);
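
For anyone who wants to try the same trick outside libpthread, here is a
rough standalone sketch of it. It is only an illustration: struct
fake_thread and fake_thr_alloc() are made-up names, not libpthread code,
and the 256-byte size is an arbitrary stand-in for sizeof(struct pthread).
The idea is simply to burn one allocation first, so that the block handed
back to the caller is unlikely to be the one malloc() would otherwise
have returned.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct pthread; the real one is much larger. */
struct fake_thread {
	char	pad[256];
};

/*
 * Allocate a zeroed fake_thread.  If waste_first is nonzero, make a
 * throwaway allocation of the same size first, so that the block we
 * return is unlikely to be the one malloc() would have handed out
 * otherwise.  The throwaway block is passed back through *wasted so
 * the caller can free it later instead of leaking it.
 */
struct fake_thread *
fake_thr_alloc(int waste_first, void **wasted)
{
	struct fake_thread *t;

	*wasted = NULL;
	if (waste_first) {
		*wasted = malloc(sizeof(struct fake_thread));
		if (*wasted == NULL)
			return (NULL);
	}
	t = malloc(sizeof(*t));
	if (t == NULL) {
		/* Undo the throwaway allocation on failure. */
		free(*wasted);
		*wasted = NULL;
		return (NULL);
	}
	memset(t, 0, sizeof(*t));
	return (t);
}
```

If the corruption stops when waste_first is set, that suggests the
address of the block matters, for example because something else still
holds a stale pointer to it. It does not tell you who the other user is.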