Date: Mon, 17 Nov 2003 09:35:21 +0800
From: David Xu <davidxu@viatech.com.cn>
To: deischen@freebsd.org
Cc: Marcel Moolenaar <marcel@xcllnt.net>
Subject: Re: KSE/ia64 broken
Message-ID: <3FB825D9.6050407@viatech.com.cn>
In-Reply-To: <Pine.GSO.4.10.10311161951020.11563-100000@pcnet5.pcnet.com>
References: <Pine.GSO.4.10.10311161951020.11563-100000@pcnet5.pcnet.com>

Daniel Eischen wrote:

>On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
>
>>On Sun, Nov 16, 2003 at 04:55:44PM -0500, Daniel Eischen wrote:
>>
>>>On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
>>>
>>>>>The same thread (main thread) is being resumed over and over again
>>>>>which shouldn't happen for this simple program.
>>>>>
>>>>Can it be that the thread is deadlocked? There's no forward progress.
>>>>There's only context switching...
>>>>
>>>I don't think so. I think the thread stack/frame is corrupted, either
>>>because it is copied out or resumed incorrectly. I'll do some more
>>>digging.
>>>
>>I loaded it up in the simulator. The thread is continuously being
>>resumed because of a page fault that results in an upcall, which
>>ends up in the UTS, which selects the same thread, which causes the
>>page fault again.
>>
>
>Is it possible the thread is marked for an upcall when the
>page is not yet present?
>

Currently, on IA64, a page fault never schedules an upcall. I have only
enabled it on i386, and Peter enabled it on AMD64.

>>The page fault is the result of a bogus address
>>that in the debugger results in a SIGILL. However, when we don't
>>run in a debugger, the SIGILL doesn't get handled. Hence the
>>non-forward progress.
>>
>>The extensive debug information I posted earlier is therefore still
>>relevant. Now that I have things running in the simulator I'll see
>>if I can figure out where things go wrong. Chances are that we now
>>have an upcall where we didn't have one before and that it exposes
>>incomplete state (such as a thread pointer that hasn't been set).
>>The incomplete state causes the corruption we're seeing.
>>
>
>This is kind of what I was thinking too.
>

The memory block returned by malloc() is being used by some unknown
code. I don't know why this occurs, but if you waste a memory block by
applying the following patch to thr_alloc(), then things work:

Index: thr_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libpthread/thread/thr_kern.c,v
retrieving revision 1.102
diff -u -r1.102 thr_kern.c
--- thr_kern.c	9 Nov 2003 00:37:14 -0000	1.102
+++ thr_kern.c	17 Nov 2003 01:24:59 -0000
@@ -2422,6 +2422,8 @@
 	struct pthread *thread = NULL;
 	int i;
 
+	malloc(sizeof(struct pthread));
+
 	if (curthread != NULL) {
 		if (GC_NEEDED())
 			_thr_gc(curthread);