Date: Sun, 16 Nov 2003 12:18:33 -0500 (EST)
From: Daniel Eischen <eischen@vigrid.com>
To: Marcel Moolenaar <marcel@xcllnt.net>
Cc: davidxu@freebsd.org
Subject: Re: KSE/ia64 broken
Message-ID: <Pine.GSO.4.10.10311161205380.3319-100000@pcnet5.pcnet.com>
In-Reply-To: <20031115193039.GA55917@dhcp01.pn.xcllnt.net>

On Sat, 15 Nov 2003, Marcel Moolenaar wrote:

> On Sat, Nov 15, 2003 at 12:36:42PM -0500, Daniel Eischen wrote:
> > On Fri, 14 Nov 2003, Marcel Moolenaar wrote:
> >
> > > Gang,
> > >
> > > The following change broke KSE on ia64:
> > >
> > > --------
> > > revision 1.18
> > > date: 2003/11/08 06:07:04; author: davidxu; state: Exp; lines: +16 -17
> > > Use THR lock instead of KSE lock to avoid scheduler be blocked in spinlock.
> > >
> > > Reviewed by: deischen
> > > --------
> > >
> > > We seem to be clobbering the thread structure instead of writing
> > > to the mailbox. This happens at initialization. Can it be that
> > > the change assumes PER_KSE and doesn't work for PER_THREAD?
> >
> > I _think_ this may be because rtld-elf (at least for ia64) calls
> > malloc with the rtld lock held. I'm not sure how to test this
> > theory.
>
> No worries, I have a way to disprove it :-)
>
> Statically linked binaries are as much broken as dynamically linked
> binaries. So, if we have a rtld problem, it's not the only one:

Are you sure there's not an ia64 kernel bug or ia64 context
restoring bug? If I enable debug messages in thread/thr_kern.c
(uncomment #define DBG_MSG), I get:

  Found completed thread 6000000000014000, name initial thread
  Continuing thread 6000000000014000 in critical region
  Switching out thread 6000000000014000, state 0
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
  ...

repeatedly.

The first two lines tell us that the thread blocked while in a
critical region and the kernel thinks it is now unblocked. The
critical region may be the malloc spinlock being held, and the
reason it blocked may be a page fault.

Is it possible that the blocked context is incorrectly marked,
or that it is just not being resumed properly?

--
Dan Eischen