Date:      Sun, 16 Nov 2003 12:18:33 -0500 (EST)
From:      Daniel Eischen <eischen@vigrid.com>
To:        Marcel Moolenaar <marcel@xcllnt.net>
Cc:        davidxu@freebsd.org
Subject:   Re: KSE/ia64 broken
Message-ID:  <Pine.GSO.4.10.10311161205380.3319-100000@pcnet5.pcnet.com>
In-Reply-To: <20031115193039.GA55917@dhcp01.pn.xcllnt.net>

On Sat, 15 Nov 2003, Marcel Moolenaar wrote:

> On Sat, Nov 15, 2003 at 12:36:42PM -0500, Daniel Eischen wrote:
> > On Fri, 14 Nov 2003, Marcel Moolenaar wrote:
> > 
> > > Gang,
> > > 
> > > The following change broke KSE on ia64:
> > > 
> > > --------
> > > revision 1.18
> > > date: 2003/11/08 06:07:04;  author: davidxu;  state: Exp;  lines: +16 -17
> > > Use THR lock instead of KSE lock to avoid scheduler be blocked in spinlock.
> > >  
> > > Reviewed by: deischen
> > > --------
> > > 
> > > We seem to be clobbering the thread structure instead of writing
> > > to the mailbox. This happens at initialization. Can it be that
> > > the change assumes PER_KSE and doesn't work for PER_THREAD?
> > 
> > I _think_ this may be because rtld-elf (at least for ia64) calls
> > malloc with the rtld lock held.  I'm not sure how to test this
> > theory.
> 
> No worries, I have a way to disprove it :-)
> 
> Statically linked binaries are just as broken as dynamically linked
> binaries. So, if we have an rtld problem, it's not the only one:

Are you sure there's not an ia64 kernel bug or ia64 context
restoring bug?  If I enable debug messages in thread/thr_kern.c
(uncomment #define DBG_MSG), I get:

  Found completed thread 6000000000014000, name initial thread
  Continuing thread 6000000000014000 in critical region
  Switching out thread 6000000000014000, state 0
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
  Found completed thread 6000000000014000, name initial thread
  Switching out thread 6000000000014000, state 0
  Threads in waiting queue:
    ...

repeatedly.
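
For anyone reproducing this, the debug toggle is the usual
compile-time pattern.  A minimal sketch follows; the names
(SKETCH_DEBUG, SKETCH_DBG_MSG, sketch_debug, sketch_switch_out) are
hypothetical and this is not the actual thr_kern.c code:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical switch; comment it out to compile the messages away. */
#define SKETCH_DEBUG

/* Number of debug messages emitted so far. */
static int sketch_msg_count;

/* Format one debug line to stderr and count it. */
static int
sketch_debug(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	sketch_msg_count++;
	return (n);
}

#ifdef SKETCH_DEBUG
#define SKETCH_DBG_MSG(...)	sketch_debug(__VA_ARGS__)
#else
#define SKETCH_DBG_MSG(...)	((void)0)
#endif

/* Hypothetical call site: report a thread being switched out. */
static void
sketch_switch_out(void *thread, int state)
{
	SKETCH_DBG_MSG("Switching out thread %p, state %d\n", thread, state);
}
```

With the define commented out, each call site expands to a no-op, so
the scheduler path carries no logging overhead.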

The first two lines tell us that the thread blocked while in a
critical region and that the kernel now thinks it is unblocked.
The critical region may be the malloc spinlock being held,
and the reason it blocked may be a page fault.  Is it possible
that the blocked context is incorrectly marked, or that it is
just not being resumed properly?

-- 
Dan Eischen


