From owner-freebsd-threads@FreeBSD.ORG Sun Apr 13 21:55:42 2003 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4B48A37B407; Sun, 13 Apr 2003 21:55:42 -0700 (PDT) Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id EE0B943F75; Sun, 13 Apr 2003 21:55:40 -0700 (PDT) (envelope-from eischen@pcnet1.pcnet.com) Received: from pcnet1.pcnet.com (localhost [127.0.0.1]) by mail.pcnet.com (8.12.8/8.12.1) with ESMTP id h3E4teBg016082; Mon, 14 Apr 2003 00:55:40 -0400 (EDT) Received: from localhost (eischen@localhost)h3E4tdhM016079; Mon, 14 Apr 2003 00:55:39 -0400 (EDT) Date: Mon, 14 Apr 2003 00:55:39 -0400 (EDT) From: Daniel Eischen To: David Xu In-Reply-To: <006001c3023a$65fe01d0$f001a8c0@davidw2k> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-threads@freebsd.org Subject: Re: libpthread patch X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2003 04:55:42 -0000 On Mon, 14 Apr 2003, David Xu wrote: > After tested Daniel's pthread patch, I found 50% of ACE test > program are core dumpped on my machine. So I studied the > libpthread source code after applied the patch, I found that > the main problem is thread state transition is not atomic, > for example in thr_mutex: I have an updated version that works for all but the Cached_Conn_Test (except for the ones that don't work with libc_r). I narrowed down the problem to joint and am working on a fix for that. > mutex_queue_enq(*m, curthread); > curthread->data.mutex = *m; > /* > This thread is active and is in a critical > region (holding the mutex lock); we should > be able to safely set the state. > */ > THR_SET_STATE(curthread, PS_MUTEX_WAIT); > > /* Unlock the mutex structure: */ > THR_LOCK_RELEASE(curthread, &(*m)->m_lock); > /* Schedule the next thread: */ > _thr_sched_switch(curthread); > > > thread sets its state to PS_MUTEX_WAIT, and call _thr_sched_switch, > but it is not under scheduler lock, so there is a race between > THR_SET_STATE and thr_sched_switch. Unlike libc_r, I disassociated thread state from being in any run or waiting queue. So it is OK if another thread comes along and makes the thread runnable. Making the thread runnable requires taking the scheduler lock, so by the time thr_sched_switch takes the scheduler lock, the thread could be runnable again. There are flags to indicate whether the thread is in the run queue or the wait queue, so we rely on those flags before adding/removing threads from those queues (instead of state). > I have inserted _kse_critical_enter() before THR_SET_STATE, > the code looks as following: > > mutex_queue_enq(*m, curthread); > curthread->data.mutex = *m; > _kse_critical_enter(); > /* > This thread is active and is in a critical > region (holding the mutex lock); we should > be able to safely set the state. > */ > THR_SET_STATE(curthread, PS_MUTEX_WAIT); > > /* Unlock the mutex structure: */ > THR_LOCK_RELEASE(curthread, &(*m)->m_lock); > /* Schedule the next thread: */ > _thr_sched_switch(curthread); > > I also commented out most code in thr_lock_wait() and > thr_lock_wakeup(), I think without better scheduler lock, > these code has race condition, and in most case will > this cause a thread be reinserted into runq while it > is already in this queue. It shouldn't any more :-) > now, I can run ACE test programs without any core dumpped, > and only the following program are failed: > > Cached_Conn_Test > Conn_Test > MT_Reactor_Timer_Test > Malloc_Test > Process_Strategy_Test > Thread_Pool_Test > > a complete log file is at: > http://people.freebsd.org/~davidxu/run_test.log > > the libpthread package I modified is at: > http://people.freebsd.org/~davidxu/libpthread.tgz I'll take a look at it to make sure I haven't missed anything. > Also, I can run crew test program without any problem. The next thing I want to try is KDE. It doesn't currently work, but I'm hopeful that with your changes and my last set of changes it will. > I think the whole scheduler lock should be reworked > to allow state transition is in atomic, my change is > not SMP safe, only works on UP, because kse_critical_enter > is only works for UP system. If we fixed this scheduler lock > problem, I think the libpthread will be stable enough. I thought that we might want to make changing the state and switching to the scheduler all under the same lock, but decided not to go that way. I was concerned that by requiring the scheduler lock to be held while atomically changing the thread state and switching to the scheduler might introduce weird locking order and wanted to avoid that. For instance, inherited priority adjustments and condition signal broadcasts. We should look at it again though, because it may make sense in most cases. I'll try to get a new patch set out after I fix pthread_join(). I should have posted my earlier version but I thought I could fix join quickly. It turns out it's a bit more tricky than I thought. One question about kse_create(). I was thinking about implementing pthread_setconcurrency(). We need to return an error if the requested concurrency level cannot be achieved. If we do return an error, though, we should not end up creating _any_ additional KSEs. If someone does: ret = pthread_setconcurrency(5); How do we know how many KSEs (within the same KSEG) we can create? And if we use ncpu or some other knob, it doesn't automatically mean we _will_ get the number of KSEs requested. We can currently only make one at a time, so we may end up with some additional KSEs but less than requested -- but we still have to return an error to the caller. -- Dan Eischen