Date: Sun, 27 Apr 2003 09:51:04 +0800 From: "David Xu" <davidxu@freebsd.org> To: "Daniel Eischen" <eischen@pcnet1.pcnet.com>, "David Xu" <davidxu@viatech.com.cn> Cc: freebsd-threads@freebsd.org Subject: Re: SMPing libpthread Message-ID: <006501c30c5f$77fe5950$0701a8c0@tiger> References: <Pine.GSO.4.10.10304260947370.1812-100000@pcnet1.pcnet.com>
next in thread | previous in thread | raw e-mail | index | archive | help
----- Original Message -----=20 From: "Daniel Eischen" <eischen@pcnet1.pcnet.com> To: "David Xu" <davidxu@viatech.com.cn> Cc: <freebsd-threads@freebsd.org> Sent: Saturday, April 26, 2003 10:15 PM Subject: Re: SMPing libpthread > On Sat, 26 Apr 2003, David Xu wrote: > > ----- Original Message -----=20 > > From: "Daniel Eischen" <eischen@pcnet1.pcnet.com> > > To: "David Xu" <davidxu@freebsd.org> > > Cc: <freebsd-threads@freebsd.org> > > Sent: Saturday, April 26, 2003 12:52 PM > > Subject: Re: SMPing libpthread > >=20 > >=20 > > > On Thu, 24 Apr 2003, David Xu wrote: > > >=20 > > > > I have put a patch to enable userland support SMP scheduling.=20 > > > > http://people.freebsd.org/~davidxu/libpthread_smp.diff > > > > The patch works on my SMP machine, but not fully tested, > > > > and I will work on idle kses stuffs. At least, it seems > > > > the SMP speed is not slower than UP. :-) > > >=20 > > > David, I noticed that we hold the scheduling lock before and > > > after calling the scheduler. Is this necessary? And if so, > > > is it necessary to hold hold it after return from the > > > scheduling switch? One you're in the scheduler, and choose > > > another thread (releasing the lock after doing so), shouldn't > > > you be able to switch to it without having the lock? > > >=20 > >=20 > > I am trying to do things in atomic way. because I found on SMP, = thread > > state is always out of synchronized in current code. I want to make > > state change and context switch in an atomic operation, when thread = is > > in state transit, no other threads can use = _thr_setrunnable_unlocked() etc > > to change its state, this avoids lots of "thread already in = priority queue" > > messages, and if I test the "in priority queue" flag every where, = I sometimes > > lose chance to wakeup a thread because of some unknown races, whole > > process just suddenly stops there. >=20 > I think the problem is that the signal code doesn't properly > check the thread's state and whether it's currently active > or not. Sometimes, the signal code thinks that if the state > is not PS_RUNNING that the thread is inactive. It should > really be checking td->active and the UTS should make sure > it sets and clears this when switching the thread in and > out. I havn't tested signal code. >=20 > The other problem is that the mutex and cond variable code > doesn't take the scheduling lock when changing the thread's > state (it relies on holding the mutex or cv low-level lock > to protect the thread state). This is one of the things > I was going to fix. It is OK to take the scheduling lock > while holding the low-level mutex lock; we don't ever take > them in reverse (that I can see). >=20 One thing causes wrong thread state and SIGSEGV in mutex code under SMP is after a thread A put itself on a sync queue and it is then swapped = out by kernel,=20 a thread B on another CPU unlocks a mutex and tries to = _thr_setrunnable_unlocked thread A, and put it on RUNQ, next time thread A is selected to run, = because it is still in kernel not in userland, it causes SIGSEGV, other thread = state code have same problem. My code prevents this from happen, there is no = window opened for this issue. Thread A holds a low level lock and then holds a = schedule lock, unlocks low lever mutex lock, then does context switch, because when schedule = lock is held,=20 it is in critical region, kernel won't schedule an upcall, and thread B = want to=20 _thr_setrunnable_unlocked A should hold schedule lock, but it is = impossible until A is=20 switchout, so there is no race. In my patch, only a PS_RUNNING thread = will be blocked in kernel, however because thread state is still PS_RUNNING, =20 _thr_setrunnable_unlocked needn't to do any thing about it, I had code = already in that patch. > > I am just copying the idea from current kernel code, I don't intend = to inventing > > new things. :-) > >=20 > > One deadlock I found is in kse_sched_multi(), there is an = optimization: > > if (curthread =3D=3D NULL) > > ; /* Nothing to do here. */ > > else if ((curthread->need_switchout =3D=3D 0) && > > (curthread->blocked =3D=3D 0) && (THR_IN_CRITICAL(curthread))) { > > /* > > * Resume the thread and tell it to yield when > > * it leaves the critical region. > > */ > >=20 > > These code cause deadlock, at least there is a deadlock between = malloc/free and > > static mutex initializing code, I think unless we believe that = thread is locking a leaf=20 > > node, it is not safe to give cpu to the thread again. I suggest = comment out these=20 > > optimization code. >=20 > I think that's because spinlock is really still a spinlock (we really = have > to get rid of these). I think this can be solved by putting the = thread > in a critical region while it holds the spinlock. Currently, a higher > priority thread trying to get a spinlock that is held by a lower > priority thread will cause a deadlock (if both threads are running > on the same KSE). >=20 > We could treat spinlocks as static mutexes or as low-level locks. > There are 4 fields to the spinlock: >=20 > typedef struct { > volatile long access_lock; > volatile long lock_owner; > volatile char *fname; > volatile int lineno; > } spinlock_t; >=20 > #define _SPINLOCK_INITIALIZER { 0, 0, 0, 0 } >=20 > Since they are all initialized to 0, we could detect an uninitialized > spinlock. Is there any platform where sizeof(long) !=3D sizeof(void = *)? > Or I suppose we could use fname to point to a low-level lock or > mutex. >=20 Static mutex still needs malloc to initiailize its internal low level = lock,=20 it would recursively call malloc. I think only criticial count may work. > > I am still working on it, hope that I can post a patch soon. > > A bad news is kernel seems too unstable when testing KSE program, = it just silently > > locks up, not panic, no dump, just locks up, this is newest = kernel source code. :( >=20 > Hmm, I haven't got that yet. I have had a lot of lock order reversals > lately though. >=20 > --=20 > Dan Eischen >=20 > _______________________________________________ > freebsd-threads@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-threads > To unsubscribe, send any mail to = "freebsd-threads-unsubscribe@freebsd.org" >=20
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?006501c30c5f$77fe5950$0701a8c0>