From: Kazuaki Oda
To: threads@freebsd.org
Date: Thu, 22 Apr 2004 15:27:05 +0900
Subject: kse_release and kse_wakeup problem

Hi,

I think that, since the switch to the new sleep queue interface, there is a problem when using system scope threads on an MP machine. The problem occurs when one CPU is executing kse_release while another is executing kse_wakeup.
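
The ordering issue behind the scenario that follows can be replayed in a small single-threaded model. This is only a sketch under simplifying assumptions: it is userland C, not kernel code, and every name in it (struct model, waker_step, lost_wakeup, check_orderings) is hypothetical. The waker's logic is reduced to "abort the sleep if the sleeper is already visible on the sleep queue, otherwise just leave a flag", which is how the scenario describes kse_wakeup behaving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these are kernel structures. */
struct model {
	bool on_sleepq;	/* thread A is visible on the sleep queue */
	bool doupcall;	/* stands in for KUF_DOUPCALL */
	bool aborted;	/* stands in for sleepq_abort() having hit thread A */
};

/* Thread B's step, assumed to run entirely under the process lock. */
static void
waker_step(struct model *m)
{
	if (m->on_sleepq)
		m->aborted = true;	/* sleeper is visible: wake it */
	else
		m->doupcall = true;	/* sleeper not visible: only set a flag */
}

/*
 * Replays one interleaving. Returns true when thread A ends up asleep
 * with no wakeup delivered (the lost-wakeup case).
 */
static bool
lost_wakeup(bool enqueue_before_unlock)
{
	struct model m = { false, false, false };

	if (enqueue_before_unlock) {
		m.on_sleepq = true;	/* A enqueues while holding the lock */
		waker_step(&m);		/* A unlocks, then B runs */
	} else {
		waker_step(&m);		/* A unlocks first; B runs in the gap */
		m.on_sleepq = true;	/* A enqueues too late */
	}
	return (m.on_sleepq && !m.aborted);
}

/* Checks the expected outcome of both orderings. */
static void
check_orderings(void)
{
	assert(lost_wakeup(false));	/* unlock, then enqueue: A stays asleep */
	assert(!lost_wakeup(true));	/* enqueue, then unlock: A is woken */
}
```

Calling check_orderings() from a main() passes both assertions: the wakeup is lost only when the lock is dropped before the sleeper is on the queue. The model ignores the TDF_SINTR check and the later handling of the flag, so it shows the shape of the race rather than the exact kernel behaviour.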

Here is the scenario (threads A and B are in the same process):

CPU0: thread A acquires PROC_LOCK in kse_release (kern_thread.c 621)
CPU0: thread A releases PROC_LOCK in msleep (kern_synch.c 209)
CPU1: thread B acquires PROC_LOCK in kse_wakeup (kern_thread.c 667)
CPU1: thread B looks up the kse_upcall (kern_thread.c 669-687)
CPU1: thread B gets the kse_upcall owner, which is thread A
      (kern_thread.c 689)
CPU1: thread B sets the KUF_DOUPCALL flag (kern_thread.c 697);
      because thread A is not on the sleep queue yet, sleepq_abort
      (kern_thread.c 695) is not executed
CPU1: thread B releases PROC_LOCK (kern_thread.c 700)
CPU0: thread A puts itself on the sleep queue (kern_synch.c 221)
CPU0: thread A sets the TDF_SINTR flag (subr_sleepqueue.c 310)
CPU0: thread A sleeps and a context switch occurs...

I think thread B should call sleepq_abort, and thread A should do the upcall as soon as possible.

The following patch makes thread A release PROC_LOCK only after it has put itself on the sleep queue and set the TDF_SINTR flag. I don't think this patch is very good (setting TDF_SINTR here is obviously not right), but it is enough for testing. With the patch applied, MySQL no longer hangs under heavy load on my machine (P4 2.40GHz, HTT enabled).

--- kern_synch.c.orig	Thu Apr 22 14:00:40 2004
+++ kern_synch.c	Thu Apr 22 12:20:06 2004
@@ -203,11 +203,6 @@
 	    td, p->p_pid, p->p_comm, wmesg, ident);
 
 	DROP_GIANT();
-	if (mtx != NULL) {
-		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
-		WITNESS_SAVE(&mtx->mtx_object, mtx);
-		mtx_unlock(mtx);
-	}
 
 	/*
 	 * We put ourselves on the sleep queue and start our timeout
@@ -219,6 +214,13 @@
 	 * return from cursig().
 	 */
 	sleepq_add(sq, ident, mtx, wmesg, 0);
+	if (catch)
+		td->td_flags |= TDF_SINTR;
+	if (mtx != NULL) {
+		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
+		WITNESS_SAVE(&mtx->mtx_object, mtx);
+		mtx_unlock(mtx);
+	}
 	if (timo)
 		sleepq_set_timeout(ident, timo);
 	if (catch) {

-- 
Kazuaki Oda