Date: Thu, 1 Jul 2010 02:32:33 -0500 (CDT)
From: Bryan Venteicher
To: freebsd-current@freebsd.org
Message-ID: <269478215.24.1277969553870.JavaMail.root@sage.daemoninthecloset.org>
In-Reply-To: <744734406.21.1277969273426.JavaMail.root@sage.daemoninthecloset.org>
Subject: deadlkres() panic

On a recent -current, I got the following panic from deadlkres:

Assertion wchan != NULL failed at /usr/src-nfs/sys/kern/subr_sleepqueue.c:680

Tracing pid 0 tid 100058 td 0xffffff00024bf7a0
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x176
sleepq_type() at sleepq_type+0x56
deadlkres() at deadlkres+0x224
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8074976d30, rbp = 0 ---

(Hand transcribed, doadump() hung)

deadlkres() came across a thread for which TD_IS_SLEEPING() was true but that was not on a sleepqueue (i.e., td->td_wchan == NULL). I don't think this is an invalid state for a thread to be in:

1. After adding itself to a sleepqueue and setting a timeout, the thread calls sleepq_timedwait_sig().
2. sleepq_catch_signals() finds a signal pending, so it removes the thread from the sleepqueue via sleepq_resume_thread().
3. Back in sleepq_timedwait_sig(), the call to sleepq_check_timeout() cannot cancel the timeout because it is already firing (likely waiting on thread_lock()). So the thread calls TD_SET_SLEEPING() followed by mi_switch().
4. deadlkres() then acquires thread_lock() and finds td with TD_IS_SLEEPING() && !TD_ON_SLEEPQ().

The attached patch takes care of the panic for me.

Attachment: kern_clock.c.diff

--- /usr/src-nfs/sys/kern/kern_clock.c	2010-06-30 03:38:25.000000000 -0500
+++ kern_clock.c	2010-07-01 02:19:39.048697991 -0500
@@ -232,7 +232,8 @@
 						panic("%s: possible deadlock detected for %p, blocked for %d ticks\n",
 						    __func__, td, tticks);
 					}
-				} else if (TD_IS_SLEEPING(td)) {
+				} else if (TD_IS_SLEEPING(td) &&
+				    TD_ON_SLEEPQ(td)) {
 
 					/* Handle ticks wrap-up. */
 					if (ticks < td->td_blktick) {