From: John Baldwin
To: attilio@freebsd.org
Cc: freebsd-stable@freebsd.org
Date: Wed, 1 Aug 2012 08:53:11 -0400
Subject: Re: [stable 9] panic on reboot: ipmi_wd_event()

On Tuesday, July 31, 2012 4:51:19 pm Attilio Rao wrote:
> On 7/31/12, John Baldwin wrote:
> > On Thursday, July 19, 2012 7:58:14 pm Sean Bruno wrote:
> >> Working on the Dell R420 today, got most of it working, even the
> >> broadcom ethernet cards!
> >> However, I get the following when I reboot the
> >> system:
> >>
> >> Syncing disks, vnodes remaining...4 Sleeping thread (tid 100107, pid 9)
> >> owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100107:
> >> sched_switch() at sched_switch+0x19f
> >> mi_switch() at mi_switch+0x208
> >> sleepq_switch() at sleepq_switch+0xfc
> >> sleepq_wait() at sleepq_wait+0x4d
> >> _sleep() at _sleep+0x3f6
> >> ipmi_submit_driver_request() at ipmi_submit_driver_request+0x97
> >> ipmi_set_watchdog() at ipmi_set_watchdog+0xb1
> >> ipmi_wd_event() at ipmi_wd_event+0x8f
> >> kern_do_pat() at kern_do_pat+0x10f
> >> sched_sync() at sched_sync+0x1ea
> >> fork_exit() at fork_exit+0x135
> >> fork_trampoline() at fork_trampoline+0xe
> >
> > Hmmm, the watchdog pat should probably happen without holding locks if
> > possible.  This is related to the IPMI watchdog being special and wanting
> > to schedule a thread to work.
>
> The watchdog pat without the locks is not easy to do because we
> register the watchdog callbacks in eventhandlers, which are indeed
> locked (and you may also end up racing against watchdog detach, if you
> don't use any lock at all).

No, eventhandlers go through several hoops to avoid holding any locks
while the eventhandler functions are running.  In this case the lock is
held by a higher layer, sched_sync(), and that is what I was talking
about: it is 'sync_mtx' that is held across the pat.  Something like
this may work:

Index: vfs_subr.c
===================================================================
--- vfs_subr.c	(revision 238969)
+++ vfs_subr.c	(working copy)
@@ -1868,8 +1868,11 @@ sched_sync(void)
 				continue;
 			}
 
-			if (first_printf == 0)
+			if (first_printf == 0) {
+				mtx_unlock(&sync_mtx);
 				wdog_kern_pat(WD_LASTVAL);
+				mtx_lock(&sync_mtx);
+			}
 		}
 		if (!LIST_EMPTY(gslp)) {

-- 
John Baldwin
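The patch follows a common locking pattern: drop the mutex around a call that may sleep, then reacquire it. A rough userland sketch of the before/after shape, assuming POSIX threads; `sync_lock()`, `sync_unlock()`, `watchdog_pat()`, `sync_pass_old()` and `sync_pass()` are hypothetical stand-ins for `mtx_lock(&sync_mtx)`, `mtx_unlock(&sync_mtx)`, `wdog_kern_pat(WD_LASTVAL)` and the `sched_sync()` loop, not FreeBSD kernel APIs. Userland pthread mutexes do not forbid sleeping, so the "lock held while patting" condition is only recorded by hand here:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Userland stand-in for the kernel's sync_mtx (hypothetical names).
 * sync_mtx_held is manual bookkeeping, roughly what the kernel's lock
 * checks report as "Sleeping thread ... owns a non-sleepable lock".
 */
static pthread_mutex_t sync_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool sync_mtx_held;
static int pats;                 /* total watchdog pats */
static int pats_with_lock_held;  /* pats made while sync_mtx was held */

static void sync_lock(void)
{
	pthread_mutex_lock(&sync_mtx);
	sync_mtx_held = true;
}

static void sync_unlock(void)
{
	assert(sync_mtx_held);   /* must only unlock what we hold */
	sync_mtx_held = false;
	pthread_mutex_unlock(&sync_mtx);
}

/*
 * Hypothetical stand-in for wdog_kern_pat(WD_LASTVAL).  In the backtrace
 * the IPMI handler reaches _sleep(), so the caller must not hold a mutex.
 * This sketch only records whether the caller did.
 */
static void watchdog_pat(void)
{
	if (sync_mtx_held)
		pats_with_lock_held++;
	pats++;
}

/* Before the patch: the pat runs with sync_mtx held. */
static void sync_pass_old(int first_printf)
{
	sync_lock();
	if (first_printf == 0)
		watchdog_pat();
	sync_unlock();
}

/* After the patch: drop sync_mtx around the pat, then reacquire it. */
static void sync_pass(int first_printf)
{
	sync_lock();
	if (first_printf == 0) {
		sync_unlock();
		watchdog_pat();
		sync_lock();
	}
	sync_unlock();
}
```

A general caveat of this pattern: while the mutex is dropped, other threads can modify whatever it protects, so state read before the unlock should be rechecked after the relock.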