Date:      Wed, 1 Aug 2012 08:53:11 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        attilio@freebsd.org
Cc:        freebsd-stable@freebsd.org
Subject:   Re: [stable 9] panic on reboot: ipmi_wd_event()
Message-ID:  <201208010853.11447.jhb@freebsd.org>
In-Reply-To: <CAJ-FndC3pyfJNJBZMZEW9WGs7yA=xeAD2vMyuEeJjELcLOVbOA@mail.gmail.com>
References:  <1342742294.2656.24.camel@powernoodle.corp.yahoo.com> <201207311634.24169.jhb@freebsd.org> <CAJ-FndC3pyfJNJBZMZEW9WGs7yA=xeAD2vMyuEeJjELcLOVbOA@mail.gmail.com>

On Tuesday, July 31, 2012 4:51:19 pm Attilio Rao wrote:
> On 7/31/12, John Baldwin <jhb@freebsd.org> wrote:
> > On Thursday, July 19, 2012 7:58:14 pm Sean Bruno wrote:
> >> Working on the Dell R420 today, got most of it working, even the
> >> broadcom ethernet cards!  However, I get the following when I reboot the
> >> system:
> >>
> >> Syncing disks, vnodes remaining...4 Sleeping thread (tid 100107, pid 9)
> >> owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100107:
> >> sched_switch() at sched_switch+0x19f
> >> mi_switch() at mi_switch+0x208
> >> sleepq_switch() at sleepq_switch+0xfc
> >> sleepq_wait() at sleepq_wait+0x4d
> >> _sleep() at _sleep+0x3f6
> >> ipmi_submit_driver_request() at ipmi_submit_driver_request+0x97
> >> ipmi_set_watchdog() at ipmi_set_watchdog+0xb1
> >> ipmi_wd_event() at ipmi_wd_event+0x8f
> >> kern_do_pat() at kern_do_pat+0x10f
> >> sched_sync() at sched_sync+0x1ea
> >> fork_exit() at fork_exit+0x135
> >> fork_trampoline() at fork_trampoline+0xe
> >
> > Hmmm, the watchdog pat should probably happen without holding locks if
> > possible.  This is related to the IPMI watchdog being special and wanting
> > to schedule a thread to work.
> 
> The watchdog pat without the locks is not easy to do because we
> register the watchdog callbacks in eventhandlers, which are indeed
> locked (and you may also end up racing against watchdog detach, if you
> don't use any lock at all).

No, eventhandlers jump through several hoops to avoid holding any locks
while the eventhandler functions are running.  It seems in this case that
a lock is held in a higher layer (sched_sync()), and that is what I was
talking about.  Yes, it is 'sync_mtx' that is held.  Something like this
may work:

Index: vfs_subr.c
===================================================================
--- vfs_subr.c	(revision 238969)
+++ vfs_subr.c	(working copy)
@@ -1868,8 +1868,11 @@ sched_sync(void)
 				continue;
 			}
 
-			if (first_printf == 0)
+			if (first_printf == 0) {
+				mtx_unlock(&sync_mtx);
 				wdog_kern_pat(WD_LASTVAL);
+				mtx_lock(&sync_mtx);
+			}
 
 		}
 		if (!LIST_EMPTY(gslp)) {

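For illustration only, here is a userland sketch of the same idea: drop the
lock around a callback that may sleep, then take it again.  It uses pthreads
rather than kernel mutexes, and lock_sync(), unlock_sync(), pat_watchdog()
and run_sync_passes() are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sync_lock_held;	/* single-threaded bookkeeping for the demo */
static int pats_delivered;

static void
lock_sync(void)
{
	pthread_mutex_lock(&sync_lock);
	sync_lock_held = true;
}

static void
unlock_sync(void)
{
	sync_lock_held = false;
	pthread_mutex_unlock(&sync_lock);
}

/* Stand-in for a watchdog pat whose handler may sleep. */
static void
pat_watchdog(void)
{
	/* A handler that may sleep must not run with the lock held. */
	assert(!sync_lock_held);
	pats_delivered++;
}

/*
 * Stand-in for the shutdown loop: run `passes` iterations under the
 * lock, dropping it around each pat and reacquiring it afterwards.
 * Returns the total number of pats delivered so far.
 */
static int
run_sync_passes(int passes)
{
	lock_sync();
	for (int i = 0; i < passes; i++) {
		/* ... work protected by sync_lock would go here ... */
		unlock_sync();
		pat_watchdog();
		lock_sync();
	}
	unlock_sync();
	return (pats_delivered);
}
```

Anything the lock protects can change while it is dropped, so in a pattern
like this the caller has to re-check its state after reacquiring the lock.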

-- 
John Baldwin


