Date:      Tue, 25 Sep 2012 17:37:12 -0600
From:      "Kenneth D. Merry" <ken@FreeBSD.org>
To:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r240822 - head/sys/geom
Message-ID:  <20120925233712.GA26920@nargothrond.kdm.org>
In-Reply-To: <201209221241.q8MCfnhJ067937@svn.freebsd.org>
References:  <201209221241.q8MCfnhJ067937@svn.freebsd.org>

On Sat, Sep 22, 2012 at 12:41:49 +0000, Pawel Jakub Dawidek wrote:
> Author: pjd
> Date: Sat Sep 22 12:41:49 2012
> New Revision: 240822
> URL: http://svn.freebsd.org/changeset/base/240822
> 
> Log:
>   Use the topology lock to protect list of providers while withering them.
>   It is possible that provider is destroyed while we are iterating over the
>   list.
>   
>   Reported by:	Brian Parkison <parkison@panzura.com>
>   Discussed with:	phk
>   MFC after:	1 week
> 
> Modified:
>   head/sys/geom/geom_disk.c
> 
> Modified: head/sys/geom/geom_disk.c
> ==============================================================================
> --- head/sys/geom/geom_disk.c	Sat Sep 22 12:40:52 2012	(r240821)
> +++ head/sys/geom/geom_disk.c	Sat Sep 22 12:41:49 2012	(r240822)
> @@ -635,10 +635,13 @@ disk_gone(struct disk *dp)
>  	struct g_geom *gp;
>  	struct g_provider *pp;
>  
> +	g_topology_lock();
>  	gp = dp->d_geom;
> -	if (gp != NULL)
> +	if (gp != NULL) {
>  		LIST_FOREACH(pp, &gp->provider, provider)
>  			g_wither_provider(pp, ENXIO);
> +	}
> +	g_topology_unlock();
>  }
>  
>  void

This breaks devices going away in CAM.

When the da(4) driver calls disk_gone(), it is necessarily holding the SIM
lock, which is a regular MTX_DEF mutex.  The GEOM topology lock is an sx
lock, and WITNESS blows up because of that:

(noperiph:lock order reversal: (sleepable after non-sleepable)
isp0:0: 1st 0xffffffff81330c48 ctl2cam (ctl2cam) @
/usr/home/kenm/perforce5/vendor/FreeBSD/head/sys/cam/cam_periph.h:190
-1: 2nd 0xffffffff8136e720 GEOM topology (GEOM topology) @
/usr/home/kenm/perforce5/vendor/FreeBSD/head/sys/geom/geom_disk.c:638
-1): KDB: stack backtrace:
changing role on from 1 to 0
dctlfeasync: WWPN 0x21000024ff32658d port 0x0000e8 path 3 target 0 left
b_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2c
witness_checkorder() at witness_checkorder+0x875
_sx_xlock() at _sx_xlock+0x64
disk_gone() at disk_gone+0x40
daoninvalidate() at daoninvalidate+0x52
cam_periph_invalidate() at cam_periph_invalidate+0x57
daasync() at daasync+0x77
xpt_async_bcast() at xpt_async_bcast+0x42
xpt_async() at xpt_async+0x12a
cam_periph_error() at cam_periph_error+0x503
probedone() at probedone+0x1e1
camisr_runqueue() at camisr_runqueue+0x54
camisr() at camisr+0xcf
intr_event_execute_handlers() at intr_event_execute_handlers+0x6a
ithread_loop() at ithread_loop+0xab
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80002bfcb0, rbp = 0 ---
(da5:ctl2cam0:0:1:0): lost device - 0 outstanding, 1 refs
(pass7:Sleeping thread (tid 100027, pid 12) owns a non-sleepable lock
ctl2cam0:0:KDB: stack backtrace of thread 100027:
1:0): passdevgonecb: devfs entry is gone
sched_switch() at sched_switch+0x19a
mi_switch() at mi_switch+0x208
sleepq_switch() at sleepq_switch+0xfc
sleepq_wait() at sleepq_wait+0x4d
_sx_xlock_hard() at _sx_xlock_hard+0x350
_sx_xlock() at _sx_xlock+0xf1
disk_gone() at disk_gone+0x40
daoninvalidate() at daoninvalidate+0x52
cam_periph_invalidate() at cam_periph_invalidate+0x57
daasync() at daasync+0x77
xpt_async_bcast() at xpt_async_bcast+0x42
xpt_async() at xpt_async+0x12a
cam_periph_error() at cam_periph_error+0x503
probedone() at probedone+0x2dd
camisr_runqueue() at camisr_runqueue+0x54
camisr() at camisr+0xcf
intr_event_execute_handlers() at intr_event_execute_handlers+0x6a
ithread_loop() at ithread_loop+0xab
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80002bfcb0, rbp = 0 ---
panic: sleeping thread
cpuid = 4
KDB: enter: panic
[ thread pid 13 tid 100020 ]
Stopped at      kdb_enter+0x3b: movq    $0,0xaac5d2(%rip)
db> bt
Tracing pid 13 tid 100020 td 0xfffffe00029a5900
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1d1
propagate_priority() at propagate_priority+0x223
turnstile_wait() at turnstile_wait+0x252
_mtx_lock_sleep() at _mtx_lock_sleep+0xa1
_mtx_lock_flags() at _mtx_lock_flags+0x116
cam_periph_release() at cam_periph_release+0x4b
g_destroy_provider() at g_destroy_provider+0xae
g_run_events() at g_run_events+0x330
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800029bcb0, rbp = 0 ---
db> bt 100027
Tracing pid 12 tid 100027 td 0xfffffe0002a8d480
sched_switch() at sched_switch+0x19a
mi_switch() at mi_switch+0x208
sleepq_switch() at sleepq_switch+0xfc
sleepq_wait() at sleepq_wait+0x4d
_sx_xlock_hard() at _sx_xlock_hard+0x350
_sx_xlock() at _sx_xlock+0xf1
disk_gone() at disk_gone+0x40
daoninvalidate() at daoninvalidate+0x52
cam_periph_invalidate() at cam_periph_invalidate+0x57
daasync() at daasync+0x77
xpt_async_bcast() at xpt_async_bcast+0x42
xpt_async() at xpt_async+0x12a
cam_periph_error() at cam_periph_error+0x503
probedone() at probedone+0x2dd
camisr_runqueue() at camisr_runqueue+0x54
camisr() at camisr+0xcf
intr_event_execute_handlers() at intr_event_execute_handlers+0x6a
ithread_loop() at ithread_loop+0xab
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80002bfcb0, rbp = 0 ---

disk_gone() needs to be callable from an interrupt context, so it cannot
acquire the topology lock, which may sleep.
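
To illustrate the constraint, here is a minimal user-space sketch of one
way to satisfy it: the caller that cannot sleep only records a request,
and a thread that is allowed to sleep (compare g_run_events() in the trace
above) later does the work under the lock.  Every name below is
hypothetical and none of it is a FreeBSD API; it is a single-threaded
model, not a proposed patch, and a real queue would need its own
non-sleepable protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not FreeBSD kernel APIs. */

#define MAX_EVENTS 8

struct model_disk {
	int nproviders;		/* providers still attached */
	bool withered;		/* set once the deferred work has run */
};

static struct model_disk *event_queue[MAX_EVENTS];
static size_t event_head, event_tail;

static bool in_interrupt;	/* true while simulating interrupt context */
static bool topology_locked;	/* stands in for the sleepable sx lock */

static void
model_topology_lock(void)
{
	/* A sleepable lock must never be taken from interrupt context. */
	assert(!in_interrupt);
	assert(!topology_locked);
	topology_locked = true;
}

static void
model_topology_unlock(void)
{
	assert(topology_locked);
	topology_locked = false;
}

/* Safe from interrupt context: only queues the request, never sleeps. */
static bool
model_disk_gone(struct model_disk *dp)
{
	if (event_tail - event_head == MAX_EVENTS)
		return (false);		/* queue full, caller must retry */
	event_queue[event_tail++ % MAX_EVENTS] = dp;
	return (true);
}

/* Runs where sleeping is allowed; does the real work under the lock. */
static int
model_run_events(void)
{
	int handled = 0;

	while (event_head != event_tail) {
		struct model_disk *dp = event_queue[event_head++ % MAX_EVENTS];

		model_topology_lock();
		dp->nproviders = 0;	/* model of withering the providers */
		dp->withered = true;
		model_topology_unlock();
		handled++;
	}
	return (handled);
}
```

In this model, calling model_disk_gone() with in_interrupt set never
touches the lock; the providers are only marked withered once
model_run_events() runs with in_interrupt cleared.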

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
