From owner-svn-src-head@FreeBSD.ORG Tue Sep 25 23:37:13 2012
Delivered-To: svn-src-head@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 727B6106564A; Tue, 25 Sep 2012 23:37:13 +0000 (UTC) (envelope-from ken@kdm.org)
Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id 2856B8FC0C; Tue, 25 Sep 2012 23:37:12 +0000 (UTC)
Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id q8PNbCUQ026970; Tue, 25 Sep 2012 17:37:12 -0600 (MDT) (envelope-from ken@nargothrond.kdm.org)
Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id q8PNbCLa026969; Tue, 25 Sep 2012 17:37:12 -0600 (MDT) (envelope-from ken)
Date: Tue, 25 Sep 2012 17:37:12 -0600
From: "Kenneth D. Merry"
To: Pawel Jakub Dawidek
Message-ID: <20120925233712.GA26920@nargothrond.kdm.org>
References: <201209221241.q8MCfnhJ067937@svn.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201209221241.q8MCfnhJ067937@svn.freebsd.org>
User-Agent: Mutt/1.4.2i
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject: Re: svn commit: r240822 - head/sys/geom
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
X-List-Received-Date: Tue, 25 Sep 2012 23:37:13 -0000

On Sat, Sep 22, 2012 at 12:41:49 +0000, Pawel Jakub Dawidek wrote:
> Author: pjd
> Date: Sat Sep 22 12:41:49 2012
> New Revision: 240822
> URL: http://svn.freebsd.org/changeset/base/240822
>
> Log:
>   Use the topology lock to protect list of providers while withering them.
>   It is possible that provider is destroyed while we are iterating over the
>   list.
>
>   Reported by:	Brian Parkison
>   Discussed with:	phk
>   MFC after:	1 week
>
> Modified:
>   head/sys/geom/geom_disk.c
>
> Modified: head/sys/geom/geom_disk.c
> ==============================================================================
> --- head/sys/geom/geom_disk.c	Sat Sep 22 12:40:52 2012	(r240821)
> +++ head/sys/geom/geom_disk.c	Sat Sep 22 12:41:49 2012	(r240822)
> @@ -635,10 +635,13 @@ disk_gone(struct disk *dp)
>  	struct g_geom *gp;
>  	struct g_provider *pp;
>  
> +	g_topology_lock();
>  	gp = dp->d_geom;
> -	if (gp != NULL)
> +	if (gp != NULL) {
>  		LIST_FOREACH(pp, &gp->provider, provider)
>  			g_wither_provider(pp, ENXIO);
> +	}
> +	g_topology_unlock();
>  }
>  
>  void

This breaks devices going away in CAM.  When the da(4) driver calls
disk_gone(), it is necessarily holding the SIM lock, which is a regular
MTX_DEF mutex.  The GEOM topology lock is an sx lock, and WITNESS blows
up because of that:

(noperiph:lock order reversal: (sleepable after non-sleepable)
isp0:0: 1st 0xffffffff81330c48 ctl2cam (ctl2cam) @ /usr/home/kenm/perforce5/vendor/FreeBSD/head/sys/cam/cam_periph.h:190
-1: 2nd 0xffffffff8136e720 GEOM topology (GEOM topology) @ /usr/home/kenm/perforce5/vendor/FreeBSD/head/sys/geom/geom_disk.c:638
-1): KDB: stack backtrace:
changing role on from 1 to 0
dctlfeasync: WWPN 0x21000024ff32658d port 0x0000e8 path 3 target 0 left
b_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2c
witness_checkorder() at witness_checkorder+0x875
_sx_xlock() at _sx_xlock+0x64
disk_gone() at disk_gone+0x40
daoninvalidate() at daoninvalidate+0x52
cam_periph_invalidate() at cam_periph_invalidate+0x57
daasync() at daasync+0x77
xpt_async_bcast() at xpt_async_bcast+0x42
xpt_async() at xpt_async+0x12a
cam_periph_error() at cam_periph_error+0x503
probedone() at probedone+0x1e1
camisr_runqueue() at camisr_runqueue+0x54
camisr() at camisr+0xcf
intr_event_execute_handlers() at intr_event_execute_handlers+0x6a
ithread_loop() at ithread_loop+0xab
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80002bfcb0, rbp = 0 ---
(da5:ctl2cam0:0:1:0): lost device - 0 outstanding, 1 refs
(pass7:Sleeping thread (tid 100027, pid 12) owns a non-sleepable lock
ctl2cam0:0:KDB: stack backtrace of thread 100027:
1:0): passdevgonecb: devfs entry is gone
sched_switch() at sched_switch+0x19a
mi_switch() at mi_switch+0x208
sleepq_switch() at sleepq_switch+0xfc
sleepq_wait() at sleepq_wait+0x4d
_sx_xlock_hard() at _sx_xlock_hard+0x350
_sx_xlock() at _sx_xlock+0xf1
disk_gone() at disk_gone+0x40
daoninvalidate() at daoninvalidate+0x52
cam_periph_invalidate() at cam_periph_invalidate+0x57
daasync() at daasync+0x77
xpt_async_bcast() at xpt_async_bcast+0x42
xpt_async() at xpt_async+0x12a
cam_periph_error() at cam_periph_error+0x503
probedone() at probedone+0x2dd
camisr_runqueue() at camisr_runqueue+0x54
camisr() at camisr+0xcf
intr_event_execute_handlers() at intr_event_execute_handlers+0x6a
ithread_loop() at ithread_loop+0xab
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80002bfcb0, rbp = 0 ---
panic: sleeping thread
cpuid = 4
KDB: enter: panic
[ thread pid 13 tid 100020 ]
Stopped at      kdb_enter+0x3b: movq    $0,0xaac5d2(%rip)
db> bt
Tracing pid 13 tid 100020 td 0xfffffe00029a5900
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1d1
propagate_priority() at propagate_priority+0x223
turnstile_wait() at turnstile_wait+0x252
_mtx_lock_sleep() at _mtx_lock_sleep+0xa1
_mtx_lock_flags() at _mtx_lock_flags+0x116
cam_periph_release() at cam_periph_release+0x4b
g_destroy_provider() at g_destroy_provider+0xae
g_run_events() at g_run_events+0x330
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800029bcb0, rbp = 0 ---
db> bt 100027
Tracing pid 12 tid 100027 td 0xfffffe0002a8d480
sched_switch() at sched_switch+0x19a
mi_switch() at mi_switch+0x208
sleepq_switch() at sleepq_switch+0xfc
sleepq_wait() at sleepq_wait+0x4d
_sx_xlock_hard() at _sx_xlock_hard+0x350
_sx_xlock() at _sx_xlock+0xf1
disk_gone() at disk_gone+0x40
daoninvalidate() at daoninvalidate+0x52
cam_periph_invalidate() at cam_periph_invalidate+0x57
daasync() at daasync+0x77
xpt_async_bcast() at xpt_async_bcast+0x42
xpt_async() at xpt_async+0x12a
cam_periph_error() at cam_periph_error+0x503
probedone() at probedone+0x2dd
camisr_runqueue() at camisr_runqueue+0x54
camisr() at camisr+0xcf
intr_event_execute_handlers() at intr_event_execute_handlers+0x6a
ithread_loop() at ithread_loop+0xab
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80002bfcb0, rbp = 0 ---

disk_gone() needs to be callable from interrupt context, with a
non-sleepable mutex such as the SIM lock held, so it cannot acquire the
topology lock.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
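The rule that WITNESS enforced above can be illustrated with a small userspace model. This is a hypothetical sketch, not WITNESS code: the names lock_class, held_locks, model_lock() and model_unlock() are invented for illustration, and the model tracks only a single thread and only the two lock classes involved here. It refuses to acquire a sleepable sx lock while a non-sleepable MTX_DEF mutex is held, and it allows the opposite order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the two lock classes in this report.
 * These are NOT FreeBSD kernel or WITNESS APIs.
 */
enum lock_class {
	LC_MTX_DEF,	/* non-sleepable mutex, e.g. the CAM SIM lock */
	LC_SX		/* sleepable sx lock, e.g. the GEOM topology lock */
};

#define MODEL_MAX_HELD	8

/* Locks currently held by the single modeled thread, in acquisition order. */
struct held_locks {
	enum lock_class held[MODEL_MAX_HELD];
	size_t count;
};

/*
 * Try to acquire a lock of class 'cls'.  Returns false, without recording
 * the lock, if this would be a "sleepable after non-sleepable" acquisition
 * (an sx lock while any MTX_DEF mutex is held) or if the table is full.
 */
static bool
model_lock(struct held_locks *hl, enum lock_class cls)
{
	size_t i;

	if (hl->count == MODEL_MAX_HELD)
		return (false);
	if (cls == LC_SX) {
		for (i = 0; i < hl->count; i++) {
			/* Acquiring an sx lock may sleep; not allowed here. */
			if (hl->held[i] == LC_MTX_DEF)
				return (false);
		}
	}
	hl->held[hl->count++] = cls;
	return (true);
}

/* Release the most recently acquired lock (LIFO order only). */
static void
model_unlock(struct held_locks *hl)
{
	assert(hl->count > 0);
	hl->count--;
}
```

With the SIM mutex modeled as held, model_lock(&hl, LC_SX) returns false, which corresponds to the g_topology_lock() call that r240822 added to disk_gone().  Taking the sx lock first and the mutex second, as the GEOM event thread does in the "db> bt" trace (g_run_events() -> g_destroy_provider() -> cam_periph_release()), is accepted by the model.  The panic arises because that event thread then blocks on the SIM mutex whose owner, thread 100027, is asleep waiting for the topology lock.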