From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 26 15:28:12 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7253FF15; Tue, 26 Nov 2013 15:28:12 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 41F97261E; Tue, 26 Nov 2013 15:28:09 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA15275; Tue, 26 Nov 2013 17:28:08 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1VlKYZ-0001Fc-Ou; Tue, 26 Nov 2013 17:28:07 +0200 Message-ID: <5294BDCF.1050702@FreeBSD.org> Date: Tue, 26 Nov 2013 17:27:11 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: John Baldwin , freebsd-hackers@FreeBSD.org Subject: Re: taskqueue_block References: <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org> <201311211414.06849.jhb@freebsd.org> In-Reply-To: <201311211414.06849.jhb@freebsd.org> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Adrian Chadd , Bernhard Schmidt , Luigi Rizzo , ray@FreeBSD.org, Andrew Thompson , Pyun YongHyeon X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Nov 2013 15:28:12 -0000 on 21/11/2013 21:14 John Baldwin said the following: > On Tuesday, November 19, 2013 10:29:18 pm Adrian Chadd wrote: >> Yes, and lets fix this. :) > > Hmm, is taskqueue_block() always used in context where waiting is safe? I reviewed usage of taskqueue_block() and it appears that in all cases except for if_ath (which I'd like to discuss separately) the function is a part of the anti-pattern taskqueue_block + taskqueue_drain. That, according to my understanding, means that a) taskqueue_block() would be safe to use if it were blocking and b) in many cases it should not be necessary at all. Here are some annotated patches: ================================================ --- a/sys/dev/jme/if_jme.c +++ b/sys/dev/jme/if_jme.c @@ -2259,6 +2259,9 @@ jme_link_task(void *arg, int pending) jme_stop_rx(sc); jme_stop_tx(sc); + /* Unblock execution of task. */ + taskqueue_unblock(sc->jme_tq); + /* XXX Drain all queued tasks. */ JME_UNLOCK(sc); taskqueue_drain(sc->jme_tq, &sc->jme_int_task); @@ -2332,8 +2335,6 @@ jme_link_task(void *arg, int pending) ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; callout_reset(&sc->jme_tick_ch, hz, jme_tick, sc); - /* Unblock execution of task. */ - taskqueue_unblock(sc->jme_tq); /* Reenable interrupts. */ CSR_WRITE_4(sc, JME_INTR_MASK_SET, JME_INTRS); ================================================ I am not sure if taskqueue_block is really needed in this driver. I took a safer assumption and decided that it could be needed while jme_stop_rx, etc are being called. But the taskqueue should not be block over a call to taskqueue_drain. This patch could easily be wrong and some further reshuffling of the code could be needed, but the patch shows the direction. Particularly, I am concerned about possible interaction between JME_LOCK/JME_UNLOCK and taskqueue_block. ================================================ --- a/sys/dev/netmap/if_em_netmap.h +++ b/sys/dev/netmap/if_em_netmap.h @@ -53,13 +53,10 @@ em_netmap_block_tasks(struct adapter *adapter) struct rx_ring *rxr = adapter->rx_rings; for (i = 0; i < adapter->num_queues; i++, txr++, rxr++) { - taskqueue_block(txr->tq); taskqueue_drain(txr->tq, &txr->tx_task); - taskqueue_block(rxr->tq); taskqueue_drain(rxr->tq, &rxr->rx_task); } } else { /* legacy */ - taskqueue_block(adapter->tq); taskqueue_drain(adapter->tq, &adapter->link_task); taskqueue_drain(adapter->tq, &adapter->que_task); } @@ -69,18 +66,6 @@ em_netmap_block_tasks(struct adapter *adapter) static void em_netmap_unblock_tasks(struct adapter *adapter) { - if (adapter->msix > 1) { - struct tx_ring *txr = adapter->tx_rings; - struct rx_ring *rxr = adapter->rx_rings; - int i; - - for (i = 0; i < adapter->num_queues; i++) { - taskqueue_unblock(txr->tq); - taskqueue_unblock(rxr->tq); - } - } else { /* legacy */ - taskqueue_unblock(adapter->tq); - } } diff --git a/sys/dev/netmap/if_lem_netmap.h b/sys/dev/netmap/if_lem_netmap.h index 25e5c7c..76c8246 100644 --- a/sys/dev/netmap/if_lem_netmap.h +++ b/sys/dev/netmap/if_lem_netmap.h @@ -60,7 +60,6 @@ lem_netmap_reg(struct ifnet *ifp, int onoff) ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); #ifndef EM_LEGACY_IRQ // XXX do we need this ? - taskqueue_block(adapter->tq); taskqueue_drain(adapter->tq, &adapter->rxtx_task); taskqueue_drain(adapter->tq, &adapter->link_task); #endif /* !EM_LEGCY_IRQ */ @@ -83,10 +82,6 @@ fail: lem_init_locked(adapter); /* also enable intr */ } -#ifndef EM_LEGACY_IRQ - taskqueue_unblock(adapter->tq); // XXX do we need this ? -#endif /* !EM_LEGCY_IRQ */ - EM_CORE_UNLOCK(adapter); return (error); ================================================ taskqueue_block that is immediately followed by taskqueue_drain is completely redundant (in addition to being deadlock prone). So this is an easy case. ================================================ --- a/sys/dev/rt/if_rt.c +++ b/sys/dev/rt/if_rt.c @@ -810,22 +810,17 @@ rt_stop_locked(void *priv) ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); callout_stop(&sc->periodic_ch); callout_stop(&sc->tx_watchdog_ch); + + /* disable interrupts */ + RT_WRITE(sc, GE_PORT_BASE + FE_INT_ENABLE, 0); + RT_SOFTC_UNLOCK(sc); - taskqueue_block(sc->taskqueue); - /* - * Sometime rt_stop_locked called from isr and we get panic - * When found, I fix it - */ -#ifdef notyet taskqueue_drain(sc->taskqueue, &sc->rx_done_task); taskqueue_drain(sc->taskqueue, &sc->tx_done_task); taskqueue_drain(sc->taskqueue, &sc->periodic_task); -#endif - RT_SOFTC_LOCK(sc); - /* disable interrupts */ - RT_WRITE(sc, GE_PORT_BASE + FE_INT_ENABLE, 0); + RT_SOFTC_LOCK(sc); /* reset adapter */ RT_WRITE(sc, GE_PORT_BASE + FE_RST_GLO, PSE_RESET); ================================================ I am not sure why the calls to taskqueue_drain were commented out. In my opinion they are correct. I was not able to find any possible way for rt_stop_locked to be called from an ISR as the comment alleged. I suspect that the problem was actually caused by the taskqueue_block + taskqueue_drain deadlock. Also, I think that disabling interrupts before calling taskqueue_drain should ensure that no new tasks will be enqueued, but I could be wrong about this. ================================================ --- a/sys/net80211/ieee80211_proto.c +++ b/sys/net80211/ieee80211_proto.c @@ -1207,14 +1207,12 @@ update_chw(void *arg, int npending) void ieee80211_waitfor_parent(struct ieee80211com *ic) { - taskqueue_block(ic->ic_tq); ieee80211_draintask(ic, &ic->ic_parent_task); ieee80211_draintask(ic, &ic->ic_mcast_task); ieee80211_draintask(ic, &ic->ic_promisc_task); ieee80211_draintask(ic, &ic->ic_chan_task); ieee80211_draintask(ic, &ic->ic_bmiss_task); ieee80211_draintask(ic, &ic->ic_chw_task); - taskqueue_unblock(ic->ic_tq); } /* ================================================ Again, this is an obvious case of the wrong taskqueue_block usage. I've CC-ed the developers who seem to have worked recently on the affected code. I would like to ask you to review and/or test the changes. My understanding of the code is based on a quick glance, so I could be very wrong. In that case, I would like to discuss how the code actually works and what it expects to achieve by using taskqueue_block. Please see the quoted below text for my thoughts on why using taskqueue_block + taskqueue_drain is a problem. Thank you! >> On 19 November 2013 06:32, Andriy Gapon wrote: >>> >>> Forwarding this to the larger audience for a discussion. >>> >>> -------- Original Message -------- >>> Message-ID: <5287BDB9.10201@FreeBSD.org> >>> Date: Sat, 16 Nov 2013 20:47:21 +0200 >>> From: Andriy Gapon >>> Subject: taskqueue_block >>> >>> >>> >>> It seems that either I do not understand something about taskqueue_block code or >>> it is a quite dangerous and abused API. The fact that it is not properly >>> documented does not help either. >>> >>> The commit message said: >>>> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the >>>> owner of a queue to block and unblock execution of the tasks in the queue while >>>> allowing tasks to continue to be added queue. Combining this with >>>> taskqueue_drain() allows a queue to be safely disabled. The unblock function may >>> [...] >>> >>> I indeed see this (anti?) pattern being used in the code. >>> But what about the following case. One thread calls taskqueue_block() and sets >>> TQ_FLAGS_BLOCKED. Another thread calls taskqueue_enqueue, this adds a task to >>> the queue and sets ta_pending of the task to 1. tq_enqueue is not called, so an >>> actual queue runner is not called or waken up. Then the first thread calls >>> taskqueue_drain() on the task. As far as I can see, the thread would then just >>> wait forever because the task is pending and is not going to be executed. >>> >>> Additionally, it is impossible to reason about the taskqueue's state after >>> taskqueue_block call, because the call just sets the flag and does not do any >>> synchronization. And as described above, it is not safe to call APIs that could >>> allow the taskqueue or the task state to become known. >>> >>> I think that taskqueue_block() should wait on the currently active tasks to >>> complete. I don't think that this behavior could be optional. I do see any >>> reasonable and safe use for "non-blocking" taskqueue_block(). >>> taskqueue_drain() calls after taskqueue_block() must be removed. The code >>> should either use taskqueue_drain() or "blocking" taskqueue_block() depending on >>> concrete circumstances. >>> >>> What do you think? >>> Thank you. -- Andriy Gapon