Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 25 Oct 2011 13:42:46 +0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        freebsd-wireless@freebsd.org
Subject:   ath 11n update: locking, regressions, testing
Message-ID:  <CAJ-Vmo=9KkQbiCCBQm5Xm_9dNZtY6jQ1hzO0SWSZ4z323jF=Hw@mail.gmail.com>

next in thread | raw e-mail | index | archive | help

[-- Attachment #1 --]
Hi all,

In working out kinks in how the ath driver handles things during an
interface reset or reconfigure, I've discovered a couple of things:

* all the frames in the tx/rx queues are just dropped on the floor,
which breaks 802.11n aggregation state in very bad ways; and
* there's just a lot of cases where things can overlap each other,
causing temporary situations of unpredictability (and likely some of
the "wtf?" situations that people have been seeing with the stock
freebsd ath driver.)

I've been tinkering with some code which does two things:

* it introduces locking of most operational paths, much like how the
iwn driver does things, and
* in cases of a recoverable reset (eg a configuration change such as
enabling/disabling ANI, or a stuck beacon condition or watchdog
timeout), the TX/RX queues are properly handled:
  + the TX queue is kept how it is and DMA is simply restarted once
the reset completes;
  + the RX queue is completely drained before the reset occurs, so all
frames are thrown at net80211 and handled.

For reference, linux doesn't do this kind of locking - the ath9k
driver hides reset, channel change and RX behind a new lock - the "PCU
lock" - whilst the TX path doesn't use the PCU lock, it just uses the
normal per-queue TX locking that freebsd/ath9k currently has.

Now, this has shown a few issues which make me kind of wary about
committing this as-is and I'd like some testing and feedback before I
am even remotely comfortable.

Firstly - holding any lock during _both_ TX _and_ RX causes all kinds
of LOR'ing in the IP stack, leading to very quick deadlocks.
The iwn driver releases the IWN_LOCK before calling ieee80211_input,
so it avoids this - but by doing that, it too could lead to some
inconsistencies (eg another process could cause a state change during
active RX.)
So I fixed that in my code - and yes, this means that things such as
ath_start() (ie, the normal TX path), or ath_raw_xmit() (the raw TX
path, for things like management/probe frames) can occur. So I
introduced some variables which keep track of whether the code is in
tx, tx complete, rx process, rx tasklet and reset. This way the driver
can avoid situations where:

* there's a reset, or rx process occuring, which grabs the lock;
* some other task occurs - say, you change the ANI configuration, or
add/remove a VAP;
* rxproc releases the lock to call net80211_input;
* .. and the configuration change, VAP change, etc occurs;
* .. then rxproc continues trying to process RX frames, after the
hardware has been reset.

Now this kind of thing currently can occur, so I'm leaning to just
ignoring it and adding very loud printf()s if the situation is
detected. It can then be fixed at a later stage.

Next - the net80211 scan code (which ath uses, but iwn doesn't as the
firmware does its own scanning) holds the net80211 interface lock
(comlock) during the duration of sending probe requests to a channel.
This means that when scanning, this occurs:

* grab comlock;
* iterate over scan cache contents, sending probe requests
* each probe request calls ath_raw_xmit(), which grabs ATH_LOCK for
the duration of the TX.

Then for normal TX completion:

* ATH_LOCK is grabbed;
* the completed frame has net80211_free_node() called, which grabs the comlock

.. and the above is a very obvious LOR.

I'm not sure how to resolve/fix this without either:

* abandoning my idea for locking the ath driver this way and instead
having to shoehorn in some more complicated state change / taskqueue
drain code (eg so on an operation change, all RX/TX is suspended in
its entirety before the reset occurs); which will require some
net80211 cooperation as the ic_raw_xmit() API can be called by any
thing, and is typically done without any locks being held (except for
the scan code example), so it's likely this will require some
complicated hackery to avoid this;
* fix the scan cache code to not hold the comlock whilst TX'ing probe
request frames - and this likely requires introducing some kind of
generation count to the scan cache contents and if the generation
count changes, the scan process restarts TX'ing probe frames or
something.

That said, this still doesn't fix all of the ath locking problems (and
likely some net80211 locking problems too), which seem increasingly
like they should be fixed, as they seem to be making my 802.11n ath
work _much_ more complicated than it needs to be..

So, I'd like some testing of the attached patch against  my if_ath_tx
branch, and I'd like some feedback/comments. You can trigger beacon
stuck conditions by:

# sysctl dev.ath.X.forcebstuck=1

So you can spam that and generate errors/resets (along with the
occasional hardware error, which is kind of surprising, and I'm going
to try and track down.)

Thanks!



Adrian

[-- Attachment #2 --]
Index: sys/dev/ath/if_ath_tx.c
===================================================================
--- sys/dev/ath/if_ath_tx.c	(revision 226691)
+++ sys/dev/ath/if_ath_tx.c	(working copy)
@@ -501,6 +501,30 @@
 	KASSERT(txq->axq_qnum != ATH_TXQ_SWQ,
 	     ("ath_tx_handoff_hw called for mcast queue"));
 
+	ATH_LOCK_ASSERT(sc);
+	if (sc->sc_in_reset) {
+		device_printf(sc->sc_dev, "%s: called with sc_in_reset != 0\n",
+		    __func__);
+		DPRINTF(sc, ATH_DEBUG_XMIT,
+		    "%s: queued: TXDP[%u] = %p (%p) depth %d\n",
+		    __func__, txq->axq_qnum,
+		    (caddr_t)bf->bf_daddr, bf->bf_desc,
+		    txq->axq_depth);
+		ATH_TXQ_INSERT_TAIL(txq, bf, bf_list);
+		if (bf->bf_state.bfs_aggr)
+			txq->axq_aggr_depth++;
+		/*
+		 * We still have to link descriptors here:
+		 * when DMA is restarted, these need to be
+		 * properly chained or the DMA engine will
+		 * get upset.
+		 */
+		if (txq->axq_link != NULL)
+			*txq->axq_link = bf->bf_daddr;
+		txq->axq_link = &bf->bf_lastds->ds_link;
+		return;
+	}
+
 	/* For now, so not to generate whitespace diffs */
 	if (1) {
 #ifdef IEEE80211_SUPPORT_TDMA
@@ -1688,6 +1712,22 @@
 	struct ath_buf *bf;
 	int error;
 
+	ATH_LOCK(sc);
+	sc->sc_in_txsend++;
+	/*
+	 * If we're called during a reset, just fail.
+	 * It'd be nice if the net80211 stack would allow us to buffer
+	 * raw/management frames.
+	 *
+	 * Allow it to be called during a non-reset RX.
+	 */
+	if (sc->sc_in_reset) {
+		device_printf(sc->sc_dev,
+		    "%s: sc_in_reset=%d; bailing\n", __func__, sc->sc_in_reset);
+		error = EIO;
+		goto bad;
+	}
+
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || sc->sc_invalid) {
 		DPRINTF(sc, ATH_DEBUG_XMIT, "%s: discard frame, %s", __func__,
 		    (ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 ?
@@ -1729,7 +1769,8 @@
 	sc->sc_wd_timer = 5;
 	ifp->if_opackets++;
 	sc->sc_stats.ast_tx_raw++;
-
+	sc->sc_in_txsend--;
+	ATH_UNLOCK(sc);
 	return 0;
 bad2:
 	ATH_TXBUF_LOCK(sc);
@@ -1738,6 +1779,8 @@
 bad:
 	ifp->if_oerrors++;
 	sc->sc_stats.ast_tx_raw_fail++;
+	sc->sc_in_txsend--;
+	ATH_UNLOCK(sc);
 	ieee80211_free_node(ni);
 	return error;
 }
@@ -3721,6 +3764,7 @@
 	 */
 	r = sc->sc_addba_response(ni, tap, status, code, batimeout);
 
+	ATH_LOCK(sc);
 	ATH_TXQ_LOCK(sc->sc_ac2q[atid->ac]);
 	/*
 	 * XXX dirty!
@@ -3728,8 +3772,11 @@
 	 * Read above for more information.
 	 */
 	tap->txa_start = ni->ni_txseqs[tid];
+	sc->sc_in_txsend++;
 	ath_tx_tid_resume(sc, atid);
+	sc->sc_in_txsend--;
 	ATH_TXQ_UNLOCK(sc->sc_ac2q[atid->ac]);
+	ATH_UNLOCK(sc);
 	return r;
 }
 
@@ -3758,7 +3805,11 @@
 	 * it'll set the cleanup flag, and it'll be unpaused once
 	 * things have been cleaned up.
 	 */
+	ATH_LOCK(sc);
+	sc->sc_in_txsend++;
 	ath_tx_cleanup(sc, an, tid);
+	sc->sc_in_txsend--;
+	ATH_UNLOCK(sc);
 }
 
 /*
@@ -3794,9 +3845,13 @@
 	 * XXX TID here or it'll never be done.
 	 */
 	if (status == 0 || attempts == 50) {
+		ATH_LOCK(sc);
 		ATH_TXQ_LOCK(sc->sc_ac2q[atid->ac]);
+		sc->sc_in_txsend++;
 		ath_tx_tid_resume(sc, atid);
+		sc->sc_in_txsend--;
 		ATH_TXQ_UNLOCK(sc->sc_ac2q[atid->ac]);
+		ATH_UNLOCK(sc);
 	}
 }
 
@@ -3820,7 +3875,11 @@
 	sc->sc_addba_response_timeout(ni, tap);
 
 	/* Unpause the TID; which reschedules it */
+	ATH_LOCK(sc);
 	ATH_TXQ_LOCK(sc->sc_ac2q[atid->ac]);
+	sc->sc_in_txsend++;
 	ath_tx_tid_resume(sc, atid);
+	sc->sc_in_txsend--;
 	ATH_TXQ_UNLOCK(sc->sc_ac2q[atid->ac]);
+	ATH_UNLOCK(sc);
 }
Index: sys/dev/ath/if_ath_misc.h
===================================================================
--- sys/dev/ath/if_ath_misc.h	(revision 226559)
+++ sys/dev/ath/if_ath_misc.h	(working copy)
@@ -57,15 +57,14 @@
 extern void ath_freebuf(struct ath_softc *sc, struct ath_buf *bf);
 
 extern int ath_reset(struct ifnet *, ATH_RESET_TYPE);
+extern int ath_reset_locked(struct ifnet *, ATH_RESET_TYPE);
 extern void ath_tx_draintxq(struct ath_softc *sc, struct ath_txq *txq);
 extern void ath_tx_default_comp(struct ath_softc *sc, struct ath_buf *bf,
 	    int fail);
 extern void ath_tx_update_ratectrl(struct ath_softc *sc,
 	    struct ieee80211_node *ni, struct ath_rc_series *rc,
 	    struct ath_tx_status *ts, int frmlen, int nframes, int nbad);
-
 extern void ath_tx_freebuf(struct ath_softc *sc, struct ath_buf *bf,
     int status);
-extern void ath_tx_sched_proc_sched(struct ath_softc *sc, struct ath_txq *txq);
 
 #endif
Index: sys/dev/ath/if_ath.c
===================================================================
--- sys/dev/ath/if_ath.c	(revision 226694)
+++ sys/dev/ath/if_ath.c	(working copy)
@@ -131,6 +131,7 @@
 static void	ath_stop_locked(struct ifnet *);
 static void	ath_stop(struct ifnet *);
 static void	ath_start(struct ifnet *);
+static void	ath_start_locked(struct ifnet *);
 static int	ath_reset_vap(struct ieee80211vap *, u_long);
 static int	ath_media_change(struct ifnet *);
 static void	ath_watchdog(void *);
@@ -869,6 +870,62 @@
 	return free;
 }
 
+/*
+ * This is a diagnostic routine which should be run whenever
+ * any reset-related routine grabs the ATH_LOCK.
+ *
+ * Since ath_rx_proc() -releases- ATH_LOCK during normal operation,
+ * it is entirely possible that another parallel task (eg channel
+ * change, reset, VAP add/delete, etc) is also running.
+ * In this case, that'll begin to occur whilst ath_rx_proc() is
+ * executing.
+ *
+ * And it gets better. Since ath_rx_proc() is now being called during
+ * the reset, channel change, etc path (ie, anywhere where frames
+ * need to be drained), ATH_LOCK is being released and re-asserted.
+ * So it becomes possible for those tasks to start taking over from
+ * each other.
+ *
+ * For now, if sc_in_rxtasklet or sc_in_rxproc is >0, we print out
+ * a warning so that the reason for said issue can be investigated.
+ *
+ * In the future, routines which are trying to do reset type actions
+ * will likely need to do some kind of lock reacquisition once the
+ * rxproc has completed. I'm not entirely sure of the correct way
+ * to handle that though.
+ *
+ * We don't have to check for TX state - the TX lock is held for
+ * the duration of the TX functions, so reset routines won't clash
+ * with TX. TX may occur inside a reset routine - ie, when the
+ * RX code releases the ATH_LOCK - but as sc->sc_in_reset > 0,
+ * the TX routines won't continue.
+ *
+ * I'm still not sure what to do if keycache changes come through
+ * during all of this - I should look into this..
+ *
+ * Finally, we'll log if sc_in_reset > 1 - that means that some kind
+ * of recursion or parallel reset path has occured and it's not clear
+ * which order what will return in. This should be fixed.
+ */
+static void _ath_lock_rx_check(struct ath_softc *sc, const char *file,
+    int line)
+{
+	ATH_LOCK_ASSERT(sc);
+	if (sc->sc_in_rxtasklet || sc->sc_in_rxproc) {
+		device_printf(sc->sc_dev,
+		    "%s: %s:%d: rxtasklet=%d, rxproc=%d, danger!\n",
+		    __func__, file, line,
+		    sc->sc_in_rxtasklet, sc->sc_in_rxproc);
+	}
+	if (sc->sc_in_reset > 1) {
+		device_printf(sc->sc_dev,
+		    "%s: %s:%d: sc_in_reset > 1 (= %d), danger!\n",
+		    __func__, file, line, sc->sc_in_reset);
+	}
+}
+
+#define	ath_lock_rx_check(sc)	_ath_lock_rx_check((sc), __FILE__, __LINE__)
+
 static struct ieee80211vap *
 ath_vap_create(struct ieee80211com *ic,
 	const char name[IFNAMSIZ], int unit, int opmode, int flags,
@@ -887,6 +944,10 @@
 	IEEE80211_ADDR_COPY(mac, mac0);
 
 	ATH_LOCK(sc);
+	sc->sc_in_reset++;
+	/* XXX TODO: suspend taskqueue, flush taskqueue ops, suspend RX? */
+	ath_lock_rx_check(sc);
+
 	ic_opmode = opmode;		/* default to opmode of new vap */
 	switch (opmode) {
 	case IEEE80211_M_STA:
@@ -985,6 +1046,8 @@
 	error = ieee80211_vap_setup(ic, vap, name, unit, opmode, flags,
 	    bssid, mac);
 	ATH_LOCK(sc);
+	ath_lock_rx_check(sc);
+
 	if (error != 0) {
 		device_printf(sc->sc_dev, "%s: error %d creating vap\n",
 		    __func__, error);
@@ -1105,6 +1168,7 @@
 		 */
 		sc->sc_swbmiss = 1;
 	}
+	sc->sc_in_reset--;
 	ATH_UNLOCK(sc);
 
 	/* complete setup */
@@ -1115,6 +1179,7 @@
 	ath_hal_setbssidmask(sc->sc_ah, sc->sc_hwbssidmask);
 bad:
 	free(avp, M_80211_VAP);
+	sc->sc_in_reset--;
 	ATH_UNLOCK(sc);
 	return NULL;
 }
@@ -1130,18 +1195,25 @@
 
 	DPRINTF(sc, ATH_DEBUG_RESET, "%s: called\n", __func__);
 
+	/* XXX TODO: suspend taskqueue, flush taskqueue ops, suspend RX? */
+	ATH_LOCK(sc);
+	sc->sc_in_reset++;
+
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 		/*
 		 * Quiesce the hardware while we remove the vap.  In
 		 * particular we need to reclaim all references to
 		 * the vap state by any frames pending on the tx queues.
 		 */
+		ath_lock_rx_check(sc);
 		ath_hal_intrset(ah, 0);		/* disable interrupts */
 		ath_draintxq(sc, ATH_RESET_DEFAULT);		/* stop hw xmit side */
-		/* XXX Do all frames from all vaps/nodes need draining here? */
 		ath_stoprecv(sc);		/* stop recv side */
+		/* XXX this drops the lock temporarily */
+		ath_rx_proc(sc, 0);
 	}
 
+	ATH_UNLOCK(sc);
 	ieee80211_vap_detach(vap);
 
 	/*
@@ -1161,9 +1233,10 @@
 	 * call!)
 	 */
 
+	ATH_LOCK(sc);
+	ath_lock_rx_check(sc);
 	ath_draintxq(sc, ATH_RESET_DEFAULT);
 
-	ATH_LOCK(sc);
 	/*
 	 * Reclaim beacon state.  Note this must be done before
 	 * the vap instance is reclaimed as we may have a reference
@@ -1230,6 +1303,7 @@
 		}
 		ath_hal_intrset(ah, sc->sc_imask);
 	}
+	sc->sc_in_reset--;
 	ATH_UNLOCK(sc);
 }
 
@@ -1335,6 +1409,10 @@
 
 /*
  * Interrupt handler.  Most of the actual processing is deferred.
+ *
+ * It may be a bad idea to grab the ATH_LOCK here to protect kickpcu.
+ * Perhaps a different lock is needed. That way ath_intr() doesn't end up
+ * waiting for a long time to handle an interrupt when TX/RX is busy.
  */
 void
 ath_intr(void *arg)
@@ -1653,6 +1731,9 @@
 		__func__, ifp->if_flags);
 
 	ATH_LOCK(sc);
+	sc->sc_in_reset++;
+	/* XXX TODO: suspend taskqueue, flush taskqueue ops, suspend RX? */
+	ath_lock_rx_check(sc);
 	/*
 	 * Stop anything previously setup.  This is safe
 	 * whether this is the first time through or not.
@@ -1740,6 +1821,7 @@
 	callout_reset(&sc->sc_wd_ch, hz, ath_watchdog, sc);
 	ath_hal_intrset(ah, sc->sc_imask);
 
+	sc->sc_in_reset--;
 	ATH_UNLOCK(sc);
 
 #ifdef ATH_TX99_DIAG
@@ -1795,6 +1877,8 @@
 		ath_draintxq(sc, ATH_RESET_DEFAULT);
 		if (!sc->sc_invalid) {
 			ath_stoprecv(sc);
+			/* XXX this drops the lock temporarily */
+			ath_rx_proc(sc, 0);
 			ath_hal_phydisable(ah);
 		} else
 			sc->sc_rxlink = NULL;
@@ -1808,10 +1892,24 @@
 	struct ath_softc *sc = ifp->if_softc;
 
 	ATH_LOCK(sc);
+	ath_lock_rx_check(sc);
 	ath_stop_locked(ifp);
 	ATH_UNLOCK(sc);
 }
 
+int
+ath_reset(struct ifnet *ifp, ATH_RESET_TYPE reset_type)
+{
+	struct ath_softc *sc = ifp->if_softc;
+	int r;
+
+	ATH_LOCK(sc);
+	ath_lock_rx_check(sc);
+	r = ath_reset_locked(ifp, reset_type);
+	ATH_UNLOCK(sc);
+	return r;
+}
+
 /*
  * Reset the hardware w/o losing operational state.  This is
  * basically a more efficient way of doing ath_stop, ath_init,
@@ -1820,7 +1918,7 @@
  * to reset or reload hardware state.
  */
 int
-ath_reset(struct ifnet *ifp, ATH_RESET_TYPE reset_type)
+ath_reset_locked(struct ifnet *ifp, ATH_RESET_TYPE reset_type)
 {
 	struct ath_softc *sc = ifp->if_softc;
 	struct ieee80211com *ic = ifp->if_l2com;
@@ -1829,18 +1927,15 @@
 
 	DPRINTF(sc, ATH_DEBUG_RESET, "%s: called\n", __func__);
 
-	ATH_LOCK(sc);
+	ATH_LOCK_ASSERT(sc);
 	sc->sc_in_reset++;
-	ATH_UNLOCK(sc);
+	ath_lock_rx_check(sc);
 
 	ath_hal_intrset(ah, 0);		/* disable interrupts */
 	ath_draintxq(sc, reset_type);	/* stop xmit side */
-	/*
-	 * XXX Don't flush if ATH_RESET_NOLOSS;but we have to first
-	 * XXX need to ensure this doesn't race with an outstanding
-	 * XXX taskqueue call.
-	 */
 	ath_stoprecv(sc);		/* stop recv side */
+	/* XXX this drops the lock temporarily */
+	ath_rx_proc(sc, 0);
 	ath_settkipmic(sc);		/* configure TKIP MIC handling */
 	/* NB: indicate channel change so we do a full reset */
 	if (!ath_hal_reset(ah, sc->sc_opmode, ic->ic_curchan, AH_TRUE, &status))
@@ -1867,13 +1962,26 @@
 #endif
 			ath_beacon_config(sc, NULL);
 	}
+
+	/* Restart TX; schedule TX frames if needed */
+	if (reset_type == ATH_RESET_NOLOSS) {
+		int i;
+
+		for (i = 0; i < HAL_NUM_TX_QUEUES; i++) {
+			if (ATH_TXQ_SETUP(sc, i)) {
+				ATH_TXQ_LOCK(&sc->sc_txq[i]);
+				ath_txq_restart_dma(sc, &sc->sc_txq[i]);
+				ath_txq_sched(sc, &sc->sc_txq[i]);
+				ATH_TXQ_UNLOCK(&sc->sc_txq[i]);
+			}
+		}
+	}
+
 	ath_hal_intrset(ah, sc->sc_imask);
 
-	ATH_LOCK(sc);
 	sc->sc_in_reset--;
-	ATH_UNLOCK(sc);
 
-	ath_start(ifp);			/* restart xmit */
+	ath_start_locked(ifp);			/* restart xmit */
 	return 0;
 }
 
@@ -2013,14 +2121,45 @@
 ath_start(struct ifnet *ifp)
 {
 	struct ath_softc *sc = ifp->if_softc;
+
+	ATH_LOCK(sc);
+
+	/*
+	 * We may be called during an active reset from another thread;
+	 * at this point we just defer until the reset is complete.
+	 * The reset will call ath_start_locked for us.
+	 *
+	 * For now, don't defer if called during a non-reset ath_rx_proc;
+	 * this maintains the same behaviour that the current driver code
+	 * can do (ie, TX and RX paths can interleave.)
+	 */
+	if (sc->sc_in_reset) {
+		device_printf(sc->sc_dev,
+		    "%s: sc_in_reset=%d; skipping\n",
+		        __func__, sc->sc_in_reset);
+	} else {
+		ath_start_locked(ifp);
+	}
+	ATH_UNLOCK(sc);
+}
+
+static void
+ath_start_locked(struct ifnet *ifp)
+{
+	struct ath_softc *sc = ifp->if_softc;
 	struct ieee80211_node *ni;
 	struct ath_buf *bf;
 	struct mbuf *m, *next;
 	ath_bufhead frags;
 	int tx = 0;
 
+	ATH_LOCK_ASSERT(sc);
+
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || sc->sc_invalid)
 		return;
+
+	sc->sc_in_txsend++;
+
 	for (;;) {
 		/*
 		 * Grab a TX buffer and associated resources.
@@ -2105,6 +2244,7 @@
 
 		sc->sc_wd_timer = 5;
 	}
+	sc->sc_in_txsend--;
 }
 
 static int
@@ -3421,6 +3561,8 @@
 	struct mbuf *m;
 	struct ath_desc *ds;
 
+	ATH_LOCK_ASSERT(sc);
+
 	m = bf->bf_m;
 	if (m == NULL) {
 		/*
@@ -3714,10 +3856,41 @@
 ath_rx_tasklet(void *arg, int npending)
 {
 	struct ath_softc *sc = arg;
+	struct ifnet *ifp = sc->sc_ifp;
 
 	CTR1(ATH_KTR_INTR, "ath_rx_proc: pending=%d", npending);
 	DPRINTF(sc, ATH_DEBUG_RX_PROC, "%s: pending %u\n", __func__, npending);
+	ATH_LOCK(sc);
+
+	/*
+	 * If we've occured whilst we're busy in a reset,
+	 * log a warning and bail. This may occur whilst
+	 * ath_rx_proc() is flushing frames and releases
+	 * the comlock.
+	 */
+	if (sc->sc_in_reset) {
+		device_printf(sc->sc_dev, "%s: sc_in_reset=%d, bailing!\n",
+		    __func__, sc->sc_in_reset);
+		taskqueue_enqueue_fast(sc->sc_tq, &sc->sc_rxtask);
+		ATH_UNLOCK(sc);
+		return;
+	}
+	sc->sc_in_rxtasklet++;
 	ath_rx_proc(sc, 1);
+
+	/*
+	 * Age frames on the FF assembly queue; then
+	 * kick off another round of TX.
+	 */
+	if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) == 0) {
+#ifdef IEEE80211_SUPPORT_SUPERG
+		ieee80211_ff_age_all(ic, 100);
+#endif
+		if (!IFQ_IS_EMPTY(&ifp->if_snd))
+			ath_start_locked(ifp);
+	}
+	sc->sc_in_rxtasklet--;
+	ATH_UNLOCK(sc);
 }
 
 static void
@@ -3740,6 +3913,31 @@
 	u_int64_t tsf;
 	int npkts = 0;
 
+	/*
+	 * This is separate to sc_in_rxtasklet.
+	 *
+	 * It's possible that some other function will try
+	 * to complete some work once we let go of the ATH_LOCK -
+	 * eg a reset, channel change function. Since we don't want
+	 * this to occur, we could check sc_in_rxtasklet - but then
+	 * what if the reset/chanchange is what has called ath_rx_proc?
+	 */
+	ATH_LOCK_ASSERT(sc);
+	if (sc->sc_in_reset > 1 || sc->sc_in_rxproc >= 1) {
+		/*
+		 * If this is greater than 1, we're being recursively
+		 * called OR being scheduled on >1 CPU.
+		 * Even though there's locking involved here, I haven't
+		 * checked to ensure that two rxproc's running on the same
+		 * list, complete with locking, will correctly handle
+		 * things.
+		 */
+		device_printf(sc->sc_dev,
+		    "%s: danger! sc_in_reset=%d, sc_in_rxproc=%d\n",
+		    __func__, sc->sc_in_reset, sc->sc_in_rxproc);
+	}
+	sc->sc_in_rxproc++;
+
 	DPRINTF(sc, ATH_DEBUG_RX_PROC, "%s: called\n", __func__);
 	ngood = 0;
 	nf = ath_hal_getchannoise(ah, sc->sc_curchan);
@@ -4022,6 +4220,19 @@
 		m_adj(m, -IEEE80211_CRC_LEN);
 
 		/*
+		 * Drop the ATH_LOCK here.
+		 *
+		 * We can't hold the lock across the net80211 calls or
+		 * bad stuff occurs.
+		 *
+		 * Instead, we drop the lock, do the net80211 RX, then
+		 * we reacquire the lock. Later on we should check
+		 * whether the interface state has changed, requiring
+		 * us to break out early.
+		 */
+		ATH_UNLOCK(sc);
+
+		/*
 		 * Locate the node for sender, track state, and then
 		 * pass the (referenced) node up to the 802.11 layer
 		 * for its use.
@@ -4068,6 +4279,9 @@
 		} else {
 			type = ieee80211_input_all(ic, m, rs->rs_rssi, nf);
 		}
+
+		ATH_LOCK(sc);
+
 		/*
 		 * Track rx rssi and do any rx antenna management.
 		 */
@@ -4136,19 +4350,13 @@
 		ath_mode_init(sc);		/* set filters, etc. */
 		ath_hal_startpcurecv(ah);	/* re-enable PCU/DMA engine */
 
-		ATH_LOCK(sc);
 		ath_hal_intrset(ah, sc->sc_imask);
 		sc->sc_kickpcu = 0;
-		ATH_UNLOCK(sc);
 	}
 
-	if (resched && (ifp->if_drv_flags & IFF_DRV_OACTIVE) == 0) {
-#ifdef IEEE80211_SUPPORT_SUPERG
-		ieee80211_ff_age_all(ic, 100);
-#endif
-		if (!IFQ_IS_EMPTY(&ifp->if_snd))
-			ath_start(ifp);
-	}
+	/* Finished - decrement this */
+	sc->sc_in_rxproc--;
+
 #undef PA2DESC
 }
 
@@ -4502,6 +4710,8 @@
 	int nacked;
 	HAL_STATUS status;
 
+	ATH_LOCK_ASSERT(sc);
+
 	DPRINTF(sc, ATH_DEBUG_TX_PROC, "%s: tx queue %u head %p link %p\n",
 		__func__, txq->axq_qnum,
 		(caddr_t)(uintptr_t) ath_hal_gettxbuf(sc->sc_ah, txq->axq_qnum),
@@ -4641,9 +4851,9 @@
 	uint32_t txqs;
 
 	ATH_LOCK(sc);
+	sc->sc_in_txproc++;
 	txqs = sc->sc_txq_active;
 	sc->sc_txq_active &= ~txqs;
-	ATH_UNLOCK(sc);
 
 	if (TXQACTIVE(txqs, 0) && ath_tx_processq(sc, &sc->sc_txq[0], 1))
 		/* XXX why is lastrx updated in tx code? */
@@ -4656,7 +4866,9 @@
 	if (sc->sc_softled)
 		ath_led_event(sc, sc->sc_txrix);
 
-	ath_start(ifp);
+	ath_start_locked(ifp);
+	sc->sc_in_txproc--;
+	ATH_UNLOCK(sc);
 }
 
 /*
@@ -4672,9 +4884,9 @@
 	uint32_t txqs;
 
 	ATH_LOCK(sc);
+	sc->sc_in_txproc++;
 	txqs = sc->sc_txq_active;
 	sc->sc_txq_active &= ~txqs;
-	ATH_UNLOCK(sc);
 
 	/*
 	 * Process each active queue.
@@ -4699,7 +4911,9 @@
 	if (sc->sc_softled)
 		ath_led_event(sc, sc->sc_txrix);
 
-	ath_start(ifp);
+	ath_start_locked(ifp);
+	sc->sc_in_txproc--;
+	ATH_UNLOCK(sc);
 }
 
 /*
@@ -4714,9 +4928,9 @@
 	uint32_t txqs;
 
 	ATH_LOCK(sc);
+	sc->sc_in_txproc++;
 	txqs = sc->sc_txq_active;
 	sc->sc_txq_active &= ~txqs;
-	ATH_UNLOCK(sc);
 
 	/*
 	 * Process each active queue.
@@ -4734,7 +4948,9 @@
 	if (sc->sc_softled)
 		ath_led_event(sc, sc->sc_txrix);
 
-	ath_start(ifp);
+	ath_start_locked(ifp);
+	sc->sc_in_txproc--;
+	ATH_UNLOCK(sc);
 }
 #undef	TXQACTIVE
 
@@ -4903,11 +5119,18 @@
 	struct ifnet *ifp = sc->sc_ifp;
 	int i;
 
+	ATH_LOCK_ASSERT(sc);
+
 	(void) ath_stoptxdma(sc);
 
-	for (i = 0; i < HAL_NUM_TX_QUEUES; i++)
-		if (ATH_TXQ_SETUP(sc, i))
-			ath_tx_draintxq(sc, &sc->sc_txq[i]);
+	for (i = 0; i < HAL_NUM_TX_QUEUES; i++) {
+		if (ATH_TXQ_SETUP(sc, i)) {
+			if (reset_type == ATH_RESET_NOLOSS)
+				ath_tx_processq(sc, &sc->sc_txq[i], 0);
+			else
+				ath_tx_draintxq(sc, &sc->sc_txq[i]);
+		}
+	}
 #ifdef ATH_DEBUG
 	if (sc->sc_debug & ATH_DEBUG_RESET) {
 		struct ath_buf *bf = TAILQ_FIRST(&sc->sc_bbuf);
@@ -4921,8 +5144,10 @@
 		}
 	}
 #endif /* ATH_DEBUG */
-	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
-	sc->sc_wd_timer = 0;
+	if (reset_type != ATH_RESET_NOLOSS) {
+		ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
+		sc->sc_wd_timer = 0;
+	}
 }
 
 /*
@@ -4936,6 +5161,10 @@
 		((_pa) - (_sc)->sc_rxdma.dd_desc_paddr)))
 	struct ath_hal *ah = sc->sc_ah;
 
+	DPRINTF(sc, ATH_DEBUG_ANY, "%s: called\n", __func__);
+
+	ATH_LOCK_ASSERT(sc);
+
 	ath_hal_stoppcurecv(ah);	/* disable PCU */
 	ath_hal_setrxfilter(ah, 0);	/* clear recv filter */
 	ath_hal_stopdmarecv(ah);	/* disable DMA engine */
@@ -4976,6 +5205,8 @@
 	struct ath_hal *ah = sc->sc_ah;
 	struct ath_buf *bf;
 
+	ATH_LOCK_ASSERT(sc);
+
 	sc->sc_rxlink = NULL;
 	sc->sc_rxpending = NULL;
 	TAILQ_FOREACH(bf, &sc->sc_rxbuf, bf_list) {
@@ -5027,6 +5258,8 @@
 	struct ieee80211com *ic = ifp->if_l2com;
 	struct ath_hal *ah = sc->sc_ah;
 
+	ATH_LOCK_ASSERT(sc);
+
 	DPRINTF(sc, ATH_DEBUG_RESET, "%s: %u (%u MHz, flags 0x%x)\n",
 	    __func__, ieee80211_chan2ieee(ic, chan),
 	    chan->ic_freq, chan->ic_flags);
@@ -5041,6 +5274,8 @@
 		ath_hal_intrset(ah, 0);		/* disable interrupts */
 		ath_draintxq(sc, ATH_RESET_FULL);	/* clear pending tx frames */
 		ath_stoprecv(sc);		/* turn off frame recv */
+		/* XXX this drops the lock temporarily */
+		ath_rx_proc(sc, 0);
 		if (!ath_hal_reset(ah, sc->sc_opmode, chan, AH_TRUE, &status)) {
 			if_printf(ifp, "%s: unable to reset "
 			    "channel %u (%u MHz, flags 0x%x), hal status %u\n",
@@ -5129,7 +5364,7 @@
 			DPRINTF(sc, ATH_DEBUG_CALIBRATE,
 				"%s: rfgain change\n", __func__);
 			sc->sc_stats.ast_per_rfgain++;
-			ath_reset(ifp, ATH_RESET_NOLOSS);
+			ath_reset_locked(ifp, ATH_RESET_NOLOSS);
 		}
 		/*
 		 * If this long cal is after an idle period, then
@@ -5247,11 +5482,7 @@
 	struct ifnet *ifp = ic->ic_ifp;
 	struct ath_softc *sc = ifp->if_softc;
 
-	/* This isn't strictly a reset, but we still have to drain */
 	ATH_LOCK(sc);
-	sc->sc_in_reset++;
-	ATH_UNLOCK(sc);
-
 	(void) ath_chan_set(sc, ic->ic_curchan);
 	/*
 	 * If we are returning to our bss channel then mark state
@@ -5261,9 +5492,6 @@
 	 */
 	if (!sc->sc_scanning && ic->ic_curchan == ic->ic_bsschan)
 		sc->sc_syncbeacon = 1;
-
-	ATH_LOCK(sc);
-	sc->sc_in_reset--;
 	ATH_UNLOCK(sc);
 }
 
@@ -5804,7 +6032,7 @@
 			    hangs & 0xff ? "bb" : "mac", hangs);
 		} else
 			if_printf(ifp, "device timeout\n");
-		ath_reset(ifp, ATH_RESET_NOLOSS);
+		ath_reset_locked(ifp, ATH_RESET_NOLOSS);
 		ifp->if_oerrors++;
 		sc->sc_stats.ast_watchdog++;
 	}

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-Vmo=9KkQbiCCBQm5Xm_9dNZtY6jQ1hzO0SWSZ4z323jF=Hw>