From owner-svn-src-user@FreeBSD.ORG Wed Aug 17 23:59:55 2011 Return-Path: Delivered-To: svn-src-user@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 989311065670; Wed, 17 Aug 2011 23:59:55 +0000 (UTC) (envelope-from adrian@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 883208FC08; Wed, 17 Aug 2011 23:59:55 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id p7HNxtvw028188; Wed, 17 Aug 2011 23:59:55 GMT (envelope-from adrian@svn.freebsd.org) Received: (from adrian@localhost) by svn.freebsd.org (8.14.4/8.14.4/Submit) id p7HNxt7K028186; Wed, 17 Aug 2011 23:59:55 GMT (envelope-from adrian@svn.freebsd.org) Message-Id: <201108172359.p7HNxt7K028186@svn.freebsd.org> From: Adrian Chadd Date: Wed, 17 Aug 2011 23:59:55 +0000 (UTC) To: src-committers@freebsd.org, svn-src-user@freebsd.org X-SVN-Group: user MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r224952 - user/adrian/if_ath_tx/sys/dev/ath X-BeenThere: svn-src-user@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the experimental " user" src tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 23:59:55 -0000 Author: adrian Date: Wed Aug 17 23:59:55 2011 New Revision: 224952 URL: http://svn.freebsd.org/changeset/base/224952 Log: Close up any BAW races as much as possible; document the current problem and the workaround. In the first cut of this code, packet scheduling (ie, stuffing into a software or hardware TXQ) occured in the same context as the serialised TX from the net80211/netif code. When addba was set, the TID could be paused, and anything queued into the software queue would sit there and wait. Once the TID was unpaused, all those queued packets with sequence numbers would fall into an aggregate TID, and their sequence numbers would happily fall into the BAW. Now, packet scheduling occurs in the task context rather than the TX context. So when the TID is paused, any currently running packet scheduler/queue function in the ath task may be running concurrently. So it's possible that some packets would be dumped to the hardware before the TID pause value was checked. This commit (mostly) fixes the races in checking the TID paused flag. It doesn't check for races when forming aggregates; I'll worry about that later when I'm actively making aggregate frames. But as the TID isn't being locked for the entire duration of the packet scheduling and hardware queue, it's quite possible one will leak by between when TID->paused is set, and the next time it's checked. I'm also "sliding" the left edge of the BAW along to match whatever sequence numbers net80211 assigns between addba being setup and addba being actually enabled. The (likely) correct way to handle this is to queue frames during addba setup for this TID as aggregates without creating sequence numbers for them. If the TID pause is handled correctly, this should occur. This needs to be revisited and fixed before the code is merged into -HEAD. Modified: user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c Modified: user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c ============================================================================== --- user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c Wed Aug 17 20:55:56 2011 (r224951) +++ user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c Wed Aug 17 23:59:55 2011 (r224952) @@ -1104,7 +1104,7 @@ ath_tx_start(struct ath_softc *sc, struc /* Don't do it whilst pending; the net80211 layer still assigns them */ if (is_ampdu_tx) { /* - * Always set a seqno; this function will + * Always call; this function will * handle making sure that null data frames * don't get a sequence number from the current * TID and thus mess with the BAW. @@ -1676,6 +1676,9 @@ ath_tx_tid_sched(struct ath_softc *sc, s ATH_TXQ_LOCK_ASSERT(txq); + if (atid->paused) + return; /* paused, can't schedule yet */ + if (atid->sched) return; /* already scheduled */ @@ -2564,11 +2567,24 @@ ath_tx_tid_hw_queue_aggr(struct ath_soft for (;;) { ATH_TXQ_LOCK(atid); + + /* + * If the upper layer has paused the TID, don't + * queue any further packets. + * + * This can also occur from the completion task because + * of packet loss; but as its serialised with this code, + * it won't "appear" half way through queuing packets. + */ + if (atid->paused) + break; + bf = STAILQ_FIRST(&atid->axq_q); if (bf == NULL) { ATH_TXQ_UNLOCK(atid); break; } + DPRINTF(sc, ATH_DEBUG_SW_TX, "%s: bf=%p: tid=%d\n", __func__, bf, bf->bf_state.bfs_tid); if (bf->bf_state.bfs_tid != tid) @@ -2656,11 +2672,20 @@ ath_tx_tid_hw_queue_norm(struct ath_soft for (;;) { ATH_TXQ_LOCK(atid); + + /* + * If the upper layers have paused the TID, don't + * queue any further packets. + */ + if (atid->paused) + break; + bf = STAILQ_FIRST(&atid->axq_q); if (bf == NULL) { ATH_TXQ_UNLOCK(atid); break; } + ATH_TXQ_REMOVE_HEAD(atid, bf_list); ATH_TXQ_UNLOCK(atid); @@ -2817,9 +2842,32 @@ ath_addba_request(struct ieee80211_node struct ath_node *an = ATH_NODE(ni); struct ath_tid *atid = &an->an_tid[tid]; - DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, "%s: called\n", __func__); + /* + * XXX This isn't enough. + * + * The taskqueue may be running and scheduling some more packets. + * It acquires the TID lock to serialise access to the TID paused + * flag but as the rest of the code doesn't hold the TID lock + * for the duration of any activity (outside of adding/removing + * items from the software queue), it can't possibly guarantee + * consistency. + * + * This pauses future scheduling, but it doesn't interrupt the + * current scheduling, nor does it wait for that scheduling to + * finish. So the txseq window has moved, and those frames + * in the meantime have "normal" completion handlers. + * + * The addba teardown pause/resume likely has the same problem. + */ ath_tx_tid_pause(sc, atid); + DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, + "%s: called; dialogtoken=%d, baparamset=%d, batimeout=%d\n", + __func__, dialogtoken, baparamset, batimeout); + DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, + "%s: txa_start=%d, ni_txseqs=%d\n", + __func__, tap->txa_start, ni->ni_txseqs[tid]); + return sc->sc_addba_request(ni, tap, dialogtoken, baparamset, batimeout); } @@ -2831,6 +2879,18 @@ ath_addba_request(struct ieee80211_node * * Any packets TX'ed from this point should be "aggregate" (whether * aggregate or not) so the BAW is updated. + * + * Note! net80211 keeps self-assigning sequence numbers until + * ampdu is negotiated. This means the initially-negotiated BAW left + * edge won't match the ni->ni_txseq. + * + * So, being very dirty, the BAW left edge is "slid" here to match + * ni->ni_txseq. + * + * What likely SHOULD happen is that all packets subsequent to the + * addba request should be tagged as aggregate and queued as non-aggregate + * frames; thus updating the BAW. For now though, I'll just slide the + * window. */ int ath_addba_response(struct ieee80211_node *ni, struct ieee80211_tx_ampdu *tap, @@ -2842,6 +2902,14 @@ ath_addba_response(struct ieee80211_node struct ath_tid *atid = &an->an_tid[tid]; int r; + DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, + "%s: called; status=%d, code=%d, batimeout=%d\n", __func__, + status, code, batimeout); + + DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, + "%s: txa_start=%d, ni_txseqs=%d\n", + __func__, tap->txa_start, ni->ni_txseqs[tid]); + /* * Call this first, so the interface flags get updated * before the TID is unpaused. Otherwise a race condition @@ -2850,7 +2918,13 @@ ath_addba_response(struct ieee80211_node */ r = sc->sc_addba_response(ni, tap, status, code, batimeout); - DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, "%s: called\n", __func__); + /* + * XXX dirty! + * Slide the BAW left edge to wherever net80211 left it for us. + * Read above for more information. + */ + tap->txa_start = ni->ni_txseqs[tid]; + ath_tx_tid_resume(sc, atid); return r; }