Date:      Thu, 8 Mar 2012 23:32:58 GMT
From:      Adrian Chadd <adrian@FreeBSD.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/165866: [ath] TX hangs, requiring a "scan" to properly reset the interface
Message-ID:  <201203082332.q28NWw6E050059@red.freebsd.org>
Resent-Message-ID: <201203082340.q28NeAnM039894@freefall.freebsd.org>


>Number:         165866
>Category:       kern
>Synopsis:       [ath] TX hangs, requiring a "scan" to properly reset the interface
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 08 23:40:10 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Adrian Chadd
>Release:        FreeBSD-HEAD
>Organization:
>Environment:
FreeBSD home-11bg-ap 10.0-CURRENT FreeBSD 10.0-CURRENT #18 r232400:232625M: Wed Dec 31 16:00:00 PST 1969     adrian@dummy:/home/adrian/work/freebsd/svn/obj/mipseb/mips.mipseb/usr/home/adrian/work/freebsd/svn/src/sys/TP-WN1043ND  mips

>Description:
I've been seeing TX hangs during my tests.

Investigating showed that the TX queue would grow and busy buffers would stay busy.

E.g., the output triggered by sysctl dev.ath.0.txagg=1:


HW TXQ 0: axq_depth=0, axq_aggr_depth=0
HW TXQ 1: axq_depth=184, axq_aggr_depth=0
HW TXQ 2: axq_depth=0, axq_aggr_depth=0
HW TXQ 3: axq_depth=0, axq_aggr_depth=0
HW TXQ 8: axq_depth=1, axq_aggr_depth=0
Busy: 14
Total TX buffers: 15; Total TX buffers busy: 1

This occurred even with a completely idle access point that only responded to probe requests, i.e., one with no active associations.

The only way to flush things was a 'scan', which forcibly flushes the TX queue; pending frames are either handled or deleted.

I then flipped on reset debugging (sysctl dev.ath.0.debug=0x20) and forced a scan whenever I saw this occur.

I also dumped the relevant registers when this occurred. I found that the TXDP for this queue was completely in the wrong place.

I also found that the TX descriptor list made no sense: there were incomplete and complete descriptor lists in the same TX queue, as well as NULL link pointers halfway through the list.

So, I figured something is splicing the list together incorrectly.
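
The kind of check that exposes a NULL link partway through a chain can be sketched as follows. This is only an illustration: struct demo_desc and demo_find_break() are hypothetical names, not part of ath(4), and the real descriptors hold DMA addresses rather than C pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TX descriptor; NOT the real ath_desc layout. */
struct demo_desc {
	struct demo_desc *ds_link;	/* next descriptor, NULL ends the chain */
	int ds_done;			/* non-zero once "hardware" completed it */
};

/*
 * Walk a chain that is expected to hold `expected` descriptors.
 * Returns the index of the first descriptor whose link is NULL before
 * the expected tail was reached, or -1 if the chain is intact.
 */
static int
demo_find_break(const struct demo_desc *head, int expected)
{
	int i;

	assert(head != NULL && expected > 0);
	for (i = 0; i < expected - 1; i++) {
		if (head->ds_link == NULL)
			return (i);	/* chain ends early at descriptor i */
		head = head->ds_link;
	}
	return (-1);
}
```

For a four-descriptor chain d[0] -> d[1] -> d[2] -> d[3], the function returns -1 while the chain is intact and 1 once d[1].ds_link has been cleared.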

>How-To-Repeat:
This kernel was compiled with TDMA support, so the ATH_BUF_BUSY flag would be set.

* set it up on a 2.4GHz channel;
* make sure there are lots of STAs and APs around;
* notice the high level of probe request traffic;
* .. wait.

>Fix:
This particular patch seems to quieten down the issues. I'm going to run this a bit more and see what happens.


Index: if_ath_tx.c
===================================================================
--- if_ath_tx.c (revision 232400)
+++ if_ath_tx.c (working copy)
@@ -623,19 +623,22 @@
 ath_txq_restart_dma(struct ath_softc *sc, struct ath_txq *txq)
 {
        struct ath_hal *ah = sc->sc_ah;
-       struct ath_buf *bf;
+       struct ath_buf *bf, *bf_last;
 
        ATH_TXQ_LOCK_ASSERT(txq);
 
        /* This is always going to be cleared, empty or not */
        txq->axq_flags &= ~ATH_TXQ_PUTPENDING;
 
+       /* XXX make this ATH_TXQ_FIRST */
        bf = TAILQ_FIRST(&txq->axq_q);
+       bf_last = ATH_TXQ_LAST(txq, axq_q_s);
+
        if (bf == NULL)
                return;
 
        ath_hal_puttxbuf(ah, txq->axq_qnum, bf->bf_daddr);
-       txq->axq_link = &bf->bf_lastds->ds_link;
+       txq->axq_link = &bf_last->bf_lastds->ds_link;
        ath_hal_txstart(ah, txq->axq_qnum);
 }
 

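A simplified model of why the choice of link pointer matters is sketched below. It is one reading of the patch, not the driver's code: the demo_* structures and functions are hypothetical, each buffer carries a single descriptor, and the TAILQ macros come from <sys/queue.h> as shipped on the BSDs and glibc. The assumption is that later appends write the new buffer's address through axq_link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins; NOT the real ath(4) structures. */
struct demo_desc {
	uint32_t ds_link;		/* address of next descriptor, 0 = end */
};

struct demo_buf {
	TAILQ_ENTRY(demo_buf) bf_list;
	struct demo_desc bf_desc;	/* one descriptor per buffer, for brevity */
	struct demo_desc *bf_lastds;	/* last descriptor of this buffer */
	uint32_t bf_daddr;		/* pretend DMA address of bf_desc */
};

TAILQ_HEAD(demo_bufq, demo_buf);

struct demo_txq {
	struct demo_bufq axq_q;
	uint32_t *axq_link;		/* where the next buffer gets linked in */
};

/* Append a buffer: patch the link that axq_link points at, then advance it. */
static void
demo_txq_append(struct demo_txq *txq, struct demo_buf *bf)
{
	bf->bf_lastds = &bf->bf_desc;
	bf->bf_desc.ds_link = 0;
	TAILQ_INSERT_TAIL(&txq->axq_q, bf, bf_list);
	if (txq->axq_link != NULL)
		*txq->axq_link = bf->bf_daddr;
	txq->axq_link = &bf->bf_lastds->ds_link;
}

/*
 * Re-point axq_link after a restart. use_last == 0 models the old code
 * (first buffer); use_last != 0 models the patched code (last buffer).
 */
static void
demo_txq_restart(struct demo_txq *txq, int use_last)
{
	struct demo_buf *bf;

	bf = use_last ? TAILQ_LAST(&txq->axq_q, demo_bufq) :
	    TAILQ_FIRST(&txq->axq_q);
	if (bf == NULL)
		return;
	txq->axq_link = &bf->bf_lastds->ds_link;
}
```

In this model, queueing three buffers, restarting with use_last == 0 and then appending a fourth overwrites the first buffer's link, so the second and third buffers drop out of the hardware chain. With use_last != 0 the fourth buffer is linked after the third and the chain stays whole.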

>Release-Note:
>Audit-Trail:
>Unformatted:


