Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 12 Mar 2012 07:32:55 GMT
From:      Adrian Chadd <adrian@FreeBSD.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/165966: [ath] ath0: device timeout on SMP machines due to race conditions
Message-ID:  <201203120732.q2C7WtFj008096@red.freebsd.org>
Resent-Message-ID: <201203120740.q2C7eASQ072246@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         165966
>Category:       kern
>Synopsis:       [ath] ath0: device timeout on SMP machines due to race conditions
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 12 07:40:09 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Adrian Chadd
>Release:        9.0-RELEASE w/ -HEAD net80211/ath
>Organization:
>Environment:
FreeBSD kimberley 9.0-RELEASE FreeBSD 9.0-RELEASE #1: Sun Mar 11 23:12:54 PDT 2012     root@kimberley:/usr/src/sys/i386/compile/MARILYN  i386

>Description:
I've traced down a device timeout to the following race condition.

* Something queues a frame via ath_start();
* The TX is queued to the hardware;
* An interrupt occurs immediately;
* The taskqueue is scheduled;
* The completion occurs via ath_tx_processq();
* sc_wd_timer is set to 0;
* .. then, back to ath_start() we go and sc_wd_timer is set to 5.

I'm not (yet) sure whether it's occuring on both CPUs or whether it's occuring on a single CPU with preemption going on.

Here's a snippet of the debug log (with sysctl dev.ath.0.debug=0x3023) and a couple of statements which print something whenever sc_wd_timer is set to something non-zero:


[100080] ath0: ath_tx_dmasetup: m 0xc5d3a900 len 128
TODS 00:03:7f:07:44:78->f4:ec:38:9c:47:24(00:1b:b1:58:f6:f0) data QoS [TID 0] WEP [IV 05 00 00 00 00 00 KID 0] 6M
 8841 3c00 001b b158 f6f0 0003 7f07 4478 f4ec 389c 4724 4000 0000 4724 0500 0020 0000 0000 aaaa 0300 0000 0800 4500 0054 001a 0000 4001 ad94 0a0b 0147 c0a8 0101 0800 8bea 8908 0000 4f5d 95ad 0007 12f8 0809 0a0b 0c0d 0e0f 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 3435 3637
[100080] ath0: ath_tx_chaindesclist: 0: 00000000 0253a95c 613f008a 00008080 24348000 0004b50c
[100080] ath0: ath_tx_handoff_hw: TXDP[1] = 0x1f2a2b00 (0xd8a58b00) depth 1
[100022] ath0: ath_intr: status 0xc0
[100124] ath0: ath_tx_processq: tx queue 1 head 0x1f2a2b00 link 0xd8a58b00
Q1[  0] (DS.V:0xd8a58b00 DS.P:0x1f2a2b00) L:00000000 D:0253a95c F:0011 *
[100022] ath0: ath_intr: status 0x9
        TXF: 0011 Seq: 64 swtry: 0 ADDBAW?: 0 DOBAW?: 0
        613f008a 00008080 24348000 0004b50c c32a0001 0005e081
  [end]
[100022] ath0: ath_intr: status 0x9

. then 5 seconds later:


[100022] ath0: ath_intr: status 0x9
ath0: device timeout
[100124] ath0: ath_reset: called

The number at the beginning is the thread id, (via patching device_printf()).
>How-To-Repeat:

>Fix:
I'm honestly not sure at this point. :-)

>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201203120732.q2C7WtFj008096>