From owner-freebsd-wireless@FreeBSD.ORG Mon Jul 23 19:24:36 2012 Return-Path: Delivered-To: wireless@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83898106566C; Mon, 23 Jul 2012 19:24:36 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id BAA288FC0A; Mon, 23 Jul 2012 19:24:35 +0000 (UTC) Received: by laai10 with SMTP id i10so283274laa.13 for ; Mon, 23 Jul 2012 12:24:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=tFnALR2Fpr290zumRhHtJNbaQ3PSeHfHc7m3oIVrfn8=; b=LEZviwr6ucRX3Gvp0JadZXxbHMJqdm2W319o/ypIZAnyoMknez1JrFE7dBAGH4Sp7X Uf0JFChwKk/NkzuNwVi6xm/bjktGvflX+qgSDDi60olndg+0kJxQ0rCEJu8MWyCaU+5S vC1FIxwhfBLxNksNS/JqHbBINtzVffMRRY8Fkvb7Vv3p9Hr3kfn0/Xua2d0xOPu7YAqb 4chVxTcyTVOif8PMnDVTCCcRBJOAC/kbhH7IngsH3L/T5wgX5bjkgMdLgXH3+fTvxHQc iA/EObjO8OQxa1Tx+EDgPhRLwDhumTwYIbkMoNxtDhDTY5Y3plJS8w8tFiNFhTr8GoCK 914w== MIME-Version: 1.0 Received: by 10.152.105.51 with SMTP id gj19mr18167722lab.38.1343071474730; Mon, 23 Jul 2012 12:24:34 -0700 (PDT) Received: by 10.112.20.197 with HTTP; Mon, 23 Jul 2012 12:24:34 -0700 (PDT) In-Reply-To: <1468023399.20120723175744@serebryakov.spb.ru> References: <298155894.20120709002844@serebryakov.spb.ru> <1491344515.20120709013411@serebryakov.spb.ru> <1252921508.20120709132351@serebryakov.spb.ru> <802688919.20120709140808@serebryakov.spb.ru> <1966637204.20120713194935@serebryakov.spb.ru> <5862675.20120713203432@serebryakov.spb.ru> <7750188.20120716011053@serebryakov.spb.ru> <1214412854.20120722020752@serebryakov.spb.ru> <1468023399.20120723175744@serebryakov.spb.ru> Date: Mon, 23 Jul 2012 12:24:34 -0700 Message-ID: From: Adrian Chadd To: lev@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: wireless@freebsd.org Subject: Re: ath0: device timeout with 802.11n client X-BeenThere: freebsd-wireless@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of 802.11 stack, tools device driver development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Jul 2012 19:24:36 -0000 .. and is that "bar_unsuspend" the first line to appear in your debugging output? Time to braindump what's going on in this aggregation/BAR path: * the TID pause/unpause only occurs for a few things - negotiating aggregation, tearing down aggregation, sending BAR; * and it's refcounted, because it's entirely possible multiple things will occur (eg sending BAR overlapping with tearing down aggregation); * So I need to try and establish whether this is something specific to say, just BAR handling, or whether it's overlapping with a call to ath_addba_stop() which will call ath_tx_tid_cleanup() to "clean" up the aggregation frames in preparation for dropping back to non-aggregation. Now - ath_addba_stop() is called when: * the station re-associates for whatever reason, as the state needs to be torn down and re-established; * the station times out and the state is being freed. So maybe some overlapping traffic and BAR TX is occuring, whilst the station re-associates. Finally, the way the cleanup path works is a bit hokey: * the caller pauses the TID; * it then calls ath_tx_tid_cleanup(); * ath_tx_tid_cleanup() then walks the list of frames that it thinks are in the hardware queue (don't ask why it does this - I stole it from Linux/Atheros code, before I just added a per-TID hardware queue counter) and it marks the TID as "cleanup in progress". * The TID is then only unpaused once all the hardware-queued frames have completed (ie, when incomp is decremented and equal to 0.) * .. that occurs in either ath_tx_comp_cleanup_aggr() and ath_tx_comp_cleanup_unaggr(). Now, incomp is 0, so it's unlikely due to something being stuck in the TX queue or the math calculating the queue depth being wrong. The interesting warning there is the debugging from ath_tx_tid_bar_unsuspend() - that shouldn't be called with both of those flags equal to 0. There's only a few places it's called from. There aren't many places where it could even _be_ set to 0. However, there's no locking in the net80211 stack for handling BAR, so I have to compensate by doing locking and workarounds in the TX / BAR path in the driver. So it's entirely possible I've missed something there. I think I recall one instance where aggregation was being attempted concurrently, multiple times, but only calling the completion routine once. Eek. Anyway, I hope that braindump helps you or someone else in potentially going for a dive through the code. Oh, one last thing - I actually have a local kernel modification to device_printf() in sys/kern/subr_bus.c (I think, just grep for it) which prints out the current thread ID before any device output. It makes concurrent/overlapping/racy behaviour _really_ obvious, as they come from separate thread IDs. You may wish to hack that in to your kernel and see if any of the debug statements are coming from different threads. Oh, and another last thing - you could also turn on association debugging (wlandebug -i wlanX +assoc) to capture whether the station is reassociating constantly. That'll help me out too. Thanks! Adrian On 23 July 2012 06:57, Lev Serebryakov wrote: > Hello, Adrian. > You wrote 22 =D0=B8=D1=8E=D0=BB=D1=8F 2012 =D0=B3., 2:10:42: > >>> AC> Hm. With debuggimg it works, but turning it off causes timeouts? >>> I'll turn it off tomorrow and will see > AC> Thanks. If that's the case, it's quite likely a timing / race > AC> condition bug and even your debugging logs as they are will be very > AC> helpful. > Again (without debug): > > ath0: ath_tx_tid_bar_unsuspend: bar_tx=3D0, bar_wait=3D0: ? > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: device timeout > ath0: ath_tx_tid_drain: node 0xc404b000: bf=3D0xc34fda60: addbaw=3D0, dob= aw=3D0, seqno=3D3113, retry=3D0 > ath0: ath_tx_tid_drain: node 0xc404b000: bf=3D0xc34fda60: tid txq_depth= =3D503 hwq_depth=3D0, bar_wait=3D0 > ath0: ath_tx_tid_drain: node 0xc404b000: tid 0: txq_depth=3D1, txq_aggr_d= epth=3D0, sched=3D0, paused=3D-1, hwq_depth=3D0, incomp=3D0, baw_head=3D111= , baw_tail=3D111 txa_start=3D3113, ni_txseqs=3D5107 > FRDS 00:0c:42:64:69:92->70:d4:f2:ad:6a:26(00:0c:42:64:69:92) data QoS [TI= D 0] WEP [IV 29 0c 00 00 00 00 KID 0] 0M > 8862 0000 70d4 f2ad 6a26 000c 4264 6992 000c 4264 6992 90c2 0000 0101 29= 0c 0020 0000 0000 aaaa 0300 0000 0800 > ath0: stuck beacon; resetting (bmiss count 4) > ath0: stuck beacon; resetting (bmiss count 4) > ath0: stuck beacon; resetting (bmiss count 4) > ath0: stuck beacon; resetting (bmiss count 4) > ath0: stuck beacon; resetting (bmiss count 4) > > > -- > // Black Lion AKA Lev Serebryakov >