From: Adrian Chadd
To: bug-followup@freebsd.org, Vincent Hoffman
Cc: freebsd-wireless@freebsd.org
Date: Sun, 18 Mar 2012 17:10:04 -0700
Subject: Re: kern/166190: [ath] TX hangs and frames stuck in TX queue
In-Reply-To: <201203170440.q2H4esnb099802@freefall.freebsd.org>
References: <201203170440.q2H4esnb099802@freefall.freebsd.org>

I think I understand what's going on here.

It turns out that multiple instances of the TX code (via if_start())
were running at the same time. These were processing frames from the
input queue and assigning them sequence numbers.

This seems to be occurring:

* thread A would allocate sequence number 5
* thread B would concurrently allocate sequence number 6
* thread B would then "win" the race to add its frame to the BAW, as
  the sequence numbers were allocated early but the frames wouldn't be
  added to the queue until much later
* then thread A would try adding its frame to the BAW, but since the
  BAW left edge is now 6, 5 is now "out of window".

I have a local patch here which I'm going to test tonight/tomorrow. It
delays the sequence number allocation until _right before_ the frame
may be added to the BAW. Allocation and the BAW add are done inside the
same lock, so there's no chance that it'll race with another concurrent
thread.

I won't commit it until I have committed some verification code to
-HEAD to complain loudly when something tries to queue a frame from
_before_ the BAW. Since that shouldn't happen in reality, I'm going to
guess that it'll pop up in my testing and Vincent's use.

Once I've verified that (a) my sanity checking code is firing as I
expect it to, (b) Vincent also sees the same, and (c) this is fixed by
my patch, I'll look at committing it.

Vincent - thanks so very much for persisting with this bug! I'd not
have really found it at all if you hadn't pointed the odd behaviour out
to me.

Now - yes, the solution would also be "serialise the whole TX queue
damnit." Yes, that'd solve it, but as I'm seeing 802.11ac around the
corner, I'd like to actually debug, diagnose and document how a
multi-threaded TX/RX path could work. Serialising the driver TX path
isn't going to help me do that. :-)


Adrian
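
For anyone following along, the race described in this message and the
"allocate late, under the same lock" fix can be sketched in a few lines
of C. This is a toy model only, not the ath(4) driver code and not the
patch itself. Everything in it is hypothetical: struct tid_state,
seq_alloc_early(), baw_add_early(), tx_queue_late() and the BAW_SIZE
of 64 are made-up names and values, and the rule that adding a frame
moves the left edge up to that frame's sequence number is a
simplification chosen so the interleaving from the list above can be
replayed deterministically.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_RANGE 4096 /* 802.11 sequence numbers are 12 bits wide */
#define BAW_SIZE  64   /* assumed block-ack window size for this sketch */

/* Hypothetical per-TID state; not the driver's real structure. */
struct tid_state {
    pthread_mutex_t lock;
    uint16_t next_seq; /* next sequence number to hand out */
    uint16_t baw_left; /* left edge of the block-ack window */
};

static void tid_init(struct tid_state *t, uint16_t start)
{
    pthread_mutex_init(&t->lock, NULL);
    t->next_seq = start;
    t->baw_left = start;
}

/* True if seq lies in [baw_left, baw_left + BAW_SIZE), modulo 4096. */
static bool baw_within(const struct tid_state *t, uint16_t seq)
{
    return ((uint16_t)(seq - t->baw_left) & (SEQ_RANGE - 1)) < BAW_SIZE;
}

/* Caller holds t->lock. Toy rule: the left edge moves up to seq. */
static bool baw_add_locked(struct tid_state *t, uint16_t seq)
{
    if (!baw_within(t, seq))
        return false; /* frame is "out of window" */
    t->baw_left = seq;
    return true;
}

/* Racy scheme, step 1: allocate the sequence number early. */
static uint16_t seq_alloc_early(struct tid_state *t)
{
    pthread_mutex_lock(&t->lock);
    uint16_t seq = t->next_seq;
    t->next_seq = (uint16_t)((seq + 1) & (SEQ_RANGE - 1));
    pthread_mutex_unlock(&t->lock);
    return seq;
}

/* Racy scheme, step 2: add to the BAW later, in a separate lock hold. */
static bool baw_add_early(struct tid_state *t, uint16_t seq)
{
    pthread_mutex_lock(&t->lock);
    bool ok = baw_add_locked(t, seq);
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Fixed scheme: allocate and add inside one critical section. */
static bool tx_queue_late(struct tid_state *t, uint16_t *seq_out)
{
    pthread_mutex_lock(&t->lock);
    uint16_t seq = t->next_seq;
    t->next_seq = (uint16_t)((seq + 1) & (SEQ_RANGE - 1));
    bool ok = baw_add_locked(t, seq);
    assert(ok); /* sanity check: a fresh seqno is never before the BAW */
    pthread_mutex_unlock(&t->lock);
    *seq_out = seq;
    return ok;
}
```

Replaying the bad interleaving with the early scheme (A allocates,
B allocates, B adds, A adds) makes A's baw_add_early() fail, because
the gap between the two lock holds lets the adds happen in a different
order from the allocations. With tx_queue_late() there is no such gap:
whichever thread takes the lock first gets the lower sequence number
and adds it first, so the order of allocation and the order of BAW
insertion cannot diverge.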