Date: Mon, 19 Mar 2012 00:10:14 GMT
From: Adrian Chadd <adrian@freebsd.org>
To: freebsd-wireless@FreeBSD.org
Subject: Re: kern/166190: [ath] TX hangs and frames stuck in TX queue
Message-ID: <201203190010.q2J0AElW089620@freefall.freebsd.org>
The following reply was made to PR kern/166190; it has been noted by GNATS.

From: Adrian Chadd <adrian@freebsd.org>
To: bug-followup@freebsd.org, Vincent Hoffman <vince@unsane.co.uk>
Cc: freebsd-wireless@freebsd.org
Subject: Re: kern/166190: [ath] TX hangs and frames stuck in TX queue
Date: Sun, 18 Mar 2012 17:10:04 -0700

I think I understand what's going on here.

It turns out that multiple instances of the TX code (via if_start()) were running at the same time. These were processing frames from the input queue and assigning them sequence numbers.

This seems to be occurring:

* thread A would allocate sequence number 5
* thread B would concurrently allocate sequence number 6
* thread B would then "win" the race to add its frame to the BAW, as the sequence numbers were allocated early but the frames weren't added to the queue until much later
* then thread A would try adding its frame to the BAW, but since the BAW left edge is now 6, 5 is now "out of window".

I have a local patch here which I'm going to test tonight/tomorrow. It delays the sequence number allocation until _right before_ the frame may be added to the BAW. This is done inside the same lock, so there's no chance that it'll race with another concurrent thread.

I won't commit it until I have committed some verification code to -HEAD to complain loudly when something tries to queue a frame _before_ the BAW. Since that shouldn't happen in reality, I'm going to guess that it'll pop up in my testing and Vincent's use.

Once I've verified that (a) my sanity checking code is firing as I expect it to, (b) Vincent also sees the same, and (c) this is fixed by my patch, I'll look at committing it.

Vincent - thanks so very much for persisting with this bug! I'd not have really found it at all if you hadn't pointed the odd behaviour out to me.

Now - yes, the solution would also be "serialise the whole TX queue damnit." Yes, that'd solve it, but as I'm seeing 802.11ac around the corner, I'd like to actually debug, diagnose and document how a multi-threaded TX/RX path could work. Serialising the driver TX path isn't going to help me do that. :-)

Adrian
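
The race and the late-allocation idea described in the message can be sketched in userland C. This is only an illustration under stated assumptions: every name here (tid_state, seq_alloc, baw_add, tx_late_alloc, the replay_* helpers) and the fixed 64-frame window are hypothetical and are not taken from the ath(4) driver or from the patch mentioned above. The window's left edge is set by the first frame added and never advances, which is enough to show the ordering problem but is simpler than a real block-ack window.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_RANGE 4096u /* 802.11 sequence numbers are 12 bits wide */
#define BAW_SIZE  64u   /* assumed window size, for illustration only */

/* Hypothetical per-TID state; not the ath(4) driver's real structures. */
struct tid_state {
    pthread_mutex_t lock; /* stands in for the driver's TX lock */
    uint16_t next_seq;    /* next sequence number to hand out */
    uint16_t baw_left;    /* left edge of the block-ack window */
    bool baw_started;     /* false until the first frame is added */
};

static void tid_init(struct tid_state *t, uint16_t first_seq)
{
    assert(first_seq < SEQ_RANGE);
    pthread_mutex_init(&t->lock, NULL);
    t->next_seq = first_seq;
    t->baw_left = 0;
    t->baw_started = false;
}

/* Hand out the next sequence number, wrapping at 4096. */
static uint16_t seq_alloc(struct tid_state *t)
{
    uint16_t seq = t->next_seq;
    t->next_seq = (uint16_t)((seq + 1u) % SEQ_RANGE);
    return seq;
}

/* Try to add a frame to the BAW; false means "out of window". */
static bool baw_add(struct tid_state *t, uint16_t seq)
{
    if (!t->baw_started) {
        t->baw_left = seq; /* first frame defines the left edge */
        t->baw_started = true;
    }
    /* Distance from the left edge, modulo the sequence space. */
    uint16_t off = (uint16_t)((seq + SEQ_RANGE - t->baw_left) % SEQ_RANGE);
    return off < BAW_SIZE;
}

/* Replays the interleaving from the message: allocate early, add late.
 * Returns the number of frames rejected as out of window. */
static int replay_early_alloc(void)
{
    struct tid_state t;
    int rejected = 0;

    tid_init(&t, 5);
    uint16_t a = seq_alloc(&t); /* thread A gets 5 */
    uint16_t b = seq_alloc(&t); /* thread B gets 6 */
    if (!baw_add(&t, b))        /* B wins the race; left edge becomes 6 */
        rejected++;
    if (!baw_add(&t, a))        /* A arrives late; 5 is before the edge */
        rejected++;
    pthread_mutex_destroy(&t.lock);
    return rejected;
}

/* Allocate the sequence number and add to the BAW under one lock, so
 * frames enter the window in the order their numbers were handed out. */
static bool tx_late_alloc(struct tid_state *t, uint16_t *seq_out)
{
    pthread_mutex_lock(&t->lock);
    uint16_t seq = seq_alloc(t);
    bool ok = baw_add(t, seq);
    pthread_mutex_unlock(&t->lock);
    *seq_out = seq;
    return ok;
}

/* Same two frames, but each one allocates and queues atomically. */
static int replay_late_alloc(void)
{
    struct tid_state t;
    uint16_t seq;
    int rejected = 0;

    tid_init(&t, 5);
    if (!tx_late_alloc(&t, &seq)) /* first caller gets 5 */
        rejected++;
    if (!tx_late_alloc(&t, &seq)) /* second caller gets 6 */
        rejected++;
    pthread_mutex_destroy(&t.lock);
    return rejected;
}
```

Called from a main(), replay_early_alloc() returns 1 (the frame with sequence number 5 is rejected) and replay_late_alloc() returns 0. The second variant does not serialise the whole TX path; it only holds the lock across the allocate-and-add step.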