Date: Sat, 5 Jan 2013 20:53:16 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: freebsd-wireless@freebsd.org
Subject: [CFT] ath(4) migration to if_transmit() and a transmit tasklet, rather than direct dispatch
Message-ID: <CAJ-Vmo=b2p_HMvSUAYFaY0ZunuB6Sjbrx8cEpDFfVBM=SGVEYg@mail.gmail.com>

Hi,

I've written up a replacement ath(4) TX path that does a bunch of things.

The patch I'd like everyone to try:

http://people.freebsd.org/~adrian/ath/20130105-if_transmit_txfrag_2.diff

What it does:

* It implements a driver staging queue for frames from if_start and if_transmit();
* It populates that staging queue with ath_bufs, each of which contains the mbuf and node ref;
* The actual TX occurs in a taskqueue (the default ath taskqueue for now, but I'll move it to another one soon) in order to serialise things;
* The TX fragment list is populated correctly (but doesn't quite work, see below);
* The reliance on "peeking" at the next mbuf in a fragment list is gone - instead, I now store the data length of the next fragment in the current buffer and fire that across.

Now, in station mode I get exactly the same throughput as before - 150-180 Mbit/s in TCP iperf tests. It's great. I need to do some more (single core) MIPS testing - if throughput drops there, it'll likely be the ridiculously huge calls to taskqueue_enqueue() and/or some lock contention.

I'd like to commit this to -HEAD and then begin the next phase, which is:

* Figure out why TX fragments are transmitted but dropped by (some) receivers. FreeBSD -> FreeBSD works fine, but FreeBSD -> (some random 11g cable modem router) just plain drops the fragments in question. Sigh;
* Tidy up some more of the locking, which likely involves separating out the TX queue lock from the TX taskqueue lock;
* Push raw xmit frames into the same queue mechanism, so they are queued in the same fashion and obey the same sequence number / CCMP IV allocation ordering that data frames do (which is important as things like EAPOL frames are encrypted and have sequence numbers, but come in via the raw xmit path. Grr.);
* Finish tidying up when things are called - specifically, I'd like to make sure all the buffer completions occur outside of the locks being held, so I can finally avoid a bunch of potential LORs when doing things like transmitting BAR frames from the TX completion path.

I'd really appreciate any testing that can be done for this. It doesn't matter which mode you're in - adhoc, hostap, mesh, sta, 11n or non-11n - all ath(4) chips share the same TX path code, and this all happens before the software TX queue and aggregation handling.

Once this is all verified and working, I'll work on migrating the net80211 TX path to actually use if_transmit() itself and use a TX taskqueue to serialise all TX. That should fix a whole bunch of subtle, niggling little TX side bugs that have been the bane of my existence since I took this code on a couple of years ago.

Finally - although I'd like to _fix_ TX fragment handling, it still has its .. quirks. I'm still worried that a very active STA or AP with multiple traffic sessions will end up with TX fragments in the software queue that aren't kept 100% in order (ie, frames from other sessions get interspersed with that session's fragments). For now it won't happen - the TX lock is held for the duration of running the TX queuing and so (in theory!) nothing should appear in the TX queue in between TX fragments. But I don't trust it. Chances are the correct fix is a lot more nasty than the current net80211 way of "just do fragmentation in the net80211 layer and the driver will figure it out" lets me do cleanly. (Ie, I think the clean way is to do fragmentation at the point where you're about to queue it to the actual hardware and not in net80211, but I digress.)

Adrian
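
For anyone trying to picture the staging queue and the "next fragment length" idea described above, here is a minimal userland C sketch. It is not the code in the patch: every name (demo_buf, demo_txq, demo_stage_fragments, demo_tx_task) is hypothetical, the real driver uses ath_buf, mbufs and the kernel taskqueue, and all locking is omitted. It assumes only what the message says: frames are staged in a FIFO, each buffer records the length of the fragment that follows it, and a single task drains the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an ath_buf: one staged frame or fragment. */
struct demo_buf {
    int              frame_len;     /* data length of this fragment */
    int              next_frag_len; /* data length of the next fragment, 0 if last */
    struct demo_buf *next;          /* staging-queue linkage */
};

/* Hypothetical staging queue: a FIFO filled by the transmit entry points. */
struct demo_txq {
    struct demo_buf *head;
    struct demo_buf *tail;
    int              depth;
};

/* Append one buffer to the tail of the staging queue. */
static void
demo_txq_enqueue(struct demo_txq *q, struct demo_buf *bf)
{
    bf->next = NULL;
    if (q->tail != NULL)
        q->tail->next = bf;
    else
        q->head = bf;
    q->tail = bf;
    q->depth++;
}

/* Remove and return the buffer at the head of the queue, or NULL if empty. */
static struct demo_buf *
demo_txq_dequeue(struct demo_txq *q)
{
    struct demo_buf *bf = q->head;

    if (bf == NULL)
        return NULL;
    q->head = bf->next;
    if (q->head == NULL)
        q->tail = NULL;
    bf->next = NULL;
    q->depth--;
    return bf;
}

/*
 * Stage a fragment burst. Each buffer records the length of its successor
 * at staging time, so the TX side never has to peek at the next buffer.
 */
static void
demo_stage_fragments(struct demo_txq *q, struct demo_buf *bufs,
    const int *lens, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        bufs[i].frame_len = lens[i];
        bufs[i].next_frag_len = (i + 1 < n) ? lens[i + 1] : 0;
        demo_txq_enqueue(q, &bufs[i]);
    }
}

/*
 * Stand-in for the TX task: the only place that drains the staging queue,
 * which is what serialises transmission. Returns the number of frames sent.
 */
static int
demo_tx_task(struct demo_txq *q)
{
    struct demo_buf *bf;
    int sent = 0;

    while ((bf = demo_txq_dequeue(q)) != NULL) {
        printf("tx len=%d next_frag_len=%d\n",
            bf->frame_len, bf->next_frag_len);
        sent++;
    }
    return sent;
}
```

A caller would zero-initialise a demo_txq, pass an array of demo_buf and the fragment lengths to demo_stage_fragments(), and later call demo_tx_task() from whatever context is meant to serialise TX. Because the successor's length travels with each buffer, draining one buffer at a time needs no look-ahead into the queue.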