Date: Wed, 13 Feb 2013 21:14:53 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: freebsd-wireless@freebsd.org
Subject: [RFC] serialising net80211 TX
Message-ID: <CAJ-VmonS0cds9nCFYxc_nZuDRL93=2_4T2B4tUzPuGC3Bhz2FA@mail.gmail.com>

Hi,

I'd like to work on the net80211 TX serialisation now. I'll worry about driver serialisation and if_transmit methods later.

The 30 second version - net80211 TX currently happens in parallel, which means preemption and multi-core devices can and will hit lots of subtle and hard-to-debug races in the TX path.

We actually need an end-to-end serialisation method - not only for the 802.11 state (sequence number, correct aggregation handling, etc) but to line up 802.11 sequence number allocation with the encryption IV/PN values. Otherwise you end up with lots of crazy subtle out of order packets occurring. Another problem is the seqno/CCMP IV race between the raw transmit path and the normal transmit path.

There are other nagging issues that I'm trying to resolve - but, one thing at a time.

So there are three current contenders:

1) Wrap everything in net80211 TX in a per-vap TX lock; grab it at the beginning of ieee80211_output() and ieee80211_start(), and don't release it until the frame is queued to something (a power save queue, an age queue, the driver.) That guarantees that the driver is called in lock-step with each frame being processed.

2) Do deferred transmit, i.e. the net80211 entry points simply queue mbufs to a queue, and a taskqueue runs over the ifnet queue and runs those frames in-order. There's no need for a lock here as there's only one sending context (either per-VAP or per-IC).

3) A hybrid setup - use a per-vap TX lock; do a try-acquire on it and direct dispatch from the queue head if we have it; otherwise defer frames into a queue and have a taskqueue handle those.

Option 1) is what drivers like iwn(4) do internally.

Option 2) is what I've tinkered with - but we become a slave to the scheduler. Task switching is expensive and unpredictable; doubly so for a non-preemptive kernel. We'd have to run the TX taskqueue at some very high priority to get something resembling direct-dispatch behaviour.

Option 3) is what the gige/10ge drivers do.
They hold a big TX lock for each TX (from xxx_transmit() to hardware dispatch) and if they can't acquire the TX lock, they defer the frame to a drbr lockless ring buffer and service that via a taskqueue.

I can implement any of the above. Architecturally I'd prefer 2) - it massively simplifies and streamlines things, but the scheduling latency is just plain stab-worthy. I'm tempted to just do 1) for now and turn it into 3) if we need to.

The main reason against doing 1) (and why 2) is nicer) is recursion - if the TX path wants to call the net80211 TX code for some odd reason, we'll hit lock recursion. I'd rather have the system crash at this point (and then fix the misbehaving driver) but that's just me.

So - what do people think?

Once this is done I'd like to make sure that the wifi chipset drivers do the same - i.e., ensure that the frame order is preserved both between the normal and the raw xmit paths. That should fix all of the odd CCMP out of order crap that I see under heavy, heavy test conditions.

Thanks,

Adrian
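
For readers who want to see the shape of the hybrid approach (option 3), here is a minimal userland model in Python. It is not net80211 or kernel code: the class VapTx, its fields, and the driver_send callback are hypothetical names made up for illustration, threading.Lock stands in for the per-vap TX lock, and a deque stands in for the deferred ring. The sketch assumes frames are always enqueued first and then drained from the queue head by whoever wins the try-acquire, so that sequence numbers and PNs are handed out together and in queue order.

```python
import threading
from collections import deque


class VapTx:
    """Userland model of a per-vap TX path. All names are hypothetical."""

    def __init__(self, driver_send):
        self.tx_lock = threading.Lock()  # stands in for the per-vap TX lock
        self.pending = deque()           # stands in for the deferred ring
        self.seqno = 0                   # 802.11 sequence number (12 bits)
        self.pn = 0                      # CCMP packet number
        self.driver_send = driver_send   # hypothetical driver hand-off

    def transmit(self, payload):
        """Entry point: always enqueue, then try to drain directly.

        Returns False when the lock was busy; a real implementation would
        then schedule a deferred task that calls drain() later.
        """
        self.pending.append(payload)
        return self.drain()

    def drain(self):
        """Send queued frames in order if the TX lock can be taken."""
        if not self.tx_lock.acquire(blocking=False):
            return False  # someone else owns the TX path right now
        try:
            while self.pending:
                payload = self.pending.popleft()
                # Sequence number and PN are allocated together, under
                # the same lock, immediately before the driver hand-off.
                seq = self.seqno
                self.seqno = (self.seqno + 1) & 0xFFF
                self.pn += 1
                self.driver_send((seq, self.pn, payload))
        finally:
            self.tx_lock.release()
        return True
```

In this model a frame that arrives just as the lock holder finishes its loop can sit in the queue until the next drain() call, which is the job the taskqueue does in the scheme described above. A re-entrant transmit() from inside driver_send would fail the try-acquire and be deferred rather than recursing on the lock.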