From: Adrian Chadd
To: freebsd-net@freebsd.org
Cc: freebsd-wireless@freebsd.org
Date: Fri, 1 Jun 2012 00:46:02 -0700
Subject: if_start / if_transmit handling and packet ordering - how can I guarantee it?

Hi all,

I've been pushing my ath 11n driver hard (250Mbit UDP) and I've found a rather interesting behaviour, at least on my SMP machine.

I'm running single direction UDP iperf tests. This is all on stable/9, with -head ath/net80211 stacks.

The receiver logs about 40 in every 20000 frames a second as being "out of order". The net80211 stack and ath software TX aggregation path isn't reordering frames.

So I reviewed the code again and I wondered - is what I'm doing with ath_start() and the TX side locking allowing for multiple ath_start() invocations to occur, where they interleave processing frames? The short answer is yes.

Then, something else hit me - I wondered whether ath_start() itself is being preempted by the ath taskqueue (say, doing RX, or TX completion) - where ath_start() is then called. i.e.:

* iperf -> sendto() -> socket layer -> net80211 ieee80211_start() -> if_transmit -> if_start() -> ath_start()

And then an interrupt coming in, during ath_start() having dequeued a frame, but before it had placed it on any hardware/software queue:

* ath0 taskqueue -> ath_tx_proc() (tx completion) -> ath_start()

Or, similarly, CPU #0 running iperf, CPU #1 running the ath taskqueue, and one or the other being preempted by something higher priority (e.g. an interrupt coming in) between having removed a frame from the ifnet queue and it being thrown into the software/hardware queue.

The ath driver locking only holds the TX locks as long as is needed to do individual TX queue operations (i.e. queue a frame to the software / hardware queue) rather than holding it for the entirety of the TX path. Because of this, I wonder if preemption is possibly causing issues.

So I then went grovelling through ixgbe to see what it does with if_transmit.
It holds the TX lock for that particular NIC hardware queue for the entirety of:

* a TX to hardware operation (which can be an individual frame queue, or servicing the whole bufring contents);
* completing the frames via _txeof(), until nothing more needs to be done.

So by holding the TX lock for the entirety of the queue operation, it means that any other entries into that particular TX path for that NIC queue will stall, waiting for the lock, and thus effectively be serialised, avoiding my initial issue. Any TX completion which leads to further transmissions from the bufring (and simultaneous incoming TX frames) will block each other.

Ok, so now that I've mostly tried to lucidly dump what's going on, what do people think about holding the locks for (potentially) so long? I know iwn(4) holds the driver lock for as long as it can for _everything_, so it avoids this issue. But again, I don't really like the idea of holding a lock for this long.

Does anyone else have any other ideas?

FWIW - I temporarily converted the ath driver to make ath_start() enqueue a taskqueue task, which then did all of the TX inside the taskqueue. This serialised with TX completion and RX, which can also start TX; and all of my out of order issues went away.

I unfortunately then became very, very susceptible to scheduling latency (hence my initial post about KTR, SMP, preemption and what looks like Cx/idle/powerd issues). If I run my laptop (Lenovo T60) with all the power/Cx stuff disabled, I can quite happily sustain 240-250MBit of UDP without any reordering or packet latency.

Thanks in advance (and phew!)

Adrian
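
For illustration only, the window described above (a frame dequeued from the ifnet queue but not yet placed on the hardware queue when a second context runs the same TX path) can be modelled with a small single-threaded sketch in C. Everything here is hypothetical: the toy_* names, the fixed frame count, and the forced "preemption" are assumptions made for the example and are not the ath(4) or ixgbe code. The flag hold_lock_across_path stands in for holding the TX lock over dequeue plus enqueue, as ixgbe is described as doing.

```c
#include <assert.h>

#define TOY_NFRAMES 8   /* hypothetical frame count for the model */

/* Toy state: none of these names are FreeBSD APIs. */
struct toy_tx {
    int next_seq;           /* next sequence number on the "ifnet queue" */
    int hwq[TOY_NFRAMES];   /* order in which frames reach the "hardware queue" */
    int hwq_len;
};

/* Step 1 of the TX path: take the next frame off the ifnet queue. */
static int toy_dequeue(struct toy_tx *tx)
{
    return tx->next_seq++;
}

/* Step 2 of the TX path: place the frame on the hardware queue. */
static void toy_hw_enqueue(struct toy_tx *tx, int seq)
{
    assert(tx->hwq_len < TOY_NFRAMES);
    tx->hwq[tx->hwq_len++] = seq;
}

/* Count frames that arrive after a higher sequence number was already seen. */
static int toy_count_out_of_order(const struct toy_tx *tx)
{
    int max_seen = -1, out_of_order = 0;

    for (int i = 0; i < tx->hwq_len; i++) {
        if (tx->hwq[i] < max_seen)
            out_of_order++;
        else
            max_seen = tx->hwq[i];
    }
    return out_of_order;
}

/*
 * Each round, context A (the sendto() path) dequeues a frame and is then
 * "preempted" by context B (the taskqueue path).
 *   hold_lock_across_path == 0: B dequeues and enqueues inside the gap.
 *   hold_lock_across_path != 0: B waits until A has finished its enqueue.
 * Returns the number of out-of-order frames seen on the hardware queue.
 */
int toy_simulate(int hold_lock_across_path)
{
    struct toy_tx tx = { 0, { 0 }, 0 };

    for (int round = 0; round < TOY_NFRAMES / 2; round++) {
        int a = toy_dequeue(&tx);                   /* A dequeues */

        if (hold_lock_across_path) {
            toy_hw_enqueue(&tx, a);                 /* A completes first */
            toy_hw_enqueue(&tx, toy_dequeue(&tx));  /* then B runs */
        } else {
            toy_hw_enqueue(&tx, toy_dequeue(&tx));  /* B slips in */
            toy_hw_enqueue(&tx, a);                 /* A resumes late */
        }
    }
    return toy_count_out_of_order(&tx);
}
```

Calling toy_simulate(0) forces the interleaving on every round, so half of the frames arrive late; toy_simulate(1) keeps the frames in order. The model deliberately ignores the cost raised in the post, namely how long other contexts stall while the lock is held.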