From: John Baldwin
To: src-committers@freebsd.org
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org
Date: Fri, 17 Jun 2011 16:25:02 -0400
Subject: Re: svn commit: r223198 - head/sys/dev/e1000
Message-Id: <201106171625.03191.jhb@freebsd.org>
In-Reply-To: <201106172006.p5HK6qZs005000@svn.freebsd.org>

On Friday, June 17, 2011 4:06:52 pm John Baldwin wrote:
> Author: jhb
> Date: Fri Jun 17 20:06:52 2011
> New Revision: 223198
> URL: http://svn.freebsd.org/changeset/base/223198
>
> Log:
>   - Use a dedicated task to handle deferred transmits from the if_transmit
>     method instead of reusing the existing per-queue interrupt task.
>     Reusing the per-queue interrupt task could result in both an interrupt
>     thread and the taskqueue thread trying to handle received packets on a
>     single queue resulting in out-of-order packet processing.
>   - Don't define igb_start() at all on 8.0 and where if_transmit is used.
>     Replace last remaining call to igb_start() with a loop to kick off
>     transmit on each queue instead.
>   - Call ether_ifdetach() earlier in igb_detach().
>   - Drain tasks and free taskqueues during igb_detach().
>
>   Reviewed by: jfv
>   MFC after: 1 week
>
> Modified:
>   head/sys/dev/e1000/if_igb.c
>   head/sys/dev/e1000/if_igb.h

FYI, I ran into a workload where the concurrent reception of packets was
breaking TCP. Specifically, the two threads could both attempt to process
ACKs for a connection in the syncache. The first thread would "win" and
create a connection, but the second thread had already done a pcb lookup and
found the listen socket before waiting for a write lock on the TCP pcbinfo.
As a result, the second thread also attempted to create a new connection
based on the syncookie. However, it failed in in_pcbconnect_setup() with
EADDRINUSE when it found the first connection in the PCB hash. When it
failed, it dropped the ACK and sent a RST to the remote end causing the other
end to drop the connection silently.

Unfortunately, the first thread had created a valid socket which was returned
to userland via accept(). That socket contained all the inflight data sent
by the remote end before it received the RST. The net effect was that a user
app would see a connection that only sent part of its data and then returned
EOF.

Note that a truly bidirectional application-level protocol would still break
in this case with an EPIPE/SIGPIPE.
However, if the remote peer is just opening a socket, dumping some data into
it, and then closing it without reading any data, it may close the socket
before the RST arrives and thus encounter no errors, completely unaware that
the data it just sent over TCP was partially (or completely) lost.

Note that that can still happen when using the syncache, since we may fail to
create a socket when expanding a syncache entry due to resource exhaustion,
giving similarly unpleasant failure semantics (i.e. the remote user app
doesn't get an error and has no clue that their data is in fact lost).

-- 
John Baldwin
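
The interleaving described above (both threads do the pcb lookup before either takes the write lock) can be replayed deterministically in a few lines of C. This is only a sketch under stated assumptions: `pcb_table`, `pcb_lookup()`, `expand_syncookie()` and `replay_race()` are hypothetical stand-ins invented for illustration, not FreeBSD kernel types or functions, and the "lock" is modelled simply by the order of the statements. The `recheck` variant shows a generic remedy for this kind of look-up-then-lock pattern; it is not what r223198 does, which avoids the concurrency in the driver instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the FreeBSD data structures. */
enum pcb_kind { PCB_LISTEN, PCB_CONNECTED };

struct pcb_table {
	bool has_connection;	/* true once the 4-tuple has a connected entry */
};

struct outcome {
	int first;		/* result seen by the thread that wins the lock */
	int second;		/* result seen by the thread that loses it */
	bool rst_sent;		/* did the losing thread answer with a RST? */
};

/* Returns the connected entry if one exists, otherwise the listen socket. */
static enum pcb_kind
pcb_lookup(const struct pcb_table *t)
{
	return (t->has_connection ? PCB_CONNECTED : PCB_LISTEN);
}

/* Models creating a connection from a syncookie; fails on a duplicate. */
static int
expand_syncookie(struct pcb_table *t)
{
	if (t->has_connection)
		return (EADDRINUSE);
	t->has_connection = true;
	return (0);
}

/*
 * Replays the race: both threads look up before either holds the lock.
 * If recheck is true, the second thread repeats its lookup once it
 * "holds" the lock (an illustrative remedy, assumed for this sketch).
 */
static struct outcome
replay_race(bool recheck)
{
	struct pcb_table t = { .has_connection = false };
	struct outcome o = { 0, 0, false };

	/* Both lookups happen first, so both see only the listen socket. */
	enum pcb_kind seen_a = pcb_lookup(&t);
	enum pcb_kind seen_b = pcb_lookup(&t);
	assert(seen_a == PCB_LISTEN && seen_b == PCB_LISTEN);
	(void)seen_a;

	/* Thread A wins the write lock and creates the connection. */
	o.first = expand_syncookie(&t);

	/* Thread B now gets the lock, still holding its stale lookup. */
	if (recheck)
		seen_b = pcb_lookup(&t);
	if (seen_b == PCB_LISTEN) {
		o.second = expand_syncookie(&t);
		if (o.second != 0)
			o.rst_sent = true;	/* ACK dropped, RST sent */
	}
	/* Otherwise the ACK would go to the existing connection. */
	return (o);
}
```

Calling `replay_race(false)` from a small `main()` reproduces the failure mode in the message: the first expansion succeeds, the second fails with EADDRINUSE and a RST goes out even though a valid connection exists. With `replay_race(true)` the stale lookup is refreshed and no RST is generated.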