Message-ID: <4C90EAB7.2000902@networx.ch>
Date: Wed, 15 Sep 2010 17:48:07 +0200
From: Andre Oppermann
To: "Bjoern A. Zeeb"
References: <4C8E0C1E.2020707@networx.ch> <20100915151632.E31898@maildrop.int.zabbadoz.net>
In-Reply-To: <20100915151632.E31898@maildrop.int.zabbadoz.net>
Cc: freebsd-net@freebsd.org, freebsd-current@freebsd.org
Subject: Re: TCP loopback socket fusing

On 15.09.2010 17:19, Bjoern A. Zeeb wrote:
> On Mon, 13 Sep 2010, Andre Oppermann wrote:
>
> Hey,
>
>> When a TCP connection via loopback back to localhost is made the whole
>> send, segmentation and receive path (with larger packets though) is still
>> executed. This has some considerable overhead.
>>
>> To short-circuit the send and receive sockets on localhost TCP connections
>> I've made a proof-of-concept patch that directly places the data in the
>> other side's socket buffer without doing any packetization and other protocol
>> overhead (like UNIX domain sockets). The connection setup (SYN, SYN-ACK,
>> ACK) and shutdown are still handled by normal TCP segments via loopback so
>> that firewalling still works. The actual payload data during the session
>> won't be seen and the sequence numbers don't move other than for SYN and FIN.
>> The sequence numbers remain valid though. Obviously tcpdump won't see any data
>> transfers either if the connection has fused sockets.
>>
>> Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable
>> operation and a rough doubling of the throughput on loopback connections.
>> I've tested most socket teardown cases and it behaves fine. I'm not entirely
>> sure I've got all possible paths but the way it is integrated should properly
>> defuse the sockets in all situations.
>
> Three comments in reverse order:
>
> 1 If S/S+A/A and shutdown aren't shortcut, can you always rely on proper
>   payload order, especially in the shutdown case?

Yes. The payload is always directly placed in the receive socket buffer
of the other socket, never in the send buffer. There is never any unsent
data left in the send buffer that could become reordered.

> 2 Given my experience with epairs, which are basically a loop with two
>   interfaces and even interface queues, any significant delay you are
>   seeing is _not_ due to longer code paths through the stack but
>   simply because of the netisr.

I haven't measured delay, only bandwidth. And that's with WITNESS and
INVARIANTS enabled. You are probably right, the netisr is taking its toll.
Especially the TCP_INFO lock may have some contention in the loopback case
on SMP.
Still, a lot of mbuf allocations, packet manipulations and instructions
(instruction cache) are avoided by fusing the sockets together.

> 3 If properly doing this for TCP, we should probably also do it for
>   other protocols.

UNIX domain sockets already do this. This implementation is specific to
TCP and only touches the protocol-specific parts. It's not done at the
socket layer. For UDP it's not that easy to do, as most UDP connections
are one-off packets and no permanent binding between two sockets exists.
For SCTP I don't know. From glancing over the code it seems they have,
at least partially, their own socket buffer code. How difficult a fused
socket would be there I can't say.

--
Andre