Date: Wed, 15 Sep 2010 15:19:50 +0000 (UTC)
From: "Bjoern A. Zeeb"
To: Andre Oppermann
Cc: freebsd-net@freebsd.org, freebsd-current@freebsd.org
Subject: Re: TCP loopback socket fusing
Message-ID: <20100915151632.E31898@maildrop.int.zabbadoz.net>
In-Reply-To: <4C8E0C1E.2020707@networx.ch>

On Mon, 13 Sep 2010, Andre Oppermann wrote:

Hey,

> When a TCP connection via loopback back to localhost is made the whole
> send, segmentation and receive path (with larger packets though) is still
> executed.  This has some considerable overhead.
>
> To short-circuit the send and receive sockets on localhost TCP connections
> I've made a proof-of-concept patch that directly places the data in the
> other side's socket buffer without doing any packetization and other
> protocol overhead (like UNIX domain sockets).  The connection setup (SYN,
> SYN-ACK, ACK) and shutdown are still handled by normal TCP segments via
> loopback so that firewalling still works.  The actual payload data during
> the session won't be seen and the sequence numbers don't move other than
> for SYN and FIN.  The sequence numbers remain valid though.  Obviously
> tcpdump won't see any data transfers either if the connection has fused
> sockets.
>
> Preliminary testing (with WITNESS and INVARIANTS enabled) has shown stable
> operation and a rough doubling of the throughput on loopback connections.
> I've tested most socket teardown cases and it behaves fine.  I'm not
> entirely sure I've got all possible paths but the way it is integrated
> should properly defuse the sockets in all situations.
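
For anyone who wants a rough feel for the two endpoints the quoted text
compares (TCP over loopback versus UNIX domain sockets), here is a minimal
userland sketch.  It does not exercise the proof-of-concept patch in any way.
The names pump(), tcp_loopback_pair(), unix_pair() and compare(), the chunk
size and the transfer size are all assumptions made up for this
illustration, and the numbers will include Python overhead, so treat them as
indicative at best.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send call; arbitrary choice for this sketch


def pump(tx, rx, total):
    """Send `total` bytes on tx and drain them from rx.

    Returns (bytes_received, elapsed_seconds).
    """
    payload = b"x" * CHUNK

    def sender():
        left = total
        while left > 0:
            n = min(left, CHUNK)
            tx.sendall(payload[:n])
            left -= n
        # Half-close so the receiver sees EOF and its loop terminates.
        tx.shutdown(socket.SHUT_WR)

    thread = threading.Thread(target=sender)
    start = time.perf_counter()
    thread.start()
    received = 0
    while True:
        data = rx.recv(CHUNK)
        if not data:
            break
        received += len(data)
    thread.join()
    return received, time.perf_counter() - start


def tcp_loopback_pair():
    """Connected TCP socket pair over 127.0.0.1 (regular TCP path)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def unix_pair():
    """Connected UNIX domain stream socket pair (no TCP involved)."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def compare(total=32 * 1024 * 1024):
    """Run the same transfer over both transports; return raw results."""
    results = {}
    for name, make in (("tcp-loopback", tcp_loopback_pair),
                       ("unix-domain", unix_pair)):
        tx, rx = make()
        try:
            results[name] = pump(tx, rx, total)
        finally:
            tx.close()
            rx.close()
    return results


if __name__ == "__main__":
    for name, (received, secs) in compare().items():
        print(f"{name}: {received / secs / 1e6:.1f} MB/s")
```

A single run is noisy; repeating compare() several times and looking at the
spread gives a more honest picture than any one figure.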

Three comments in reverse order:

1. If SYN/SYN-ACK/ACK and shutdown aren't short-circuited, can you always
   rely on proper payload ordering, especially in the shutdown case?

2. Given my experience with epairs, which are basically a loop with two
   interfaces and even interface queues, any significant delay you are
   seeing is _not_ due to longer code paths through the stack but simply
   due to the netisr.

3. If we do this properly for TCP, we should probably also do it for
   other protocols.

/bz

-- 
Bjoern A. Zeeb                              Welcome a new stage of life.