From: Robert Watson
Date: Sat, 3 Mar 2007 22:30:52 +0000 (GMT)
To: Andre Oppermann
Cc: freebsd-net@freebsd.org, gallatin@freebsd.org,
    freebsd-current@freebsd.org, kmacy@freebsd.org
Subject: Re: New optimized soreceive_stream() for TCP sockets, proof of concept
Message-ID: <20070303222356.S60688@fledge.watson.org>
In-Reply-To: <45E8276D.60105@freebsd.org>
References: <45E8276D.60105@freebsd.org>

On Fri, 2 Mar 2007, Andre Oppermann wrote:

> Instead of the unlock-lock dance soreceive_stream() pulls a properly sized
> chunk (relative to the receive system call buffer space) from the socket
> buffer, drops the lock and gives copyout as much time as it needs. In the
> meantime the lower half can happily add as many new packets as it wants
> without having to wait for a lock. It also allows the upper and lower
> halves to run on different CPUs without much interference. There is an
> unsolved nasty race condition in the patch though.
> When the socket closes and we still have data around or the copyout
> failed it tries to put the data back into the socket buffer which is gone
> already by then, leading to a panic. Work is underway to find a reliable
> fix for this. I wanted to get this out to the community nonetheless to
> give it some more exposure.

I'll try to take a look at this in the next few days. However, I find the
description above of soreceive() a bit odd -- I'm pretty sure it doesn't do
some of the things you're describing. For example, soreceive() does release
the locks acquired by the network input processing path while copying to
user space: there should be no contention during the copyout(), only while
processing the socket buffer between copyout() calls. This is possible
because the socket receive sleep lock (not the mutex) holds sb_mb constant
if it is non-NULL, making copyout() of sb_mb->m_data safe while not holding
the socket buffer mutex in the current implementation.

In my experience, soreceive() is an incredibly complicated function, and
could stand significant simplification. However, it has to be done very
carefully for exactly this reason :-). There are some existing bugs in
soreceive(), one involving incorrect handling of interlaced I/O due to a
label being in the wrong place, that we should resolve.

BTW, the point of not pulling the data out of the socket buffer until
copyout() is complete is not so much to allow reverting on error as to
avoid changing the advertised window size until the copy is done, since
until then the data hasn't been delivered to user space. copyout() can take
a very long time to run, due to page faults, for example, and the socket
buffer represents a maximum bound on in-flight traffic as specified by the
application. Whether this is a property we want to keep is another
question, but I believe that's the rationale.
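
The ordering described above (look at the head of the buffer under the
mutex, copy without the mutex, and only then remove the data so the window
opens) can be sketched in user-space C. This is an illustration under
stated assumptions, not the FreeBSD soreceive() code: every name here
(sketch_sockbuf, sketch_receive(), SKETCH_HIWAT, and so on) is
hypothetical, a flat array stands in for the mbuf chain, pthread mutexes
stand in for both the socket buffer mutex and the receive sleep lock, and
memcpy() stands in for copyout().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HIWAT 8192	/* hypothetical socket buffer limit, in bytes */

/* Hypothetical stand-in for a receive socket buffer. */
struct sketch_sockbuf {
	pthread_mutex_t	mtx;		/* plays the socket buffer mutex */
	pthread_mutex_t	rcv_lock;	/* plays the receive sleep lock */
	char		data[SKETCH_HIWAT];
	size_t		cc;		/* bytes currently queued */
};

static void
sketch_init(struct sketch_sockbuf *sb)
{
	pthread_mutex_init(&sb->mtx, NULL);
	pthread_mutex_init(&sb->rcv_lock, NULL);
	sb->cc = 0;
}

/* Free space; a TCP-like protocol would derive its window from this. */
static size_t
sketch_window(struct sketch_sockbuf *sb)
{
	size_t space;

	pthread_mutex_lock(&sb->mtx);
	space = SKETCH_HIWAT - sb->cc;
	pthread_mutex_unlock(&sb->mtx);
	return (space);
}

/* "Lower half": append up to len bytes; returns the number accepted. */
static size_t
sketch_append(struct sketch_sockbuf *sb, const char *src, size_t len)
{
	size_t space;

	pthread_mutex_lock(&sb->mtx);
	space = SKETCH_HIWAT - sb->cc;
	if (len > space)
		len = space;
	memcpy(sb->data + sb->cc, src, len);
	sb->cc += len;
	pthread_mutex_unlock(&sb->mtx);
	return (len);
}

/* "Upper half": copy first, remove from the buffer afterwards. */
static size_t
sketch_receive(struct sketch_sockbuf *sb, char *dst, size_t len)
{
	size_t n;

	/* Serialize readers so nobody else removes data from the head. */
	pthread_mutex_lock(&sb->rcv_lock);

	/* Decide how much to copy; the mutex is held only briefly. */
	pthread_mutex_lock(&sb->mtx);
	n = sb->cc < len ? sb->cc : len;
	pthread_mutex_unlock(&sb->mtx);

	/* Slow copy without the mutex; appends land beyond offset cc. */
	memcpy(dst, sb->data, n);

	/* Only now drop the data, which is when the window grows. */
	pthread_mutex_lock(&sb->mtx);
	assert(sb->cc >= n);	/* cc can only have grown in between */
	memmove(sb->data, sb->data + n, sb->cc - n);
	sb->cc -= n;
	pthread_mutex_unlock(&sb->mtx);

	pthread_mutex_unlock(&sb->rcv_lock);
	return (n);
}
```

In this sketch sketch_window() keeps returning the smaller value for the
whole duration of the copy, which mirrors the rationale given above: the
sender is not told about free space until the data has actually reached
the caller's buffer.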

Robert N M Watson
Computer Laboratory
University of Cambridge

> The patch is here:
>
> http://people.freebsd.org/~andre/soreceive_stream-20070302.diff
>
> Any testing, especially on 10Gig cards, and feedback appreciated.
>
> --
> Andre