Date: Sat, 23 Sep 2006 12:12:55 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: alc@freebsd.org, freebsd-net@freebsd.org, freebsd-current@freebsd.org, tegge@freebsd.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: Much improved sendfile(2) kernel implementation
Message-ID: <451508A7.8020209@freebsd.org>
In-Reply-To: <20060922234708.V11343@fledge.watson.org>
References: <4511B9B1.2000903@freebsd.org> <17683.63162.919620.114649@grasshopper.cs.duke.edu> <45145F1D.8020005@freebsd.org> <20060922234708.V11343@fledge.watson.org>

Robert Watson wrote:
>
> On Sat, 23 Sep 2006, Andre Oppermann wrote:
>
>>> Without patch:
>>>  87380 393216 393216    10.00      2163.08   100.00   19.35    3.787   1.466
>>> Without patch + TSO:
>>>  87380 393216 393216    10.00      4367.18   71.54    42.07    1.342   1.578
>>> With patch:
>>>  87380 393216 393216    10.01      1882.73   86.15    18.43    3.749   1.604
>>> With patch + TSO:
>>>  87380 393216 393216    10.00      6961.08   47.69    60.11    0.561   1.415
>
> The impact of TSO is clearly dramatic, especially when combined with the
> patch, but I'm a bit concerned by the drop in performance in the patched
> non-TSO case.  For network cards which will always have TSO enabled,
> this isn't an issue, but do we see a similar effect for drivers without
> TSO?  What can we put this drop down to?

If you look at my GigE numbers, there is no drop for the new-sendfile
w/o TSO case.  In this 10Gig test the drop is really an artifact of the
whole setup and of the way netperf makes use of the sendfile call.
Internally, new-sendfile waits until 50% of the socket buffer is free
before bulk filling it again.  This value can be modified by setting a
low watermark on the send socket buffer.  Netperf does buffer-sized
sendfile invocations, and this is very timing critical at 10G.  That
gives this picture:

  call sendfile(380K) -> fill socket buffer -> wait -> fill rest ->
  return -> call sendfile(380K) ...

Not to mention all the additional work tcp_output() has to do w/o TSO.
Especially with large buffers, it has to loop over the mbuf chain for
each packet to find out where to start copying.

And besides, there is no point in having a non-TSO capable interface
above 1-2Gbit.  Not even Linux can keep up there.

-- 
Andre
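
For reference, a minimal sketch of setting a low watermark on the send socket buffer, as mentioned above. It uses the standard SO_SNDLOWAT socket option. The helper names lowat_for_percent() and set_send_lowat() are made up for illustration, and the 25% figure in the usage note is an arbitrary example. How exactly the patched sendfile interprets the watermark is an assumption here, not something taken from the patch itself.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: compute a low watermark in bytes as a
 * percentage of the send socket buffer size.
 */
static int
lowat_for_percent(int sndbuf_bytes, int percent)
{
	assert(sndbuf_bytes >= 0);
	assert(percent >= 0 && percent <= 100);
	/* Widen before multiplying so large buffers cannot overflow int. */
	return ((int)((long long)sndbuf_bytes * percent / 100));
}

/*
 * Hypothetical helper: apply the watermark to socket s.
 * Returns 0 on success, -1 on error with errno set by setsockopt(2).
 * Not every system allows changing SO_SNDLOWAT, so callers should
 * check the return value.
 */
static int
set_send_lowat(int s, int lowat_bytes)
{
	return (setsockopt(s, SOL_SOCKET, SO_SNDLOWAT,
	    &lowat_bytes, sizeof(lowat_bytes)));
}
```

With the 393216 byte send buffer from the netperf runs, lowat_for_percent(393216, 50) gives 196608 bytes, which matches the 50% default described above. A sender that wants a different refill point could call set_send_lowat(s, lowat_for_percent(393216, 25)) before its sendfile loop.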