Date:      Sat, 23 Sep 2006 12:12:55 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        alc@freebsd.org, freebsd-net@freebsd.org, freebsd-current@freebsd.org, tegge@freebsd.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject:   Re: Much improved sendfile(2) kernel implementation
Message-ID:  <451508A7.8020209@freebsd.org>
In-Reply-To: <20060922234708.V11343@fledge.watson.org>
References:  <4511B9B1.2000903@freebsd.org> <17683.63162.919620.114649@grasshopper.cs.duke.edu> <45145F1D.8020005@freebsd.org> <20060922234708.V11343@fledge.watson.org>

Robert Watson wrote:
> 
> On Sat, 23 Sep 2006, Andre Oppermann wrote:
> 
>>> Without patch:
>>>  87380 393216 393216    10.00      2163.08   100.00   19.35    3.787   1.466
>>> Without patch + TSO:
>>>  87380 393216 393216    10.00      4367.18   71.54    42.07    1.342   1.578
>>> With patch:
>>>  87380 393216 393216    10.01      1882.73   86.15    18.43    3.749   1.604
>>> With patch + TSO:
>>>  87380 393216 393216    10.00      6961.08   47.69    60.11    0.561   1.415
> 
> The impact of TSO is clearly dramatic, especially when combined with the 
> patch, but I'm a bit concerned by the drop in performance in the patched 
> non-TSO case.  For network cards which will always have TSO enabled, 
> this isn't an issue, but do we see a similar effect for drivers without 
> TSO?  What can we put this drop down to?

If you look at my GigE numbers there is no drop for the new-sendfile w/o
TSO case.  In this 10Gig test the drop is really an artifact of the test
setup and the way netperf makes use of the sendfile call.  Internally
new-sendfile waits until 50% of the socket buffer is free before it bulk
fills it again.  This threshold can be changed by setting a low watermark
on the send socket buffer.  Netperf issues buffer-sized sendfile calls,
and at 10G that is very timing critical.  It gives this picture:
call sendfile(380K) -> fill socket buffer -> wait -> fill rest -> return ->
call sendfile(380K) ...
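
For reference, raising the watermark from userland would look something
like this (untested sketch; the function name, the 256k value and the
missing error reporting are just for illustration):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Sketch only: raise SO_SNDLOWAT so the in-kernel bulk refill waits
 * for more free space in the send buffer before topping it up again.
 * 'sock' is a connected TCP socket, 'fd' an open file.
 */
static int
send_whole_file(int sock, int fd)
{
	int lowat = 256 * 1024;		/* refill once 256k is free */
	off_t sbytes = 0;

	if (setsockopt(sock, SOL_SOCKET, SO_SNDLOWAT,
	    &lowat, sizeof(lowat)) == -1)
		return (-1);

	/* offset 0, nbytes 0 = send until EOF, no header/trailer */
	return (sendfile(fd, sock, 0, 0, NULL, &sbytes, 0));
}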
On top of that there is all the additional work tcp_output() has to do
w/o TSO.  Especially with large buffers it has to walk the mbuf chain
for each packet to find out where to start copying.  And besides, there
is no point in having a non-TSO capable interface above 1-2Gbit.  Not
even Linux can keep up there.
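
The walk is roughly this (sketch only, not the actual tcp_output()
code; the struct here is a stripped-down stand-in for the real struct
mbuf from <sys/mbuf.h>):

struct mbuf {
	struct mbuf *m_next;
	int	     m_len;
};

/*
 * Per-packet walk: skip 'off' bytes into the socket buffer's mbuf
 * chain to find where the copy for this segment starts.  Without
 * TSO this runs once per ~1448-byte segment, so a 380K buffer means
 * walking deep into the chain over and over again.
 */
static struct mbuf *
find_copy_start(struct mbuf *m, long off)
{
	while (m != NULL && off >= m->m_len) {
		off -= m->m_len;
		m = m->m_next;
	}
	return (m);	/* copy starts 'off' bytes into this mbuf */
}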

-- 
Andre



