From: Andre Oppermann
Date: Fri, 29 Sep 2006 00:10:24 +0200
To: freebsd-current@freebsd.org
Cc: freebsd-net@freebsd.org, gallatin@cs.duke.edu
Subject: Much improved sosend_*() functions
Message-ID: <451C4850.5030302@freebsd.org>

The recent addition of TSO (TCP Segmentation Offload) has highlighted some shortcomings in our sosend_*() kernel implementation. The current code uses a sosend_copyin() function that loops over the supplied struct uio and does interleaved mbuf allocations and uiomove() calls.

I have rewritten m_getm() to be simpler and to allocate PAGE_SIZE-sized jumbo mbuf clusters (4k on most architectures). I have also rewritten m_uiotombuf() to use the new m_getm() to obtain all mbuf space in one go.
m_uiotombuf() then loops over the allocated mbuf chain and copies the data into the mbufs using uiomove(). sosend_dgram() and sosend_generic() are changed to use m_uiotombuf() instead of sosend_copyin().

Looking at the benchmarks, we see some very nice improvements (95% confidence):

  66% less CPU (about 2.9 times better) with new sosend vs. old sosend (non-TSO)
  65% less CPU (about 2.8 times better) with new sosend vs. old sosend (TSO)

The sender is an AMD Opteron 852 (2.6GHz) with an em(4) PCI-X-133 interface, and the receiver is a Dell PowerEdge SC1425 (P-IV Xeon 3.2GHz) with an em(4) LOM. They are connected back to back at 1000Base-TX full duplex.

The patch is available here:

  http://people.freebsd.org/~andre/sosend+m_uiotombuf-20060928.diff

Any testing, heavy (code) beating and reviews are welcome.

--
Andre


Here are the raw numbers. netperf 2.4.2 was used, at 95% confidence with a +-2.5% error margin. The CPU load reported by netperf differs from the one reported by time(1); all performance figures quoted above are based on the time(1) output.

  a) old sosend kernel implementation
  b) new sosend kernel implementation

  1) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s32K -S32K     [non-TSO]
  2) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s32K -S32K     [TSO]
  3) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s64K -S64K     [non-TSO]
  4) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s64K -S64K     [TSO]
  5) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s128K -S128K   [non-TSO]
  6) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s128K -S128K   [TSO]

Each netperf result line is followed by the time(1) output for that run.

     Recv    Send    Send                          Utilization     Service Demand
     Socket  Socket  Message  Elapsed              Send    Recv    Send    Recv
     Size    Size    Size     Time     Throughput  local   remote  local   remote
     bytes   bytes   bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB

1a)  32768   32768   32768    10.00    921.28      28.42   32.48   2.527   2.888
     time: 0.007u 1.747s 0:10.00 17.4% 99+5252k 0+0io 0pf+0w
1b)  32768   32768   32768    10.00    921.39      24.51   31.50   2.179   2.801
     time: 0.028u 0.768s 0:10.00 7.8% 78+4210k 0+0io 0pf+0w

2a)  32768   32768   32768    10.00    897.63      24.29   37.74   2.216   3.445
     time: 0.000u 1.359s 0:10.02 13.4% 96+5152k 5+0io 3pf+0w
2b)  32768   32768   32768    10.00    919.71      15.64   33.01   1.393   2.940
     time: 0.008u 0.528s 0:10.00 5.2% 90+4830k 0+0io 0pf+0w

3a)  65536   65536   65536    10.00    941.60      30.90   32.01   2.689   2.785
     time: 0.000u 1.827s 0:10.00 18.2% 96+5180k 0+0io 0pf+0w
3b)  65536   65536   65536    10.00    941.59      26.39   32.03   2.296   2.787
     time: 0.014u 0.617s 0:10.00 6.2% 101+5362k 0+0io 0pf+0w

4a)  65536   65536   65536    10.00    921.98      26.09   39.47   2.318   3.507
     time: 0.000u 1.467s 0:10.02 14.5% 93+5028k 3+0io 0pf+0w
4b)  65536   65536   65536    10.00    938.44      16.24   34.29   1.418   2.993
     time: 0.000u 0.511s 0:10.00 5.1% 91+4851k 0+0io 0pf+0w

5a)  131072  131072  131072   10.00    941.62      33.81   33.68   2.941   2.930
     time: 0.000u 2.158s 0:10.00 21.5% 97+5247k 0+0io 0pf+0w
5b)  131072  131072  131072   10.00    941.60      28.55   31.65   2.484   2.754
     time: 0.000u 0.676s 0:10.00 6.7% 95+5132k 0+0io 0pf+0w

6a)  131072  131072  131072   10.00    922.92      28.72   40.80   2.549   3.621
     time: 0.000u 1.713s 0:10.00 17.1% 93+5016k 1+0io 0pf+0w
6b)  131072  131072  131072   10.00    939.14      18.20   34.44   1.587   3.004
     time: 0.000u 0.587s 0:10.00 5.8% 78+4197k 1+0io 0pf+0w