Date: Mon, 21 Apr 2003 19:35:32 -0700
From: "Jin Guojun [NCS]" <j_guojun@lbl.gov>
To: bj@dc.luth.se
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: patch for test (Was: tcp_output starving -- is due to mbuf get delay?)

It is hard to compare your netstat outputs because the NetBSD output is so short. It is too short to tell whether the NetBSD TCP output has saturated (reached its maximum packets/sec?); at second 3 it reached 18.6 Kpkt/s, or 28 KB/s, which would mean MTU = 28 KB / 18.5 = 1500, right? FreeBSD seems to have done the better job; at second 2 it had already reached 72 KB/s. The pkt/s is low because you had jumbo frames.

net.inet.tcp.liondmask=7 has raised your TCP window from 1,314,022 to 2,204,667, which is a fully opened cwnd for a 22 ms, 1 Gb/s path (close to the ~2.75 MB bandwidth-delay product of that path). There is nothing better to be done there. The only thing left chewing your CPU is the memory copy. My web page shows that we have removed all of the mbuf chain overhead, but there is still a second memory copy, and that can also be reduced. However, that is no longer just a patch; it requires modifying the mbuf operations. I am BCCing this to core@freebsd.org, but I am not sure it will get through.

To remove the second memory copy, the mbuf structure needs another flag -- EOP (end of packet). With it, xxx_usr_send() can simply copy each t_maxseg of user data into an mbuf chain, set the EOP bit in the mbuf flags, and chain that mbuf into sb_mb. In tcp_output(), where I modified the mbuf chain handling around m_copydata and m_copy, we can then get rid of those two copy routines and simply hand the mbuf to the interface queue. Since we only pass the handle, when the NIC driver hands the mbuf to m_free(), m_free() does nothing for these mbufs because EOP is set. That reduces the mbuf work on both the enqueue and the m_free() side, and leaves only one memory copy. For a system with a 64-bit PCI chipset, that will not be a bottleneck at all.

Of course, we can reduce this last copy as well (to make zero-copy TCP): lock (wire) down the user buffer and attach the user space to the mbuf cluster as E_USR_EXT. This may make complete sense, since future computers will have large memories (at least 1 GB) and applications rarely write a buffer larger than 1 MB at once (typically 64 KB up to 640 KB).
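Roughly, the send-side half could look like the sketch below. This is only a sketch, not the patch itself: the M_EOP flag (name and value) and the append_one_segment() helper are made up here for illustration, and a real implementation has to pick a free m_flags bit.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/uio.h>

#define	M_EOP	0x4000		/* hypothetical: mbuf ends one TCP segment */

/*
 * Send path (the xxx_usr_send() idea): copy one t_maxseg worth of user
 * data into a cluster mbuf, mark it with M_EOP, and append it to the
 * socket send buffer.  tcp_output() can then hand the marked mbuf
 * straight to the interface queue instead of running m_copydata() or
 * m_copy() on it, and m_free() can skip it when the driver is done.
 */
static int
append_one_segment(struct socket *so, struct uio *uio, int seglen)
{
	struct mbuf *m;
	int error;

	MGETHDR(m, M_WAIT, MT_DATA);
	if (m == NULL)
		return (ENOBUFS);
	MCLGET(m, M_WAIT);
	if ((m->m_flags & M_EXT) == 0) {
		m_freem(m);
		return (ENOBUFS);
	}
	error = uiomove(mtod(m, caddr_t), seglen, uio);	/* the one copy left */
	if (error != 0) {
		m_freem(m);
		return (error);
	}
	m->m_len = seglen;
	m->m_flags |= M_EOP;		/* segment boundary for tcp_output() */
	sbappend(&so->so_snd, m);
	return (0);
}

For the zero-copy variant, the uiomove() copy above would go away too: wire the user pages and attach them to the mbuf as external (E_USR_EXT) storage instead of copying into a cluster.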
So locking down 0.1% of total system memory is not a bad thing. If you want even better, the new TCP (Lion) stack is going for that goal, but it will not be available until it has stabilized. As Terry mentioned, for now you may want to try the NetBSD TCP stack first, since you have seen that NetBSD does a better job, and provide some feedback.

-Jin

Borje Josefsson wrote:

> On Sun, 20 Apr 2003 13:12:42 PDT "Jin Guojun [NCS]" wrote:
>
> > Now the patch is ready. It has been tested on both 4.7 and 4.8.
> > For 4.7, one has to manually add an empty line before the comment prior to the
> > tcp_output() routine.
> comment for beginning the tcp_output() in 4.7-RELEASE :-(
> >
> > Some more hints for tracing (net.inet.tcp.liondmask is a bitmap):
> > bit 0-1 (value 1, 2, or 3) enables the tcp_output() mbuf chain modification
> > bit 2 (value 4) enables the sbappend() mbuf chain modification
> > bit 3 (value 8) is for tcp_input (DO NOT TRY IT, it is not ready)
> >
> > bit 9 (value 512) enables the check routine (dumps errors to /var/log/messages)
> >
> > If you do have a problem, set net.inet.tcp.liondmask to 512 and see what the messages say.
> > If you would like to know which part is causing a problem or not working properly,
> > set net.inet.tcp.liondmask to 1, 2, 3, or 4 to test the individual module.
>
> Thanks!!
>
> This patch definitely works, and gives much higher PPS (32000 instead of
> 19000). This is on a low-end system (PIII 900 MHz with a 33 MHz bus); I'll
> test one of my larger systems later today.
>
> One question though - is there any way of having the code be more
> "aggressive"? As you see in the netstat output below, it takes ~35
> seconds(!) before reaching full speed. On NetBSD I reach maxPPS almost
> immediately. Even if we now (with your patch) can utilize the hardware
> much more, it only helps if you have connections that last for a very
> long time, so that the "ramping" time is not significant.
>
> *Note* (the very last output below) that this seems to be highly dependent
> on RTT. On a 2 ms connection (~50 miles) I reach maxPPS almost
> immediately. (I can't explain why I go to 51 kpps and then fall back to
> 35 kpps; this is repeatable.)
>
> Apart from vanilla 4.8R I have set:
>
> kern.ipc.maxsockbuf=8388608
> net.inet.tcp.sendspace=3217968
> net.inet.tcp.recvspace=3217968
> kern.ipc.nmbclusters=8192
>
> And this test is done on a connection with an RTT on the order of 22 ms.
>
> --Börje
>
> =========== "netstat 1" **on NetBSD** (for comparison) =====
>
> bge0 in bge0 out total in total out
> packets errs packets errs colls packets errs packets
> 1 0 1 0 0 1 0 1
> 7118 0 11315 0 0 7118 0 11315
> 18604 0 28014 0 0 18604 0 28014
> 18610 0 28005 0 0 18611 0 28005
>
> (NOTE that this example is using a larger MTU, and not on the same hardware
> as below, but the behaviour of reaching maxPPS "immediately" is the same)
>
> =========== "netstat 1" with liondmask=7 ================
>
> input (Total) output
> packets errs bytes packets errs bytes colls
> 6 0 540 3 0 228 0
> 37 0 2712 56 0 72216 0
> 646 0 42636 823 0 1244686 0
> 1548 0 102168 1966 0 2975188 0
> 2432 0 160512 3039 0 4604252 0
> 3301 0 217866 4193 0 6345352 0
> 4174 0 275484 5254 0 7950192 0
> 5011 0 330726 6373 0 9650414 0
> 5836 0 385176 7448 0 11271908 0
> 6675 0 440550 8519 0 12896430 0
> 7528 0 496848 9596 0 14527008 0
> 8408 0 554928 10626 0 16089456 0
> 9212 0 607992 11652 0 17636764 0
> 9962 0 657492 12698 0 19223436 0
> 10699 0 706134 13694 0 20731380 0
> 11368 0 750288 14648 0 22175736 0
> 12144 0 801504 15697 0 23768464 0
> 12802 0 844932 16693 0 25267324 0
> 13412 0 885192 17552 0 26576934 0
> 14001 0 924066 18495 0 28001608 0
> 14444 0 953304 19415 0 29384230 0
> 15041 0 992706 20275 0 30701070 0
> 15681 0 1034946 21327 0 32283200 0
> 16224 0 1070784 22202 0 33610978 0
> 16621 0 1096986 22888 0 34651096 0
> 17050 0 1125300 23568 0 35682130 0
> 17721 0 1169586 24573 0 37200672 0
> 18256 0 1204896 25361 0 38401274 0
> 18782 0 1239612 26128 0 39550400 0
> 19359 0 1277694 26972 0 40834272 0
> 20150 0 1329900 28015 0 42413374 0
> 20900 0 1379400 28962 0 43854702 0
> 21523 0 1420518 30024 0 45447430 0
> 22256 0 1468896 30891 0 46767638 0
> 22882 0 1510212 31655 0 47924334 0
> 23087 0 1523742 31865 0 48243788 0
> 23225 0 1532850 32038 0 48502682 0
>
> It seems that I reach the limit about here - 35-36 sec after start
>
> 23170 0 1529220 32121 0 48629858 0
> 23223 0 1532718 32036 0 48501168 0
> 23200 0 1531200 32121 0 48629858 0
> 23103 0 1524792 32122 0 48631372 0
> 23104 0 1524864 32080 0 48565096 0
> 23214 0 1532124 32079 0 48566270 0
> 23147 0 1527696 32036 0 48501168 0
> 10318 0 680988 13543 0 20495142 0
> 1 0 66 1 0 178 0
> 1 0 66 1 0 178 0
>
> =========== "netstat 1" with liondmask=0 ================
>
> With plain 4.8 (liondmask=0) I get:
>
> root@stinky 8# netstat 1
> input (Total) output
> packets errs bytes packets errs bytes colls
> 7 0 732 10 0 2394 0
> 437 0 28842 556 0 840448 0
> 1343 0 88638 1669 0 2531586 0
> 2201 0 145266 2757 0 4166706 0
> 3082 0 203406 3857 0 5841190 0
> 4021 0 265386 4959 0 7503562 0
> 4877 0 321882 6017 0 9111430 0
> 5621 0 370986 7064 0 10690532 0
> 6471 0 427086 8136 0 12319596 0
> 7216 0 476256 9177 0 13889614 0
> 8006 0 528396 10181 0 15415726 0
> 8725 0 575850 11215 0 16975146 0
> 9482 0 625812 12259 0 18561818 0
> 10205 0 673530 13258 0 20071276 0
> 10846 0 715836 14115 0 21365746 0
> 11563 0 763158 15223 0 23046286 0
> 12399 0 818334 16266 0 24628416 0
> 13024 0 859584 17119 0 25913802 0
> 13609 0 898194 17949 0 27173450 0
> 14316 0 944856 18798 0 28458836 0
> 14391 0 949806 18842 0 28522764 0
> 14463 0 954558 19010 0 28779804 0
>
> Here I reach the limit after 20 seconds.
>
> 14500 0 957000 19095 0 28908494 0
> 14534 0 959244 19053 0 28844906 0
> 14599 0 963534 19052 0 28843392 0
> 14526 0 958716 19053 0 28844906 0
> 14484 0 955944 18967 0 28714702 0
> 14330 0 945780 18968 0 28716216 0
> 14581 0 962346 19137 0 28972082 0
> 14531 0 959046 19180 0 29037184 0
> 14465 0 954690 19095 0 28908494 0
> 14514 0 957924 19095 0 28908494 0
> 14403 0 950598 19095 0 28908494 0
> 14493 0 956538 19052 0 28843392 0
> 14544 0 959904 19095 0 28908494 0
> 14546 0 960036 19095 0 28908494 0
> 14558 0 960828 19095 0 28908494 0
> 14559 0 960894 19053 0 28844906 0
> 14597 0 963402 19094 0 28906980 0
> 14509 0 957594 19053 0 28844906 0
> 14527 0 958782 19137 0 28972082 0
> 14576 0 962016 19139 0 28973936 0
> 14575 0 961950 19096 0 28908494 0
> 14578 0 962148 19052 0 28843392 0
> 14519 0 958254 18968 0 28716216 0
> 14579 0 962214 19052 0 28843392 0
> 14533 0 959178 19095 0 28908494 0
> 14588 0 962808 19137 0 28972082 0
> 14503 0 957198 19053 0 28844906 0
> 14580 0 962280 19095 0 28908494 0
> 14479 0 955614 18968 0 28716216 0
> 14477 0 955482 19052 0 28843392 0
> 14618 0 964788 19137 0 28972082 0
> 14569 0 961554 19053 0 28844906 0
> 14586 0 962676 19095 0 28908494 0
> 4462 0 294492 5438 0 8224172 0
>
> ============ "netstat 1" with liondmask on a 2ms RTT connection ====
>
> root@stinky 17# netstat 1
> input (Total) output
> packets errs bytes packets errs bytes colls
> 2 0 132 2 0 0 0
> 3908 0 258086 7004 0 10856439 0
> 29353 0 1937298 51940 0 78631282 0
> 29317 0 1934922 51911 0 78629768 0
> 29344 0 1936704 51894 0 78502592 0
> 29340 0 1936440 51841 0 78501078 0
> 29298 0 1933668 51860 0 78567694 0
> 29376 0 1938816 51947 0 78629768 0
> 29344 0 1936704 51928 0 78566180 0
> 20988 0 1385208 37580 0 56660114 0
> 19687 0 1299336 35473 0 53704786 0
> 19705 0 1300530 35431 0 53641198 0
> 19705 0 1300530 35431 0 53641198 0
> 19670 0 1298220 35346 0 53512508 0
> 19680 0 1298880 35388 0 53576096 0