Date: Thu, 27 Feb 2014 11:32:47 +0100
From: Julien Charbon <jcharbon@verisign.com>
To: freebsd-net@freebsd.org
Subject: Re: TCP stack lock contention with short-lived connections
Message-ID: <len481$sfv$2@ger.gmane.org>
In-Reply-To: <op.w56mamc0ak5tgc@dul1rjacobso-l3.vcorp.ad.vrsn.com>
References: <op.w51mxed6ak5tgc@fri2jcharbon-m1.local> <op.w56mamc0ak5tgc@dul1rjacobso-l3.vcorp.ad.vrsn.com>
[-- Attachment #1 --]

 Hi,

On 07/11/13 14:55, Julien Charbon wrote:
> On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon
> <jcharbon@verisign.com> wrote:
>> just a follow-up on vBSDCon discussions about FreeBSD TCP
>> performance with short-lived connections. In summary: <snip>
>>
>> I have put technical and how-to-repeat details in the PR below:
>>
>> kern/183659: TCP stack lock contention with short-lived connections
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>>
>> We are currently working on this performance improvement effort; it
>> will impact only the TCP locking strategy, not the TCP stack logic
>> itself. We will share the patches we make on freebsd-net for review
>> and improvement proposals; in any case this change will also require
>> enough eyeballs to avoid introducing tricky race conditions in the
>> TCP stack.

Just a follow-up on this TCP performance improvement task:

1. Our first related patch has been applied in HEAD:

   Decrease lock contention within the TCP accept case by removing
   the INP_INFO lock from tcp_usr_accept.
   http://svnweb.freebsd.org/base?view=revision&revision=261242

   Thanks to the reviewers.

2. We studied another INP_INFO lock contention, which appears when TCP
connections in TIME_WAIT state are cleaned up.

The context: this lock contention was found while investigating why our
Intel 10G NIC was dropping packets in reception even with plenty of free
bandwidth. To study the issue we computed the distribution of TCP
connection durations (i.e. the time between the first SYN and the last
ACK of a given TCP session) at the maximum rate of TCP connections per
second _without_ a single packet drop, using FreeBSD 10.0-RELEASE (see
the attached conntime-bsd10.0-release.pdf).

In this graph, X is the time in seconds at which the SYN was received,
and Y is the TCP session duration in milliseconds (i.e. the difference
between the time the first SYN was received and the time the last ACK
was sent). As you can see, every 500ms the TCP connection time rises in
spikes. This periodicity led us to tcp_slowtimo(), which calls
tcp_tw_2msl_scan() with the INP_INFO lock held.

Our theory is: every 500ms there is a competition for the INP_INFO lock
between tcp_slowtimo() and tcp_input(), and tcp_input() is called
directly from the NIC RX interrupt handler. Thus, for the whole duration
of the tcp_tw_2msl_scan() call, packets are no longer dequeued from the
NIC RX rings, and once all RX rings are full the NIC starts dropping
packets in reception.

The calculation of the time needed to fill up all available RX ring
descriptors supports this theory:

- 40k TCP connections per second (with 5 packets received per TCP
  connection) = 200k packets per second
- We use 4 RX queues of 2048 descriptors each = 8192 RX descriptors
  overall

Time to fill up all available NIC descriptors at 200k packets per
second: 8192/200000 = 40.96 milliseconds

This is consistent with what we see on the conntime-bsd10.0-release.pdf
graph.

To confirm this theory, we introduced a new lock (see the attached
tw-lock.patch) to protect the TIME_WAIT global list instead of using
INP_INFO; the TIME_WAIT states are now cleaned up one by one, in order
to prioritize the NIC interrupt handler over tcp_tw_2msl_scan().
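To make the idea concrete, below is a minimal userspace sketch of that
cleanup strategy (an illustration only, not the actual tw-lock.patch;
identifiers such as tw_entry, tw_list_lock and inp_info_lock are
invented for the example): the TIME_WAIT list has its own dedicated
lock and entries are drained one by one, so the global INP_INFO-like
lock is held only for a single entry at a time instead of for the whole
scan.

/*
 * Minimal userspace sketch (NOT the actual tw-lock.patch): a dedicated
 * lock protects the TIME_WAIT list, and entries are drained one by one
 * so the global "INP_INFO"-like lock is held only briefly per entry
 * instead of for a whole scan.  All identifiers are invented.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

struct tw_entry {
	TAILQ_ENTRY(tw_entry) link;
	int expire;				/* expiry "tick" */
};

static TAILQ_HEAD(, tw_entry) tw_list = TAILQ_HEAD_INITIALIZER(tw_list);
static pthread_mutex_t tw_list_lock = PTHREAD_MUTEX_INITIALIZER;  /* new, dedicated */
static pthread_mutex_t inp_info_lock = PTHREAD_MUTEX_INITIALIZER; /* global, contended */

/*
 * Scanner: the old behavior would hold inp_info_lock around the whole
 * loop; here it is taken per entry only.
 */
static void
tw_2msl_scan(int now)
{
	struct tw_entry *tw;

	for (;;) {
		pthread_mutex_lock(&tw_list_lock);
		tw = TAILQ_FIRST(&tw_list);
		if (tw == NULL || tw->expire > now) {
			pthread_mutex_unlock(&tw_list_lock);
			return;
		}
		TAILQ_REMOVE(&tw_list, tw, link);
		pthread_mutex_unlock(&tw_list_lock);

		/*
		 * Global lock held for one entry only; a trylock variant
		 * could also back off when the input path owns it.
		 */
		pthread_mutex_lock(&inp_info_lock);
		free(tw);			/* stands in for tcp_twclose() */
		pthread_mutex_unlock(&inp_info_lock);
	}
}

int
main(void)
{
	struct tw_entry *tw;
	int i;

	/* Queue a few fake TIME_WAIT entries, all already expired. */
	for (i = 0; i < 4; i++) {
		tw = calloc(1, sizeof(*tw));
		TAILQ_INSERT_TAIL(&tw_list, tw, link);
	}
	tw_2msl_scan(1);
	printf("TIME_WAIT list empty: %d\n", TAILQ_EMPTY(&tw_list));
	return (0);
}

In the real stack the per-entry work would of course be done by
tcp_twclose() under the proper pcb locking; the sketch only shows the
locking pattern.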
See the attached conntime-bsd10.0-patched.pdf: no more spikes, and the
maximum TCP connection rate without dropping a single packet becomes:

- FreeBSD 10.0: 40k
- FreeBSD 10.0 + patch: 53k

Obviously, there are various ways to mitigate this lock contention:

- Introduce a new time-wait lock, as proposed in the attached patch
- Call tcp_tw_2msl_scan() more often under high workload
- Use INP_INFO_TRY_WLOCK() in tcp_tw_2msl_scan() to clean up time-wait
  objects only when nobody else holds the INP_INFO lock
- Etc.

The strategy is to prioritize packet reception over time-wait object
cleanup, because:

- we hate dropping packets in reception when the bandwidth is far from
  full
- the maximum number of time-wait objects in use is configurable
  (net.inet.tcp.maxtcptw)
- in case of time-wait object memory exhaustion, the current behavior
  is already optimal: the oldest time-wait object is recycled and
  directly reused.

We picked the time-wait lock approach because it fits well with our
long-term strategy of completely mitigating INP_INFO lock contention
everywhere in the TCP stack.

Any thoughts on this particular behavior?

--
Julien

[-- Attachment #2: PDF graph, data omitted --]
