Date:      Thu, 27 Feb 2014 11:32:47 +0100
From:      Julien Charbon <jcharbon@verisign.com>
To:        freebsd-net@freebsd.org
Subject:   Re: TCP stack lock contention with short-lived connections
Message-ID:  <len481$sfv$2@ger.gmane.org>
In-Reply-To: <op.w56mamc0ak5tgc@dul1rjacobso-l3.vcorp.ad.vrsn.com>
References:  <op.w51mxed6ak5tgc@fri2jcharbon-m1.local> <op.w56mamc0ak5tgc@dul1rjacobso-l3.vcorp.ad.vrsn.com>


[-- Attachment #1 --]

  Hi,

On 07/11/13 14:55, Julien Charbon wrote:
> On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon
> <jcharbon@verisign.com> wrote:
>>   just a follow-up to the vBSDCon discussions about FreeBSD TCP
>> performance with short-lived connections.  In summary: <snip>
>>
>> I have put technical and how-to-repeat details in below PR:
>>
>> kern/183659: TCP stack lock contention with short-lived connections
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>>
>>   We are currently working on this performance improvement effort;  it
>> will impact only the TCP locking strategy, not the TCP stack logic
>> itself.  We will share on freebsd-net the patches we make, for review
>> and suggestions for improvement;  in any case, this change may also
>> require enough eyeballs to avoid introducing tricky race conditions
>> in the TCP stack.

  Just a follow-up on this TCP performance improvement task:

  1. Our first related patch has been committed to HEAD:

Decrease lock contention within the TCP accept case by removing
the INP_INFO lock from tcp_usr_accept.
http://svnweb.freebsd.org/base?view=revision&revision=261242

  Thanks to the reviewers.

  2. We studied another INP_INFO lock contention, which occurs when TCP
connections in the TIME_WAIT state are cleaned up.  The context:

  We found this lock contention while investigating why our Intel 10G NIC
was dropping received packets even with plenty of free bandwidth.  To
study this issue, we computed the distribution of TCP connection
durations (i.e. the time between the first SYN and the last ACK of a
given TCP session) at the maximum rate of TCP connections per second
sustainable _without_ a single packet drop, using FreeBSD 10.0-RELEASE
(see attached conntime-bsd10.0-release.pdf).

  In this graph, the X axis is the time (in seconds) at which the SYN was
received, and the Y axis is the TCP session duration in milliseconds
(i.e. the difference between the time the first SYN was received and
the time the last ACK was sent).

  As you can see, from some point on, the TCP connection time spikes
every 500ms.  This periodicity led us to tcp_slowtimo(), which calls
tcp_tw_2msl_scan() with the INP_INFO lock held.

  Our theory: every 500ms, tcp_slowtimo() and tcp_input() compete for
the INP_INFO lock, and tcp_input() is called directly from the NIC RX
interrupt handler.  So for the whole duration of the tcp_tw_2msl_scan()
call, packets are no longer dequeued from the NIC RX rings, and once all
RX rings are full, the NIC starts dropping received packets.

  The time needed to fill up all available RX ring descriptors is
consistent with this theory:

  - 40k TCP connections per second (with 5 packets received per TCP 
connection) = 200k packets per second
  - We use 4 RX queues of 2048 descriptors each = 8192 RX descriptors 
overall

  Time to fill up all available NIC descriptors at 200k packets per
second: 8192 / 200000 = 0.04096 s = 40.96 milliseconds

  This is consistent with what we see in the
conntime-bsd10.0-release.pdf graph.
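  The estimate above can be written down as a pair of small C helpers,
e.g. to try other connection rates or ring sizes.  This is only a sketch
under simplifying assumptions: the helper names are hypothetical, the
packet arrival rate is taken as constant, and no descriptor is assumed
to be recycled while the RX path is stalled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for the back-of-the-envelope estimate: how long
 * the RX rings take to fill while nothing dequeues them, and how many
 * packets are dropped if that stall lasts hold_us microseconds.
 * Assumes a constant arrival rate and no descriptor recycling.
 */

/* Total RX descriptors over all queues, e.g. 4 * 2048 = 8192. */
static uint64_t
rx_descriptors(uint64_t nqueues, uint64_t descs_per_queue)
{
	return (nqueues * descs_per_queue);
}

/* Milliseconds until every descriptor is used at the given rate. */
static double
rx_fill_time_ms(uint64_t conn_per_sec, uint64_t pkts_per_conn,
    uint64_t nqueues, uint64_t descs_per_queue)
{
	uint64_t pps = conn_per_sec * pkts_per_conn;

	assert(pps > 0);
	return (1000.0 * (double)rx_descriptors(nqueues, descs_per_queue) /
	    (double)pps);
}

/* Packets dropped if the RX path is stalled for hold_us microseconds. */
static uint64_t
rx_drops(uint64_t conn_per_sec, uint64_t pkts_per_conn,
    uint64_t nqueues, uint64_t descs_per_queue, uint64_t hold_us)
{
	uint64_t arrived = conn_per_sec * pkts_per_conn * hold_us / 1000000;
	uint64_t room = rx_descriptors(nqueues, descs_per_queue);

	return (arrived > room ? arrived - room : 0);
}
```

  With the figures above, rx_fill_time_ms(40000, 5, 4, 2048) gives about
40.96 ms, and under this model a 50 ms stall, rx_drops(40000, 5, 4,
2048, 50000), would drop 1808 packets.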

  To confirm this theory, we introduced a new lock (see attached patch
tw-lock.patch) to protect the global TIME_WAIT list instead of using
INP_INFO; TIME_WAIT states are now cleaned up one at a time, so that the
NIC interrupt handler takes priority over tcp_tw_2msl_scan().  See
attached conntime-bsd10.0-patched.pdf: there are no more spikes, and the
maximum TCP connection rate without a single packet drop becomes:

  - FreeBSD 10.0:  40k connections/s
  - FreeBSD 10.0 + patch:  53k connections/s
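  The core idea, a dedicated lock for the TIME_WAIT list and expiring
entries one at a time so the lock is dropped between entries, can be
illustrated with a userspace analogue.  This is a sketch, not the kernel
code: struct tw_entry, struct tw_queue and tw_scan() are hypothetical
names, a pthread mutex stands in for the tw rwlock, and the
per-connection teardown (which in the kernel needs INP_INFO and the
inpcb lock) is reduced to free().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical time-wait entry; 'expire' mirrors tw_time (in ticks). */
struct tw_entry {
	struct tw_entry	*next;
	int32_t		 expire;
};

/* FIFO of time-wait entries guarded by its own lock, not a global one. */
struct tw_queue {
	pthread_mutex_t	 lock;		/* protects head and tail */
	struct tw_entry	*head;		/* oldest (first to expire) */
	struct tw_entry	*tail;
};

static void
tw_queue_init(struct tw_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/*
 * Append an entry.  With a fixed 2*MSL lifetime, appending in arrival
 * order keeps the queue sorted by expiry time.
 */
static int
tw_insert(struct tw_queue *q, int32_t expire)
{
	struct tw_entry *tw = malloc(sizeof(*tw));

	if (tw == NULL)
		return (-1);
	tw->next = NULL;
	tw->expire = expire;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = tw;
	else
		q->head = tw;
	q->tail = tw;
	pthread_mutex_unlock(&q->lock);
	return (0);
}

/* Wrap-safe "not yet expired" test, like (tw->tw_time - ticks) > 0. */
static int
tw_pending(int32_t expire, int32_t now)
{
	return ((int32_t)((uint32_t)expire - (uint32_t)now) > 0);
}

/*
 * Expire due entries one at a time.  The lock is held only long enough
 * to unlink one entry, so a concurrent user of the queue waits for at
 * most one step instead of the whole scan.  Returns the number closed.
 */
static int
tw_scan(struct tw_queue *q, int32_t now)
{
	struct tw_entry *tw;
	int closed = 0;

	for (;;) {
		pthread_mutex_lock(&q->lock);
		tw = q->head;
		if (tw == NULL || tw_pending(tw->expire, now)) {
			pthread_mutex_unlock(&q->lock);
			break;
		}
		q->head = tw->next;
		if (q->head == NULL)
			q->tail = NULL;
		pthread_mutex_unlock(&q->lock);

		free(tw);	/* stand-in for the real teardown */
		closed++;
	}
	return (closed);
}
```

  Unlike this sketch, the patch leaves the entry on the list while it
waits for INP_INFO, which is why it pins the entry with tw_pcbref()
before dropping the tw lock.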

  There are various ways to mitigate this lock contention:

  - Introduce a new time-wait lock, as proposed in the attached patch
  - Call tcp_tw_2msl_scan() more often under high load
  - Use INP_INFO_TRY_WLOCK() in tcp_tw_2msl_scan() to clean up time-wait
objects only when nobody else holds the INP_INFO lock
  - Etc.

  The goal is to prioritize packet reception over time-wait object
cleanup, because:

  - we hate dropping received packets when the bandwidth is far from
being fully used
  - the maximum number of time-wait objects is configurable
(net.inet.tcp.maxtcptw)
  - when time-wait object memory is exhausted, the current behavior is
already optimal:  the oldest time-wait object is recycled and reused
directly.
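  The recycling behavior in the last point (tcp_twstart() falling back
to tcp_tw_2msl_scan(1), renamed tcp_tw_2msl_reuse() in the patch, when
uma_zalloc() fails) amounts to a bounded pool with oldest-first
eviction.  A minimal sketch; struct tw_pool, tw_alloc() and the fixed
capacity are hypothetical stand-ins for the UMA zone capped by
net.inet.tcp.maxtcptw, and normal expiry is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TW_POOL_MAX	4	/* hypothetical cap, cf. net.inet.tcp.maxtcptw */

struct tw_obj {
	int	conn_id;	/* connection currently using this object */
};

/* Fixed-size pool; order[] lists slot indices from oldest to newest. */
struct tw_pool {
	struct tw_obj	objs[TW_POOL_MAX];
	size_t		order[TW_POOL_MAX];
	size_t		head;		/* position of the oldest in order[] */
	size_t		count;		/* slots handed out so far */
	size_t		recycled;	/* times the oldest was reused */
};

static void
tw_pool_init(struct tw_pool *p)
{
	p->head = p->count = p->recycled = 0;
}

/*
 * Return an object for conn_id.  While the pool has room a fresh slot
 * is used; once it is full, the oldest object (the one closest to
 * expiry, since all share the same 2*MSL lifetime) is taken over
 * directly.  Normal expiry, which would return slots, is omitted.
 */
static struct tw_obj *
tw_alloc(struct tw_pool *p, int conn_id)
{
	size_t slot;

	if (p->count < TW_POOL_MAX) {
		slot = p->count;
		p->order[(p->head + p->count) % TW_POOL_MAX] = slot;
		p->count++;
	} else {
		/*
		 * Full: the oldest becomes the newest.  With a full ring,
		 * advancing head moves that slot to the tail position.
		 */
		slot = p->order[p->head];
		p->head = (p->head + 1) % TW_POOL_MAX;
		p->recycled++;
	}
	p->objs[slot].conn_id = conn_id;
	return (&p->objs[slot]);
}
```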

  We chose the time-wait lock approach because it fits our long-term
strategy of mitigating INP_INFO lock contention everywhere in the TCP
stack.

  Any thoughts on this particular behavior?

--
Julien

[-- Attachment #2 --]
[PDF: conntime.pdf, a single 640x480 raster image of the TCP connection time graph; binary content omitted]

[-- Attachment #3 --]
diff --git a/sys/netinet/tcp_timer.c b/sys/netinet/tcp_timer.c
index bde7503..b45a9ea 100644
--- a/sys/netinet/tcp_timer.c
+++ b/sys/netinet/tcp_timer.c
@@ -144,9 +144,7 @@ tcp_slowtimo(void)
 	VNET_LIST_RLOCK_NOSLEEP();
 	VNET_FOREACH(vnet_iter) {
 		CURVNET_SET(vnet_iter);
-		INP_INFO_WLOCK(&V_tcbinfo);
-		(void) tcp_tw_2msl_scan(0);
-		INP_INFO_WUNLOCK(&V_tcbinfo);
+		tcp_tw_2msl_scan();
 		CURVNET_RESTORE();
 	}
 	VNET_LIST_RUNLOCK_NOSLEEP();
diff --git a/sys/netinet/tcp_timer.h b/sys/netinet/tcp_timer.h
index 3115fb3..c04723a 100644
--- a/sys/netinet/tcp_timer.h
+++ b/sys/netinet/tcp_timer.h
@@ -178,7 +178,8 @@ extern int tcp_fast_finwait2_recycle;
 void	tcp_timer_init(void);
 void	tcp_timer_2msl(void *xtp);
 struct tcptw *
-	tcp_tw_2msl_scan(int _reuse);		/* XXX temporary */
+	tcp_tw_2msl_reuse(void);	/* XXX temporary? */
+void	tcp_tw_2msl_scan(void);
 void	tcp_timer_keep(void *xtp);
 void	tcp_timer_persist(void *xtp);
 void	tcp_timer_rexmt(void *xtp);
diff --git a/sys/netinet/tcp_timewait.c b/sys/netinet/tcp_timewait.c
index 7e6128b..0230b88 100644
--- a/sys/netinet/tcp_timewait.c
+++ b/sys/netinet/tcp_timewait.c
@@ -49,6 +49,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/socketvar.h>
 #include <sys/protosw.h>
 #include <sys/random.h>
+#include <sys/refcount.h>
 
 #include <vm/uma.h>
 
@@ -98,13 +99,59 @@ static int	maxtcptw;
  * The timed wait queue contains references to each of the TCP sessions
  * currently in the TIME_WAIT state.  The queue pointers, including the
  * queue pointers in each tcptw structure, are protected using the global
- * tcbinfo lock, which must be held over queue iteration and modification.
+ * timewait lock, which must be held over queue iteration and modification.
  */
 static VNET_DEFINE(TAILQ_HEAD(, tcptw), twq_2msl);
 #define	V_twq_2msl			VNET(twq_2msl)
 
-static void	tcp_tw_2msl_reset(struct tcptw *, int);
-static void	tcp_tw_2msl_stop(struct tcptw *);
+/* Global timewait lock */
+static VNET_DEFINE(struct rwlock, tw_lock);
+#define	V_tw_lock			VNET(tw_lock)
+
+#define TW_LOCK_INIT(tw, d)	rw_init_flags(&(tw), (d), 0)
+#define TW_LOCK_DESTROY(tw)	rw_destroy(&(tw))
+#define TW_RLOCK(tw)		rw_rlock(&(tw))
+#define TW_WLOCK(tw)		rw_wlock(&(tw))
+#define TW_RUNLOCK(tw)		rw_runlock(&(tw))
+#define TW_WUNLOCK(tw)		rw_wunlock(&(tw))
+#define TW_LOCK_ASSERT(tw)	rw_assert(&(tw), RA_LOCKED)
+#define TW_RLOCK_ASSERT(tw)	rw_assert(&(tw), RA_RLOCKED)
+#define TW_WLOCK_ASSERT(tw)	rw_assert(&(tw), RA_WLOCKED)
+#define TW_UNLOCK_ASSERT(tw)	rw_assert(&(tw), RA_UNLOCKED)
+
+/*
+ * tw_pcbref() bumps the reference count on a tw in order to maintain
+ * stability of a tw pointer despite the tw lock being released.
+ */
+static void
+tw_pcbref(struct tcptw *tw)
+{
+	KASSERT(tw->tw_refcount > 0, ("%s: refcount 0", __func__));
+	refcount_acquire(&tw->tw_refcount);
+}
+
+/*
+ * Drop a refcount on a tw elevated using tw_pcbref().  Always releases
+ * the tw lock; returns 1 if the tw was freed, 0 if it is still valid.
+ */
+static int
+tw_pcbrele(struct tcptw *tw)
+{
+	TW_WLOCK_ASSERT(V_tw_lock);
+	KASSERT(tw->tw_refcount > 0, ("%s: refcount 0", __func__));
+
+	if (!refcount_release(&tw->tw_refcount)) {
+		TW_WUNLOCK(V_tw_lock);
+		return (0);
+	}
+
+	uma_zfree(V_tcptw_zone, tw);
+	TW_WUNLOCK(V_tw_lock);
+	return (1);
+}
+
+static void	tcp_tw_2msl_reset(struct tcptw *, int rearm);
+static void	tcp_tw_2msl_stop(struct tcptw *, int reuse);
 
 static int
 tcptw_auto_size(void)
@@ -171,6 +218,7 @@ tcp_tw_init(void)
 	else
 		uma_zone_set_max(V_tcptw_zone, maxtcptw);
 	TAILQ_INIT(&V_twq_2msl);
+	TW_LOCK_INIT(V_tw_lock, "tcptw");
 }
 
 #ifdef VIMAGE
@@ -179,11 +227,14 @@ tcp_tw_destroy(void)
 {
 	struct tcptw *tw;
 
-	INP_INFO_WLOCK(&V_tcbinfo);
-	while((tw = TAILQ_FIRST(&V_twq_2msl)) != NULL)
-		tcp_twclose(tw, 0);
-	INP_INFO_WUNLOCK(&V_tcbinfo);
+	INP_INFO_WLOCK(&V_tcbinfo);
+	while((tw = TAILQ_FIRST(&V_twq_2msl)) != NULL) {
+		INP_WLOCK(tw->tw_inpcb);
+		tcp_twclose(tw, 0);
+	}
+	INP_INFO_WUNLOCK(&V_tcbinfo);
 
+	TW_LOCK_DESTROY(V_tw_lock);
 	uma_zdestroy(V_tcptw_zone);
 }
 #endif
@@ -204,7 +255,7 @@ tcp_twstart(struct tcpcb *tp)
 	int isipv6 = inp->inp_inc.inc_flags & INC_ISIPV6;
 #endif
 
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);	/* tcp_tw_2msl_reset(). */
+	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	if (V_nolocaltimewait) {
@@ -229,7 +280,7 @@ tcp_twstart(struct tcpcb *tp)
 
 	tw = uma_zalloc(V_tcptw_zone, M_NOWAIT);
 	if (tw == NULL) {
-		tw = tcp_tw_2msl_scan(1);
+		tw = tcp_tw_2msl_reuse();
 		if (tw == NULL) {
 			tp = tcp_close(tp);
 			if (tp != NULL)
@@ -238,6 +289,7 @@ tcp_twstart(struct tcpcb *tp)
 		}
 	}
 	tw->tw_inpcb = inp;
+	refcount_init(&tw->tw_refcount, 1);
 
 	/*
 	 * Recover last window size sent.
@@ -356,7 +408,6 @@ tcp_twcheck(struct inpcb *inp, struct tcpopt *to, struct tcphdr *th,
 	int thflags;
 	tcp_seq seq;
 
-	/* tcbinfo lock required for tcp_twclose(), tcp_tw_2msl_reset(). */
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
@@ -458,11 +509,11 @@ tcp_twclose(struct tcptw *tw, int reuse)
 	inp = tw->tw_inpcb;
 	KASSERT((inp->inp_flags & INP_TIMEWAIT), ("tcp_twclose: !timewait"));
 	KASSERT(intotw(inp) == tw, ("tcp_twclose: inp_ppcb != tw"));
-	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);	/* tcp_tw_2msl_stop(). */
+	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(inp);
 
 	tw->tw_inpcb = NULL;
-	tcp_tw_2msl_stop(tw);
+	tcp_tw_2msl_stop(tw, reuse);
 	inp->inp_ppcb = NULL;
 	in_pcbdrop(inp);
 
@@ -494,11 +545,6 @@ tcp_twclose(struct tcptw *tw, int reuse)
 	} else
 		in_pcbfree(inp);
 	TCPSTAT_INC(tcps_closed);
-	crfree(tw->tw_cred);
-	tw->tw_cred = NULL;
-	if (reuse)
-		return;
-	uma_zfree(V_tcptw_zone, tw);
 }
 
 int
@@ -616,34 +662,83 @@ tcp_tw_2msl_reset(struct tcptw *tw, int rearm)
 
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
 	INP_WLOCK_ASSERT(tw->tw_inpcb);
+
+	TW_WLOCK(V_tw_lock);
 	if (rearm)
 		TAILQ_REMOVE(&V_twq_2msl, tw, tw_2msl);
 	tw->tw_time = ticks + 2 * tcp_msl;
 	TAILQ_INSERT_TAIL(&V_twq_2msl, tw, tw_2msl);
+	TW_WUNLOCK(V_tw_lock);
 }
 
 static void
-tcp_tw_2msl_stop(struct tcptw *tw)
+tcp_tw_2msl_stop(struct tcptw *tw, int reuse)
 {
 
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+
+	TW_WLOCK(V_tw_lock);
 	TAILQ_REMOVE(&V_twq_2msl, tw, tw_2msl);
+	crfree(tw->tw_cred);
+	tw->tw_cred = NULL;
+
+	if (!reuse) {
+		tw_pcbrele(tw);
+		return;
+	}
+
+	TW_WUNLOCK(V_tw_lock);
 }
 
 struct tcptw *
-tcp_tw_2msl_scan(int reuse)
+tcp_tw_2msl_reuse(void)
 {
-	struct tcptw *tw;
 
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
+
+	struct tcptw *tw;
+
+	TW_WLOCK(V_tw_lock);
+	tw = TAILQ_FIRST(&V_twq_2msl);
+	if (tw == NULL) {
+		TW_WUNLOCK(V_tw_lock);
+		return NULL;
+	}
+	TW_WUNLOCK(V_tw_lock);
+
+	INP_WLOCK(tw->tw_inpcb);
+	tcp_twclose(tw, 1);
+
+	return (tw);
+}
+
+void
+tcp_tw_2msl_scan(void)
+{
+
+	struct tcptw *tw;
 	for (;;) {
+		TW_RLOCK(V_tw_lock);
 		tw = TAILQ_FIRST(&V_twq_2msl);
-		if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0))
+		if (tw == NULL || ((tw->tw_time - ticks) > 0)) {
+			TW_RUNLOCK(V_tw_lock);
 			break;
+		}
+		tw_pcbref(tw);
+		TW_RUNLOCK(V_tw_lock);
+
+		/* Close timewait state */
+		INP_INFO_WLOCK(&V_tcbinfo);
+		TW_WLOCK(V_tw_lock);
+		if (tw_pcbrele(tw)) {
+			INP_INFO_WUNLOCK(&V_tcbinfo);	/* tw was freed */
+			continue;
+		}
+		KASSERT(tw->tw_inpcb != NULL,
+		        ("%s: tw->tw_inpcb == NULL", __func__));
+
 		INP_WLOCK(tw->tw_inpcb);
-		tcp_twclose(tw, reuse);
-		if (reuse)
-			return (tw);
+		tcp_twclose(tw, 0);
+		INP_INFO_WUNLOCK(&V_tcbinfo);
 	}
-	return (NULL);
 }
diff --git a/sys/netinet/tcp_var.h b/sys/netinet/tcp_var.h
index e3197e5..b44672d 100644
--- a/sys/netinet/tcp_var.h
+++ b/sys/netinet/tcp_var.h
@@ -355,6 +355,7 @@ struct tcptw {
 	TAILQ_ENTRY(tcptw) tw_2msl;
 	void		*tw_pspare;	/* TCP_SIGNATURE */
 	u_int		*tw_spare;	/* TCP_SIGNATURE */
+	u_int		tw_refcount;	/* refcount */
 };
 
 #define	intotcpcb(ip)	((struct tcpcb *)(ip)->inp_ppcb)
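  The tw_pcbref()/tw_pcbrele() pair in the patch follows a common
pattern: pin an object with a reference before dropping the lock that
protects it, then re-take the lock and drop the reference, freeing the
object if it was the last one.  Below is a userspace sketch of that
pattern using C11 atomics; struct obj, obj_ref() and obj_rele() are
hypothetical names, and a pthread mutex stands in for the tw rwlock.
Like tw_pcbrele(), obj_rele() always releases the lock and reports
whether the object was freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct obj {
	atomic_uint	refcount;	/* 1 for the list, +1 per pin */
	bool		on_list;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static struct obj *
obj_new(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o != NULL) {
		atomic_init(&o->refcount, 1);
		o->on_list = true;
	}
	return (o);
}

/* Pin o so it stays valid after list_lock is dropped (cf. tw_pcbref()). */
static void
obj_ref(struct obj *o)
{
	unsigned old = atomic_fetch_add(&o->refcount, 1);

	assert(old > 0);
	(void)old;
}

/*
 * Drop one reference.  Must be called with list_lock held; always
 * releases it (cf. tw_pcbrele()).  Returns true if o was freed.
 */
static bool
obj_rele(struct obj *o)
{
	bool freed = false;

	if (atomic_fetch_sub(&o->refcount, 1) == 1) {
		free(o);
		freed = true;
	}
	pthread_mutex_unlock(&list_lock);
	return (freed);
}
```

  A design point any such scheme has to respect: an object that may
still be pinned should not have its count re-initialized when it is
recycled, otherwise the pinning side's later release could free an
object that is back in use.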

[-- Attachment #4 --]
[PDF: conntime.pdf, a single 640x480 raster image of the TCP connection time graph; binary content omitted]
