Date: Wed, 12 Oct 2016 11:19:48 +0200
From: Julien Charbon <jch@freebsd.org>
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@FreeBSD.org, hiren panchasara <hiren@strugglingcoder.info>
Subject: Re: 11.0 stuck on high network load
Message-ID: <f3c0e73a-5e6e-2190-aed3-499250c1764c@freebsd.org>
In-Reply-To: <20161012084045.GA57714@zxy.spb.ru>
References: <20161006111043.GH54003@zxy.spb.ru> <1431484c-c00e-24c5-bd76-714be8ae5ed5@freebsd.org> <20161010133220.GU54003@zxy.spb.ru> <23f1200e-383e-befb-b76d-c88b3e1287b0@freebsd.org> <20161010142941.GV54003@zxy.spb.ru> <52d634aa-639c-bef7-1f10-c46dbadc4d85@freebsd.org> <20161010173531.GI6177@zxy.spb.ru> <8143cd8f-c007-2378-b004-b2b037402d03@freebsd.org> <20161011121145.GJ6177@zxy.spb.ru> <f1d9e34e-3d85-bd02-e660-6d647e4343fb@freebsd.org> <20161012084045.GA57714@zxy.spb.ru>
Hi Slawa,

On 10/12/16 10:40 AM, Slawa Olhovchenkov wrote:
> On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote:
>> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote:
>>> On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote:
>>>>  Then threads are competing for the INP_WLOCK lock.
>>>> For the example, let's say thread A wants to run
>>>> tcp_input()/in_pcblookup_mbuf() and is racing for this INP_WLOCK:
>>>>
>>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964
>>>>
>>>>  And thread B wants to run tcp_timer_2msl()/tcp_close()/in_pcbdrop() and
>>>> is racing for this INP_WLOCK:
>>>>
>>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323
>>>>
>>>>  That leads to two cases:
>>>>
>>>>  o Thread A wins the race:
>>>>
>>>>  Thread A continues tcp_input() as usual: the INP_DROPPED flag is
>>>> not set and inp is still in the TCP hash table.
>>>>  Thread B waits for thread A to release INP_WLOCK after finishing
>>>> tcp_input() processing, and then continues
>>>> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing.
>>>>
>>>>  o Thread B wins the race:
>>>>
>>>>  Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop(): INP_DROPPED
>>>> is set on inp and inp is removed from the TCP hash table.
>>>>  In parallel, thread A has found inp in the TCP hash table before it
>>>> was removed, and is waiting on that inp's INP_WLOCK.
>>>>  Once thread B has released INP_WLOCK, thread A gets the lock, sees
>>>> the INP_DROPPED flag and does "goto findpcb", but this time, because
>>>> inp is no longer in the TCP hash table, it will not be found again by
>>>> in_pcblookup_mbuf().
>>>>
>>>>  Hopefully I am clear enough here.
>>>
>>> Thanks, very clear.
>>> Small question: when both threads run on the same CPU core, will
>>> INP_WLOCK be rescheduled?
>>
>>  Hmm, a thread can be rescheduled, but not a lock.  Thus I am not sure I
>> understand your question here. :)
>
> I don't know how INP_WLOCK works in this case (all on the same CPU):
>
> thread1: INP_WLOCK
> -interrupt--
> thread2: INP_WLOCK
>
> if INP_WLOCK is like a spinlock -- this is a deadlock.
> if INP_WLOCK is like a mutex -- thread1 is rescheduled.

 Thanks, I understand your question now.
No, an interrupt cannot bypass a lock: here INP_WLOCK is like a mutex --
thread1 is rescheduled.

>>> As I remember, the race is created by a call to tcp_twstart() at the
>>> time tcp_close() finishes, on the sofree() -> tcp_usr_detach() path,
>>> which leaves an unexpected INP_TIMEWAIT state in tcp_usr_detach().
>>> INP_TIMEWAIT is set in tcp_twstart().
>>
>>  Exactly, thus the current fix is: if you already have the INP_DROPPED
>> flag set, you are not allowed to call tcp_twstart().  Actually, it is a
>> good candidate for a new INVARIANT.  Let me add that.
>>
>>> After checking the source code, I found invocations of tcp_twstart() in
>>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c,
>>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and sys/dev/cxgbe/tom/t4_cpl_io.c.
>>>
>>> The invocations from sys/netinet/tcp_stacks/fastpath.c and
>>> sys/netinet/tcp_input.c are guarded by INP_WLOCK in tcp_input(), and
>>> will now be OK.
>>>
>>> The invocations from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and
>>> sys/dev/cxgbe/tom/t4_cpl_io.c are not clear to me; I see an independent
>>> INP_WLOCK there.  Is this OK?
>>>
>>> Can thread A run do_peer_close() directly from the Chelsio IRQ
>>> handler, bypassing tcp_input()?
>>
>>  If you look carefully, INP_WLOCK is taken in cxgb_cpl_io.c and
>> t4_cpl_io.c before calling tcp_twstart().
>
> Yes, and as you remember: sys/netinet/tcp_subr.c
>
> 1535 struct tcpcb *
> 1536 tcp_close(struct tcpcb *tp)
> 1537 {
> ...
> 1569         INP_WUNLOCK(inp);
> 1570         ACCEPT_LOCK();
> 1571         SOCK_LOCK(so);
> 1572         so->so_state &= ~SS_PROTOREF;
> 1573         sofree(so);
> 1574         return (NULL);
>
> sofree() calls tcp_usr_detach(), and in tcp_usr_detach() we have an
> unexpected INP_TIMEWAIT.

 I see, thus just for context: the TCP stack in sys/dev/cxgb* is a TOE
(TCP Offload Engine) TCP stack for Chelsio NICs.  It is a separate/side
TCP stack that is used only with the TCP_OFFLOAD option.  This TOE TCP
stack actually has its own set of detach()/input() functions and seems
to check the INP_DROPPED flag properly.
I guess @np checks fixes in the socket TCP stack and decides which ones can
also impact the Chelsio TOE TCP stack.  Some bugs are only in the socket TCP
stack, some are only in the TOE TCP stack.

--
Julien
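The two interleavings described in the quoted thread can be replayed in userland as a rough sketch. In that thread, thread A runs tcp_input()/in_pcblookup_mbuf() and thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop(). The sketch makes the following assumptions:

- A pthread mutex is an adequate model for INP_WLOCK, which is a blocking lock rather than a spinlock.
- A one-slot table stands in for the TCP hash table.
- Every identifier here is hypothetical and not part of the FreeBSD sources: `struct pcb_sketch`, `lookup_sketch()`, `timer_path()`, `input_lock_and_recheck()`, `release_and_report()` and `replay_race()`.
- The real lookup path must also keep the inpcb from being freed while thread A waits. The sketch sidesteps that by keeping the pcb on the stack of `replay_race()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inpcb, reduced to what the race needs. */
struct pcb_sketch {
	pthread_mutex_t lock;	/* models INP_WLOCK: a blocking lock */
	bool dropped;		/* models the INP_DROPPED flag */
};

/* One-slot stand-in for the TCP hash table, with its own lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pcb_sketch *table_slot;

/* Models in_pcblookup_mbuf(): returns the pcb only while it is hashed. */
static struct pcb_sketch *
lookup_sketch(void)
{
	struct pcb_sketch *p;

	pthread_mutex_lock(&table_lock);
	p = table_slot;
	pthread_mutex_unlock(&table_lock);
	return (p);
}

/* Thread B: tcp_timer_2msl()/tcp_close()/in_pcbdrop() in miniature. */
static void *
timer_path(void *arg)
{
	struct pcb_sketch *p = arg;

	pthread_mutex_lock(&p->lock);
	p->dropped = true;		/* INP_DROPPED is set ... */
	pthread_mutex_lock(&table_lock);
	table_slot = NULL;		/* ... and the pcb leaves the table */
	pthread_mutex_unlock(&table_lock);
	pthread_mutex_unlock(&p->lock);
	return (NULL);
}

/*
 * Thread A after its lookup: lock the pcb (possibly sleeping behind B),
 * then re-check the dropped flag.  Returns a live, locked pcb or NULL.
 */
static struct pcb_sketch *
input_lock_and_recheck(struct pcb_sketch *p)
{
	while (p != NULL) {
		pthread_mutex_lock(&p->lock);
		if (!p->dropped)
			return (p);
		pthread_mutex_unlock(&p->lock);
		p = lookup_sketch();	/* the "goto findpcb" retry */
	}
	return (NULL);
}

/* Note whether thread A holds a live pcb, then release its lock. */
static bool
release_and_report(struct pcb_sketch *owned)
{
	bool live = (owned != NULL && !owned->dropped);

	if (owned != NULL)
		pthread_mutex_unlock(&owned->lock);
	return (live);
}

/* Replays one interleaving; ordering is forced with pthread_join(). */
static bool
replay_race(bool timer_wins)
{
	struct pcb_sketch pcb = { .dropped = false };
	struct pcb_sketch *owned;
	pthread_t b;
	bool live;

	pthread_mutex_init(&pcb.lock, NULL);
	pthread_mutex_lock(&table_lock);
	table_slot = &pcb;
	pthread_mutex_unlock(&table_lock);

	owned = lookup_sketch();	/* thread A finds the pcb first */
	if (timer_wins) {
		/* Thread B wins the lock and finishes before A locks. */
		if (pthread_create(&b, NULL, timer_path, &pcb) != 0)
			abort();
		pthread_join(b, NULL);
		live = release_and_report(input_lock_and_recheck(owned));
	} else {
		/* Thread A wins the lock; B blocks until A releases it. */
		owned = input_lock_and_recheck(owned);
		if (pthread_create(&b, NULL, timer_path, &pcb) != 0)
			abort();
		live = release_and_report(owned);
		pthread_join(b, NULL);
	}
	assert(pcb.dropped);		/* either way, B ends up dropping it */
	pthread_mutex_destroy(&pcb.lock);
	return (live);
}
```

The sketch can be compiled with `-pthread` and `replay_race()` called from a `main()`. `replay_race(false)` should return true: thread A keeps a live pcb, and thread B drops it only afterwards. `replay_race(true)` should return false: the "goto findpcb" retry no longer finds the pcb. Forcing the order with pthread_join() instead of leaving it to the scheduler keeps each case reproducible.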
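The INVARIANT suggested in the thread is that tcp_twstart() must never be called on an inpcb that already has INP_DROPPED set. It could be sketched as an assertion at the top of the TIME_WAIT entry point. In the kernel this would presumably be a KASSERT(), which is only compiled in with `options INVARIANTS`.

The code below is a hypothetical userland illustration, not the change that was actually committed. All of its pieces are stand-ins invented for this example:

- the macro `KASSERT_SKETCH()`;
- the flag values `INP_DROPPED_SKETCH` and `INP_TIMEWAIT_SKETCH`;
- the type `struct inp_sketch`;
- the functions `twstart_allowed()` and `twstart_sketch()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag bits; the real INP_* values live in in_pcb.h. */
#define INP_DROPPED_SKETCH	0x1
#define INP_TIMEWAIT_SKETCH	0x2

/*
 * Userland stand-in for KASSERT(exp, (fmt, ...)): the kernel version
 * panics under options INVARIANTS; this one prints and aborts.
 */
#define KASSERT_SKETCH(exp, msg) do {		\
	if (!(exp)) {				\
		printf msg;			\
		abort();			\
	}					\
} while (0)

struct inp_sketch {
	unsigned flags;
};

/* The rule from the thread: a dropped inp must not enter TIME_WAIT. */
static bool
twstart_allowed(const struct inp_sketch *inp)
{
	return ((inp->flags & INP_DROPPED_SKETCH) == 0);
}

/* Hypothetical tcp_twstart() stand-in guarded by the invariant. */
static void
twstart_sketch(struct inp_sketch *inp)
{
	KASSERT_SKETCH(twstart_allowed(inp),
	    ("twstart_sketch: inp %p has INP_DROPPED set\n", (void *)inp));
	inp->flags |= INP_TIMEWAIT_SKETCH;
}
```

With a check like this, calling tcp_twstart() on an inpcb that tcp_close()/in_pcbdrop() has already dropped would, in principle, fail at the point of the mistake. Without it, the problem surfaces later as the unexpected INP_TIMEWAIT in tcp_usr_detach().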