Date:      Wed, 12 Oct 2016 11:19:48 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@FreeBSD.org, hiren panchasara <hiren@strugglingcoder.info>
Subject:   Re: 11.0 stuck on high network load
Message-ID:  <f3c0e73a-5e6e-2190-aed3-499250c1764c@freebsd.org>
In-Reply-To: <20161012084045.GA57714@zxy.spb.ru>
References:  <20161006111043.GH54003@zxy.spb.ru> <1431484c-c00e-24c5-bd76-714be8ae5ed5@freebsd.org> <20161010133220.GU54003@zxy.spb.ru> <23f1200e-383e-befb-b76d-c88b3e1287b0@freebsd.org> <20161010142941.GV54003@zxy.spb.ru> <52d634aa-639c-bef7-1f10-c46dbadc4d85@freebsd.org> <20161010173531.GI6177@zxy.spb.ru> <8143cd8f-c007-2378-b004-b2b037402d03@freebsd.org> <20161011121145.GJ6177@zxy.spb.ru> <f1d9e34e-3d85-bd02-e660-6d647e4343fb@freebsd.org> <20161012084045.GA57714@zxy.spb.ru>

 Hi Slawa,

On 10/12/16 10:40 AM, Slawa Olhovchenkov wrote:
> On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote:
>> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote:
>>> On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote:
>>>>  Then threads are competing for the INP_WLOCK lock.  For example,
>>>> let's say thread A wants to run tcp_input()/in_pcblookup_mbuf() and
>>>> is racing for this INP_WLOCK:
>>>>
>>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964
>>>>
>>>>  And thread B wants to run tcp_timer_2msl()/tcp_close()/in_pcbdrop() and
>>>> is racing for this INP_WLOCK:
>>>>
>>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323
>>>>
>>>>  That leads to two cases:
>>>>
>>>>  o Thread A wins the race:
>>>>
>>>>   Thread A will continue tcp_input() as usual: the INP_DROPPED flag is
>>>> not set and the inp is still in the TCP hash table.
>>>>   Thread B waits for thread A to release INP_WLOCK after finishing
>>>> tcp_input() processing, and thread B will then continue
>>>> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing.
>>>>
>>>>  o Thread B wins the race:
>>>>
>>>>   Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop(): the inp
>>>> INP_DROPPED flag is set and the inp is removed from the TCP hash table.
>>>>   In parallel, thread A has found the inp in the TCP hash before it was
>>>> removed, and is waiting on the INP_WLOCK lock of the found inp.
>>>>   Once thread B has released the INP_WLOCK lock, thread A gets this lock,
>>>> sees the INP_DROPPED flag and does "goto findpcb", but because the
>>>> inp is no longer in the TCP hash table it will not be found again by
>>>> in_pcblookup_mbuf().
>>>>
>>>>  Hopefully I am clear enough here.
>>>
>>> Thanks, very clear.
>>> Small question: when both threads run on the same CPU core, will INP_WLOCK
>>> be rescheduled?
>>
>>  Hmm, a thread can be rescheduled, but not a lock.  Thus I am not sure I
>> understand your question here.  :)
>
> I don't know how INP_WLOCK works in this case (all on the same CPU):
>
> thread1: INP_WLOCK
> -interrupt--
> thread2: INP_WLOCK
>
> if INP_WLOCK is like a spinlock -- this is a deadlock.
> if INP_WLOCK is like a mutex -- thread1 is rescheduled.

 Thanks, I understand your question now.  No, an interrupt cannot bypass a
lock:  here INP_WLOCK is like a mutex -- thread2 blocks and thread1 is
rescheduled.
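
 To illustrate the "like a mutex" case, here is a minimal user-space
sketch.  It is only an analogy built on POSIX threads, not the kernel
locking code, and every toy_* name is hypothetical.  thread1 owns the lock
when thread2 shows up; thread2 blocks instead of deadlocking, thread1 keeps
running, and thread2 proceeds only after the unlock:

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* All toy_* names are hypothetical; this is not kernel code. */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static char toy_order[3];	/* who ran inside the lock, in order */
static int toy_n;		/* protected by toy_lock */

/* "thread2": arrives while thread1 already owns the lock. */
static void *
toy_thread2(void *arg)
{
	(void)arg;
	/* Blocks here for as long as thread1 still owns the lock. */
	pthread_mutex_lock(&toy_lock);
	toy_order[toy_n++] = '2';
	pthread_mutex_unlock(&toy_lock);
	return (NULL);
}

/* "thread1": takes the lock, is "interrupted" by thread2, still finishes. */
static const char *
toy_contend(void)
{
	pthread_t t2;
	int rc;

	memset(toy_order, 0, sizeof(toy_order));
	toy_n = 0;

	pthread_mutex_lock(&toy_lock);
	rc = pthread_create(&t2, NULL, toy_thread2, NULL);
	assert(rc == 0);
	(void)rc;
	toy_order[toy_n++] = '1';		/* thread1 keeps making progress */
	pthread_mutex_unlock(&toy_lock);	/* only now can thread2 get the lock */

	pthread_join(t2, NULL);
	return (toy_order);
}
```

 Calling toy_contend() from main() returns "12" whatever the scheduler
does, because thread2 cannot record its entry before thread1 has unlocked.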

>>> As I remember, the race is created by a call to tcp_twstart() at the
>>> time tcp_close() ends, on the path sofree()-tcp_usr_detach(), and by the
>>> unexpected INP_TIMEWAIT state in tcp_usr_detach(). INP_TIMEWAIT is set
>>> in tcp_twstart()
>>
>>  Exactly, thus the current fix is:  If you already have the INP_DROPPED
>> flag set you are not allowed to call tcp_twstart(), actually it is a
>> good candidate for a new INVARIANT.  Let me add that.
>>
>>> After checking the source code I found invocations of tcp_twstart() in
>>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c,
>>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c.
>>>
>>> The invocations from sys/netinet/tcp_stacks/fastpath.c and
>>> sys/netinet/tcp_input.c are guarded by INP_WLOCK in tcp_input(), and
>>> will now be OK.
>>>
>>> The invocations from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and
>>> sys/dev/cxgbe/tom/t4_cpl_io.c are not clear to me: I see an independent
>>> INP_WLOCK. Is this OK?
>>>
>>> Could thread A call do_peer_close() directly from the Chelsio IRQ
>>> handler, bypassing tcp_input()?
>>
>>  If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and
>> t4_cpl_io.c before calling tcp_twstart().
>
> Yes, and you remember: sys/netinet/tcp_subr.c
>
>   struct tcpcb *
>   tcp_close(struct tcpcb *tp)
>   {
> ...
>                   INP_WUNLOCK(inp);
>                   ACCEPT_LOCK();
>                   SOCK_LOCK(so);
>                   so->so_state &= ~SS_PROTOREF;
>                   sofree(so);
>                   return (NULL);
>
> sofree() calls tcp_usr_detach(), and in tcp_usr_detach() we have an
> unexpected INP_TIMEWAIT.

 I see, thus just for context:  the TCP stack in sys/dev/cxgb* is a
TOE (TCP Offload Engine) TCP stack for Chelsio NICs; it is a
separate/side TCP stack that is used only with the TCP_OFFLOAD option.

 This TOE TCP stack actually has its own set of detach()/input()
functions and seems to check the INP_DROPPED flag properly.  I guess @np
checks the fixes in the socket TCP stack and decides which ones can also
impact the Chelsio TOE TCP stack.  Some bugs are only in the socket TCP
stack, some are only in the TOE TCP stack.
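
 For reference, the INP_DROPPED check discussed earlier in this thread can
be modelled with a small single-threaded toy.  This is only a sketch under
stated assumptions: the struct, the flag value and all toy_* names are
hypothetical stand-ins, not the kernel definitions, and the two orderings
of the race are replayed sequentially instead of with real threads:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel definitions. */
#define TOY_INP_DROPPED	0x1

struct toy_inp {
	int	flags;		/* TOY_INP_DROPPED once the pcb is dropped */
	bool	in_hash;	/* still reachable through the lookup? */
};

enum toy_result { TOY_DELIVERED, TOY_NO_PCB };

/* Models the lookup: only pcbs still in the hash can be found. */
static struct toy_inp *
toy_lookup(struct toy_inp *inp)
{
	return (inp->in_hash ? inp : NULL);
}

/* Models thread B: set the flag and unhash, both done under the lock. */
static void
toy_drop(struct toy_inp *inp)
{
	inp->flags |= TOY_INP_DROPPED;
	inp->in_hash = false;
}

/* Models thread A once it finally owns the lock on the pcb it found. */
static enum toy_result
toy_input_locked(struct toy_inp *slot, struct toy_inp *found)
{
	while (found != NULL) {
		if ((found->flags & TOY_INP_DROPPED) == 0)
			return (TOY_DELIVERED);
		found = toy_lookup(slot);	/* the "goto findpcb" retry */
	}
	return (TOY_NO_PCB);	/* dropped pcb is not found a second time */
}

/* Replays one ordering of the race between thread A and thread B. */
static enum toy_result
toy_race(bool b_wins)
{
	struct toy_inp inp = { 0, true };
	struct toy_inp *found;

	found = toy_lookup(&inp);	/* A finds the pcb before any drop */
	assert(found != NULL);
	if (b_wins)
		toy_drop(&inp);		/* B gets the lock before A does */
	return (toy_input_locked(&inp, found));
}
```

 toy_race(false) models "thread A wins" and returns TOY_DELIVERED;
toy_race(true) models "thread B wins" and returns TOY_NO_PCB, since the
retry ends as soon as the lookup no longer finds the dropped pcb.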

--
Julien




