From owner-freebsd-stable@freebsd.org Tue Oct 11 07:20:30 2016 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F2251C0D841 for ; Tue, 11 Oct 2016 07:20:30 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-lf0-f51.google.com (mail-lf0-f51.google.com [209.85.215.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 66A87CD7 for ; Tue, 11 Oct 2016 07:20:29 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by mail-lf0-f51.google.com with SMTP id l131so26096676lfl.2 for ; Tue, 11 Oct 2016 00:20:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=LsjeWxOmrNaZOmpZND6GDYNQ/mJkAtvf7yii/iqaPyc=; b=Q36srr1P8RjyJMeAxtK1JbMvOWzXnacNtbj3o2rHpUi8ITrhOQUCsGmecqAfZimSuy QBWqBlUBDt9gpur6SsiWwFmnlGBrUfsH5jut4ZvVoovcu5ORcrccMNaSIdsKlUSzq5jd 67MJ810CVh866iyL9In1MUs0h2MLV5m2wmuYp69h5WqztsZqiKvX67s2fD9A5ZDwsb93 cPTyYlW+A2fUNWV8X0sK9gffIRBvPvwS9oQ2mkYayrH8jpM+NTlTLWf+uL/hgrCTZX4p 4cyForkiDQ6xUXINlDGPzZELAaHCGNA5VbUmRk8V94b8j7tXequZB88nx9qs+588pshU yOmA== X-Gm-Message-State: AA6/9Rk4cWTxqEYoTFQ+3+krDd8Z1zFElYsoWVa/3B8o/oUUaOi/NhylU0UcoVhnEr0JIA== X-Received: by 10.25.163.17 with SMTP id m17mr1625073lfe.161.1476170425560; Tue, 11 Oct 2016 00:20:25 -0700 (PDT) Received: from [10.100.64.17] ([217.30.88.7]) by smtp.gmail.com with ESMTPSA id h68sm353068lji.20.2016.10.11.00.20.23 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 11 Oct 2016 00:20:24 -0700 (PDT) Subject: Re: 11.0 stuck on high network load To: Slawa Olhovchenkov References: <20160926172159.GA54003@zxy.spb.ru> <62453d9c-b1e4-1129-70ff-654dacea37f9@gmail.com> <20160928115909.GC54003@zxy.spb.ru> <20161006111043.GH54003@zxy.spb.ru> <1431484c-c00e-24c5-bd76-714be8ae5ed5@freebsd.org> <20161010133220.GU54003@zxy.spb.ru> <23f1200e-383e-befb-b76d-c88b3e1287b0@freebsd.org> <20161010142941.GV54003@zxy.spb.ru> <52d634aa-639c-bef7-1f10-c46dbadc4d85@freebsd.org> <20161010173531.GI6177@zxy.spb.ru> Cc: Konstantin Belousov , freebsd-stable@FreeBSD.org, hiren panchasara From: Julien Charbon Message-ID: <8143cd8f-c007-2378-b004-b2b037402d03@freebsd.org> Date: Tue, 11 Oct 2016 09:20:17 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161010173531.GI6177@zxy.spb.ru> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="oq9rDkJ6isRwA1ckJwhFJjfq55qKQCU3W" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 11 Oct 2016 07:20:31 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --oq9rDkJ6isRwA1ckJwhFJjfq55qKQCU3W Content-Type: multipart/mixed; boundary="6bRp3pkLFdj50Nvhlaf2QLjbLWh7J9Aq4"; protected-headers="v1" From: Julien Charbon To: Slawa Olhovchenkov Cc: Konstantin Belousov , freebsd-stable@FreeBSD.org, hiren panchasara Message-ID: <8143cd8f-c007-2378-b004-b2b037402d03@freebsd.org> Subject: Re: 11.0 stuck on high network load References: <20160926172159.GA54003@zxy.spb.ru> <62453d9c-b1e4-1129-70ff-654dacea37f9@gmail.com> <20160928115909.GC54003@zxy.spb.ru> <20161006111043.GH54003@zxy.spb.ru> <1431484c-c00e-24c5-bd76-714be8ae5ed5@freebsd.org> <20161010133220.GU54003@zxy.spb.ru> <23f1200e-383e-befb-b76d-c88b3e1287b0@freebsd.org> <20161010142941.GV54003@zxy.spb.ru> <52d634aa-639c-bef7-1f10-c46dbadc4d85@freebsd.org> <20161010173531.GI6177@zxy.spb.ru> In-Reply-To: <20161010173531.GI6177@zxy.spb.ru> --6bRp3pkLFdj50Nvhlaf2QLjbLWh7J9Aq4 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Slawa, On 10/10/16 7:35 PM, Slawa Olhovchenkov wrote: > On Mon, Oct 10, 2016 at 05:44:21PM +0200, Julien Charbon wrote: >>>> can check the current other usages of goto findpcb in tcp_input(). = The >>>> rational here being: >>>> >>>> - Behavior before the patch: If the inp we found was deleted then = goto >>>> findpcb. >>>> - Behavior after the patch: If the inp we found was deleted or dro= pped >>>> then goto findpcb. >>>> >>>> I just prefer having the same behavior applied everywhere: If >>>> tcp_input() loses the inp lock race and the inp was deleted or dropp= ed >>>> then retry to find a new inpcb to deliver to. >>>> >>>> But you are right dropping the packet here will also fix the issue.= >>>> >>>> Then the review process becomes quite helpful because people can ar= gue: >>>> Dropping here is better because "blah", or goto findpcb is better >>>> because "bluh", etc. And at the review end you have a nice final pa= tch. >>>> >>>> https://reviews.freebsd.org/D8211 >>> >>> I am not sure, I am see to >>> >>> sys/netinet/in_pcb.h:#define INP_DROPPED 0x04000000 /*= >> protocol drop flag */ >>> >>> and think this is a flag 'all packets must be droped' >> >> Hm, I believe this flag means "this inp has been dropped by the TCP >> stack, so don't use it anymore". Actually this flag is better describ= ed >> in the function that sets it: >> >> "(INP_DROPPED) is used by TCP to mark an inpcb as unused and avoid >> future packet delivery or event notification when a socket remains ope= n >> but TCP has closed." >> >> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_= pcb.c#L1320 >> >> /* >> * in_pcbdrop() removes an inpcb from hashed lists, releasing its >> address and >> * port reservation, and preventing it from being returned by inpcb lo= okups. >> * >> * It is used by TCP to mark an inpcb as unused and avoid future packe= t >> * delivery or event notification when a socket remains open but TCP h= as >> * closed. This might occur as a result of a shutdown()-initiated TCP= close >> * or a RST on the wire, and allows the port binding to be reused whil= e >> still >> * maintaining the invariant that so_pcb always points to a valid inpc= b >> until >> * in_pcbdetach(). >> * >> */ >> void >> in_pcbdrop(struct inpcb *inp) >> { >> inp->inp_flags |=3D INP_DROPPED; >> ... >> >> The classical example where "goto findpcb" is useful: You receive a >> new connection request with a TCP SYN packet and this packet is unluck= y >> and reached a inp being dropped: >> >> - with "goto findpcb" approach, the next lookup will most likely find= >> the LISTEN inp and start the TCP hand-shake as usual >> - with "drop the packet" approach, the TCP client will need to >> re-transmit a TCP SYN packet >> >> It is not because a packet was unlucky once that it deserves to be >> dropped. :) >=20 > Thanks for explaining, very helpfull. > In this situation (TCP SYN with same 4-tuple as existing socket) > allocate new PCB is best. But for this we must destroy current PCB. I > am think INP_WUNLOCK(inp) don't destroy it and in_pcblookup_mbuf find > it again (I am think in_pcblookup_mbuf find this PCB on first turn). > I am assume for classical example in_pcbrele_wlocked(inp) free and > destroy current PCB for possibility in_pcblookup_mbuf allocate new > one. Astute question: Here, the same inp cannot be find again by in_pcblookup_mbuf(), the explanation is a bit long though: in_pcbdrop() does two things under INP_WLOCK lock: https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb= =2Ec#L1334 1. Add INP_DROPPED flag 2. Remove the inp from the TCP hash table And once removed from the TCP hash table, in_pcblookup_mbuf() will return NULL when doing the same TCP hash table lookup again. It means that under a INP_WLOCK lock, these two changes are atomic: - Either an inp does not have INP_DROPPED flag and can be in TCP hash ta= ble - Either an inp has INP_DROPPED flag and is not in TCP hash table But when you don't have the INP_WLOCK lock, then you can witness the intermediate state where a inp is still in TCP hash table while a thread is adding the INP_DROPPED flag. Nothing unusual here. Then threads are competing for the INP_WLOCK lock. For the example, let's say the thread A wants to run tcp_input()/in_pcblookup_mbuf() and racing for this INP_WLOCK: https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb= =2Ec#L1964 And thread B wants to run tcp_timer_2msl()/tcp_close()/in_pcbdrop() and racing for this INP_WLOCK: https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_ti= mer.c#L323 That leads to two cases: o Thread A wins the race: Thread A will continue tcp_input() as usal and INP_DROPPED flags is not set and inp is still in TCP hash table. Thread B is waiting on thread A to release INP_WLOCK after finishing tcp_input() processing, and thread B will continue tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing. o Thread B wins the race: Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop() and inp INP_DROPPED is set and inp being removed from TCP hash table. In parallel, thread A has found the inp in TCP hash before is was removed, and waiting on the found inp INP_WLOCK lock. Once thread B has released the INP_WLOCK lock, thread A gets this lock and sees the INP_DROPPED flag and do "goto findpcb" but here because the inp is not more in TCP hash table and it will not be find again by in_pcblookup_mbuf(). Hopefully I am clear enough here. -- Julien --6bRp3pkLFdj50Nvhlaf2QLjbLWh7J9Aq4-- --oq9rDkJ6isRwA1ckJwhFJjfq55qKQCU3W Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJX/JK2AAoJEKVlQ5Je6dhxNxkH/RhKPtBgDzP+NpnwwumbRhEr PrbYsnFOKL60VpPZfbvY4vz7TNclgmaaS2VIRcuJphuJr2k6MfkETyFd7QSBqSQV vg4tAvNVXbNqG+Ubvrc2KvY/QeK+FbJc1aYlzZ9xS5trHLbf0uRjoIg+KsC+gyBA VPLizfIGdN+8tHFs0FKAuirfPgiqNBBgUDaTCqh0l8DOsRp3/QLPs05sok6UJAtF ooaba6b4WyFe1DCQawMvJM4JDZIrGMy3qjzZYQppM8FxYbbl3yCAqcTDV0wgksXF TViCJYuRb+xYbQNdvnOSPZUBGbjsZYa8UXq4fmqva9urD7UllFpiDzKJ8uPJBIw= =5V+B -----END PGP SIGNATURE----- --oq9rDkJ6isRwA1ckJwhFJjfq55qKQCU3W--