From owner-freebsd-net@freebsd.org Mon Jun 20 16:48:50 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4545AC40DE for ; Mon, 20 Jun 2016 16:48:50 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id AD57A1314 for ; Mon, 20 Jun 2016 16:48:50 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id A6672AC40DC; Mon, 20 Jun 2016 16:48:50 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A58F2AC40DA; Mon, 20 Jun 2016 16:48:50 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wm0-f48.google.com (mail-wm0-f48.google.com [74.125.82.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E2621310; Mon, 20 Jun 2016 16:48:50 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by mail-wm0-f48.google.com with SMTP id r201so70242096wme.1; Mon, 20 Jun 2016 09:48:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=dspSP47DsoxtDUjUNH0mSVqm+HvYa1GNboskfxte0EI=; b=Btp3NJySvqzKM1XTn8aYcF+knIsTgIf5GIfUWNchIs9Sw3VCFoM/9OcXODBB0/ZHRA lNJLIZQzZ65/FQ+6r3auXjP0OLo4Y21BAnvC9JmTlNAm4sUwqUgpfDIyKuXj3RAGPS3F oPOuKJAJK2NuO4cQ5WLPW8L1b0NM5hSFMiwv4m/jVBfhADpit8VN6s4j43tQ4wvWtomY ibBkIsSudPkn0pFYvxQ++ILWFIMHPdloq62Dp3ymkSejy3vz2XYqdIfX3j6mekNtQvcs iH+mjza6imYON6Uagk0JYEXv44WNaQE6yai4at+f8sJ7UuV72vkTxT1yvllMaO99LY8W D/dw== X-Gm-Message-State: ALyK8tLSWFTWeBZE7oeQ0pNiZYcWHwnx11ar+huVAIQI15PYm5H+Toku8Y+qp5P7wpuB/Q== X-Received: by 10.28.45.201 with SMTP id t192mr12821342wmt.77.1466441317441; Mon, 20 Jun 2016 09:48:37 -0700 (PDT) Received: from [172.20.10.4] (48.228.197.178.dynamic.wless.zhbmb00p-cgnat.res.cust.swisscom.ch. [178.197.228.48]) by smtp.gmail.com with ESMTPSA id t198sm14394128wmt.16.2016.06.20.09.48.36 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 20 Jun 2016 09:48:36 -0700 (PDT) Subject: Re: panic with tcp timers To: Gleb Smirnoff , Hans Petter Selasky References: <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org> <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org> <20160620095822.GJ1076@FreeBSD.org> <20160620103015.GK1076@FreeBSD.org> Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org From: Julien Charbon Message-ID: <7294457d-07c0-2d6e-617b-15fa48bf2bb9@freebsd.org> Date: Mon, 20 Jun 2016 18:48:27 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <20160620103015.GK1076@FreeBSD.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 16:48:50 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc Content-Type: multipart/mixed; boundary="Mvq9GL89TXjDlopGgmEC0W8JMOWLNG1QO" From: Julien Charbon To: Gleb Smirnoff , Hans Petter Selasky Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org Message-ID: <7294457d-07c0-2d6e-617b-15fa48bf2bb9@freebsd.org> Subject: Re: panic with tcp timers References: <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org> <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org> <20160620095822.GJ1076@FreeBSD.org> <20160620103015.GK1076@FreeBSD.org> In-Reply-To: <20160620103015.GK1076@FreeBSD.org> --Mvq9GL89TXjDlopGgmEC0W8JMOWLNG1QO Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi, On 6/20/16 12:30 PM, Gleb Smirnoff wrote: > On Mon, Jun 20, 2016 at 12:14:18PM +0200, Hans Petter Selasky wrote: > H> On 06/20/16 11:58, Gleb Smirnoff wrote: > H> > The fix I am working on now is doing exactly that. callout_reset m= ust > H> > return 0 if the callout is currently running. > H> > > H> > What are the old paths impacted? > H>=20 > H> I'll dig into the matter aswell and give some comments. Thanks for t= he=20 > H> analysis, Gleb. > H>=20 > H> FYI: This class of problems wouldn't exist if the callout could be=20 > H> associated with a mutex! >=20 > Exactly! I am convinced that all callouts should be locked, and non-loc= ked > one should simply go away, as well as async drain. >=20 > What does prevent us from converting TCP timeouts to locked? To my > understanding it is the lock order of taking pcbinfo after pcb lock. >=20 > I'm now trying to understand Julien's conversion from pcbinfo lock > to pcbinfo + pcblist locks. It seems to me that we actually can convert= > TCP timers to locked callouts. >=20 > What for do we need pcbinfo read lock in a tcp timer? The timer works > only on the pcb and doesn't modify global lists, does it? tcp_timer_keep() for example can modify global pcb list, see the call stack below: tcp_timer_keep() tcp_drop() tcp_close() sofree() tcp_usr_detach() (via pr->pr_usrreqs->pru_detach() in sofree()) tcp_detach() in_pcbfree() in_pcbremlists() Anyway, a bit of history here: o In stable/10 the TCP locking order is: ipi_lock (before) inpcb lock and in stable/10 ipi_lock is protecting the global pcb list. Then, only in the cases where you were absolutely sure you are _not_ going to modify the global pcb list you are allowed to take the inpcb lock only. For all the other cases, you have to take the write lock on ipi_lock first due to lock order. And in context of short-lived TCP connection of the 5 received packets for a TCP connection, 4 require the write ipi_lock lock, and that's does not scale well. Lesson learned: For scaling perspective, in lock ordering always put the most global lock last. It was improved in a lean way in 11: o In 11 the TCP lock order became: ipi_lock (before) inpcb lock (before) ipi_list And in 11 ipi_list protects global pcb list, and only the code actually modifying the global list is protected by a global write lock, e.g.: https://github.com/freebsd/freebsd/blob/master/sys/netinet/in_pcb.c#L1285= Then why keeping the ipi_lock lock at all now? It is solely for one case: When you need the complete stability of the full pcb list while traversing it. These full pcb list traversals are done in functions like: in_pcbnotifyall(), in_pcbpurgeif0(), inp_apply_all(), tcp_ccalgounload(), tcp_drain(), etc. Thus in 11 ipi_lock write lock is required only when you do this full traversal with list stability requirement, and the ipi_lock read lock in all other cases like tcp_input() that then scales better. Sadly in 11, you cannot use the inpcb lock as is for the TCP timers callout, because like tcp_timer_keep() most TCP timers callout can modify the global pcb list and then you need the read lock ipi_lock in top of inpcb lock... o Future/12: The next natural step is to remove the ipi_lock from the picture to get:= inpcb lock (before) ipi_list It /just/ requires a global pcb list that allows addition/deletion while doing a full traversal concurrently. A classical way to achieve that, is to use a RCU-based list. As soon as RCU (or anything equivalent like lwref) are supported in kernel, we will implement this change. Just history here, as I already did a presentation on this exact topic at BSDCan 2015: https://wiki.freebsd.org/201506DevSummit#line-83 It was recorded but I never saw the footage/presentation actually published. :) -- Julien --Mvq9GL89TXjDlopGgmEC0W8JMOWLNG1QO-- --1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJXaB5jAAoJEKVlQ5Je6dhxuN8IAKeytbxsVb7bEK4iGl3HDY8X RRD4rBqamc4cVTzzLn75LpOkIXQZhj6SzaZKO5AfTjZ5eUSuabYubM3wcpai9swx UEoBCnjocNWjxJcElRwdN0CNgj0VtGPlNycWe+d8p3hRGqy14IdGEQ74+ru7ryC8 uGg/eJ0e73YmjVOZCthHHqyH1K+JLild16ZSH+1uFJt+KaPJjLBg5iQfXvM7t1HM yP7HvACAaQXFrUdw/L51pffdxOAwG1FHieQMw2Lu8GuoCn7dJ4hlxumETJpcPJY9 Torn0bCwg0fdSLQiHJbGLE5VywXzcIdb0KAxLBO+k6leCKbnU9lEkxkp7uihB6U= =xbUt -----END PGP SIGNATURE----- --1Ta12WsRFnONHaUgmQbLo6tFj62n4m1rc--