From owner-freebsd-net@freebsd.org Sat Jun 25 08:48:55 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AFB90B7324A for ; Sat, 25 Jun 2016 08:48:55 +0000 (UTC) (envelope-from rrs@netflix.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 8DD2C1D6C for ; Sat, 25 Jun 2016 08:48:55 +0000 (UTC) (envelope-from rrs@netflix.com) Received: by mailman.ysv.freebsd.org (Postfix) id 89381B73248; Sat, 25 Jun 2016 08:48:55 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88E64B73247 for ; Sat, 25 Jun 2016 08:48:55 +0000 (UTC) (envelope-from rrs@netflix.com) Received: from mail-vk0-x22c.google.com (mail-vk0-x22c.google.com [IPv6:2607:f8b0:400c:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 412E61D66 for ; Sat, 25 Jun 2016 08:48:55 +0000 (UTC) (envelope-from rrs@netflix.com) Received: by mail-vk0-x22c.google.com with SMTP id c2so148289086vkg.1 for ; Sat, 25 Jun 2016 01:48:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netflix.com; s=google; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=tpBM99m1BKG3QPoXTcRmsr7jEgv/ta+6xDZKRAKia1Y=; b=bCEpaLnVowICR96R1x9BJjRu7d6E5Byl+h3uxJYmaq7ApBRu3iQkeTJQ6ZtxQq1isW 6Fyj4Pa8fBkRdeE8goX1wGSm3N0kT+LPitT9RkwPGiqq0PtsbH417Hng2JTFmlMT2qQ/ 49OWYk7EJXyEVgvpsHz63CIfaPGy4lRC8tR2A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=tpBM99m1BKG3QPoXTcRmsr7jEgv/ta+6xDZKRAKia1Y=; b=PxwGlONpRu/CI21sZgTzPqgbiwY71zBzbrpSMSX7lgopUm7OqNLcGu1R9gk2KfOG5w xBqDWNfPU6MMrKj6KCMhIIIAcJL0tLDUiFatCWThUGE22BE31JfV5rqyRwnJ/e5jia2F M9I3bDTR8VfunhVSTLCT9AvT2e0N2n898tLf9OuBsrzsXH5gBgdbsFxMp0raEyF5sfun Q6op541GzDo7ZrBHRadERtKU6sbnjS25Z8bHhB+czzAaGir+kFBxf9QLy4BXS8HYrlDa h+tb7wbbzkq8EgAC7Y8YwbrrPtp+ml7ZfSuQk8R7B2wpar650n1DTa4I/mg55E9JKldG ie6Q== X-Gm-Message-State: ALyK8tIvqzZbrYJ8ykPMSUs7Oep0rrSkMSt8b0e1XCDTtJiKHocCQkUNffKULtWQtufDhvb1 X-Received: by 10.176.64.7 with SMTP id h7mr4365555uad.153.1466844534204; Sat, 25 Jun 2016 01:48:54 -0700 (PDT) Received: from [100.127.86.242] ([69.53.246.16]) by smtp.gmail.com with ESMTPSA id g4sm2820025vkb.3.2016.06.25.01.48.52 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 25 Jun 2016 01:48:53 -0700 (PDT) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: panic with tcp timers From: Randall Stewart In-Reply-To: <74bb31b7-a9f5-3d0c-eea0-681872e6f09b@freebsd.org> Date: Sat, 25 Jun 2016 04:48:51 -0400 Cc: Gleb Smirnoff , Randall Ray Stewart , hselasky@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org Content-Transfer-Encoding: quoted-printable Message-Id: <18D94615-810E-4E79-A889-4B0CC70F9E45@netflix.com> References: <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org> <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org> <20160620095822.GJ1076@FreeBSD.org> <74bb31b7-a9f5-3d0c-eea0-681872e6f09b@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2016 08:48:55 -0000 So All of our timers in TCP do something like --------------------- INFO-LOCK INP_WLOCK if (inp needs to be dropped) { drop-it } do other work UNLOCK-INP UNLOCK-INFO ------------------ And generally the path =93inp needs to be dropped=94 is rarely taken. So why don=92t we change the procedure to something like: INP_WLOCK if (inp needs to be dropped) { inp_ref() INP_WUNLOCK() INFO_LOCK() INP_WLOCK() if (inp_release_ref) { /* victory its dead */ INFO_UNLOCK return } do-the-drop } This way we would only go grab the INFO lock in those rarer cases when we *do* actually want to kill the tcb and at the same time it would make the current callout system work correctly.. which as many have pointed out would be much better if we could just let the lock be gotten by the callout system. Hmm maybe with this scheme we could even do that... R > On Jun 20, 2016, at 1:45 PM, Julien Charbon wrote: >=20 >=20 > Hi, >=20 > On 6/20/16 11:58 AM, Gleb Smirnoff wrote: >> On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote: >> J> > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: >> J> > J> > Comparing stable/10 and head, I see two changes that could >> J> > J> > affect that: >> J> > J> >=20 >> J> > J> > - callout_async_drain >> J> > J> > - switch to READ lock for inp info in tcp timers >> J> > J> >=20 >> J> > J> > That's why you are in To, Julien and Hans :) >> J> > J> >=20 >> J> > J> > We continue investigating, and I will keep you updated. >> J> > J> > However, any help is welcome. I can share cores. >> J> >=20 >> J> > Now, spending some time with cores and adding a bunch of >> J> > extra CTRs, I have a sequence of events that lead to the >> J> > panic. In short, the bug is in the callout system. It seems >> J> > to be not relevant to the callout_async_drain, at least for >> J> > now. The transition to READ lock unmasked the problem, that's >> J> > why NetflixBSD 10 doesn't panic. >> J> >=20 >> J> > The panic requires heavy contention on the TCP info lock. >> J> >=20 >> J> > [CPU 1] the callout fires, tcp_timer_keep entered >> J> > [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo); >> J> > [CPU 2] schedules the callout >> J> > [CPU 2] tcp_discardcb called >> J> > [CPU 2] callout successfully canceled >> J> > [CPU 2] tcpcb freed >> J> > [CPU 1] unblocks... panic >> J> >=20 >> J> > When the lock was WLOCK, all contenders were resumed in a >> J> > sequence they came to the lock. Now, that they are readers, >> J> > once the lock is released, readers are resumed in a "random" >> J> > order, and this allows tcp_discardcb to go before the old >> J> > running callout, and this unmasks the panic. >> J>=20 >> J> Highly interesting. I should be able to reproduce that (will be = useful >> J> for testing the corresponding fix). >> J>=20 >> J> Fix proposal: If callout_async_drain() returns 0 (fail) (instead = of 1 >> J> (success) here) when the callout cancellation is a success _but_ = the >> J> callout is current running, that should fix it. >> J>=20 >> J> For the history: It comes back to my old callout question: >> J>=20 >> J> Does _callout_stop_safe() is allowed to return 1 (success) even = if the >> J> callout is still currently running; a.k.a. it is not because you >> J> successfully cancelled a callout that the callout is not currently = running. >> J>=20 >> J> We did propose a patch to make _callout_stop_safe() returns 0 = (fail) >> J> when the callout is currently running: >> J>=20 >> J> callout_stop() should return 0 when the callout is currently being >> J> serviced and indeed unstoppable >> J> = https://reviews.freebsd.org/differential/changeset/?ref=3D62513&whitespace= =3Dignore-most >> J>=20 >> J> But this change impacted too many old code paths and was = interesting >> J> only for TCP timers and thus was abandoned. >>=20 >> The fix I am working on now is doing exactly that. callout_reset must >> return 0 if the callout is currently running. >>=20 >> What are the old paths impacted? >=20 > Actually all the paths that check the callout_stop() return value AND > call both callout_reset() and callout_stop() AND use mpsafe callout(). > And for each path, we would have to check our patch was ok (or not). >=20 > Thus, if you only do the change in callout_async_drain() context, you > don't impact the "old" callout_stop() behavior and get the desired > behavior for the TCP timers. Might be a good trade-off... >=20 > My 2 cents. >=20 > -- > Julien -------- Randall Stewart rrs@netflix.com 803-317-4952