From owner-freebsd-net@freebsd.org Fri Jun 17 14:51:22 2016
Subject: Re: panic with tcp timers
From: Julien Charbon
To: Gleb Smirnoff, hselasky@FreeBSD.org
Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Date: Fri, 17 Jun 2016 11:27:39 +0200
Message-ID: <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org>
In-Reply-To: <20160617045319.GE1076@FreeBSD.org>

Hi Gleb,

On 6/17/16 6:53 AM, Gleb Smirnoff wrote:
> At Netflix we are observing a race in TCP timers with head.
> The problem is a regression, that doesn't happen on stable/10.
> The panic usually happens after several hours at 55 Gbit/s of
> traffic.
>
> What happens is that tcp_timer_keep finds t_tcpcb being
> NULL. Some coredumps have tcpcb already initialized,
> with non-NULL t_tcpcb and in TCPS_ESTABLISHED state. Which
> means that other CPU was working on the tcpcb while
> the faulted one was working on the panic. So, this all looks
> like a use after free, which conflicts with new allocation.
>
> Comparing stable/10 and head, I see two changes that could
> affect that:
>
> - callout_async_drain
> - switch to READ lock for inp info in tcp timers
>
> That's why you are in To, Julien and Hans :)
>
> We continue investigating, and I will keep you updated.
> However, any help is welcome. I can share cores.

Thanks for sharing. Let me run our TCP tests on a recent version of
HEAD to see if by chance I can reproduce it. If I cannot reproduce it,
I will ask for a debug kernel and cores and see if I can help.

A few notes here:

- Around two months ago I tested HEAD with callout_async_drain() in the
  TCP timers using our TCP QA test suite and saw no kernel panic. That
  said, I did not let our tests run for several hours.

- At Verisign we run 10 with the "READ lock for inp info in tcp timers"
  change. Again, that does not mean this change has no impact here.

My 2 cents.
--
Julien