From owner-freebsd-net@freebsd.org Fri Jun 17 11:07:31 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1F89A78EA0 for ; Fri, 17 Jun 2016 11:07:31 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id A050D1846 for ; Fri, 17 Jun 2016 11:07:31 +0000 (UTC) (envelope-from hps@selasky.org) Received: by mailman.ysv.freebsd.org (Postfix) id 99596A78E9E; Fri, 17 Jun 2016 11:07:31 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9892CA78E9C; Fri, 17 Jun 2016 11:07:31 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6169B1844; Fri, 17 Jun 2016 11:07:30 +0000 (UTC) (envelope-from hps@selasky.org) Received: from laptop015.home.selasky.org (unknown [62.141.129.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 6383E1FE023; Fri, 17 Jun 2016 13:07:22 +0200 (CEST) Subject: Re: panic with tcp timers To: Gleb Smirnoff , jch@FreeBSD.org References: <20160617045319.GE1076@FreeBSD.org> Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org From: Hans Petter Selasky Message-ID: Date: Fri, 17 Jun 2016 13:10:57 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.0 MIME-Version: 1.0 In-Reply-To: <20160617045319.GE1076@FreeBSD.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2016 11:07:31 -0000 On 06/17/16 06:53, Gleb Smirnoff wrote: > Hi! > > At Netflix we are observing a race in TCP timers with head. > The problem is a regression, that doesn't happen on stable/10. > The panic usually happens after several hours at 55 Gbit/s of > traffic. > > What happens is that tcp_timer_keep finds t_tcpcb being > NULL. Some coredumps have tcpcb already initialized, > with non-NULL t_tcpcb and in TCPS_ESTABLISHED state. Which > means that other CPU was working on the tcpcb while > the faulted one was working on the panic. So, this all looks > like a use after free, which conflicts with new allocation. > > Comparing stable/10 and head, I see two changes that could > affect that: > > - callout_async_drain > - switch to READ lock for inp info in tcp timers > > That's why you are in To, Julien and Hans :) > > We continue investigating, and I will keep you updated. > However, any help is welcome. I can share cores. > Hi, I do have projects/hps_head around, which is not that much behind 11-current, which has a completely different callout implementation. If you can reproduce the issue separately might we worth a try to rule out the callout stack. --HPS