From: "Bjoern A. Zeeb"
To: "Gleb Smirnoff"
Cc: jch@FreeBSD.org, hselasky@FreeBSD.org, rrs@FreeBSD.org, current@FreeBSD.org, net@FreeBSD.org
Subject: Re: panic with tcp timers
Date: Fri, 17 Jun 2016 14:41:02 +0000
In-Reply-To: <20160617045319.GE1076@FreeBSD.org>
References: <20160617045319.GE1076@FreeBSD.org>
List-Id: Networking and TCP/IP with FreeBSD

On 17 Jun 2016, at 4:53, Gleb Smirnoff wrote:

> Hi!
>
> At Netflix we are observing a race in TCP timers with head.
> The problem is a regression, that doesn't happen on stable/10.
> The panic usually happens after several hours at 55 Gbit/s of
> traffic.
>
> What happens is that tcp_timer_keep finds t_tcpcb being
> NULL. Some coredumps have tcpcb already initialized,
> with non-NULL t_tcpcb and in TCPS_ESTABLISHED state. Which
> means that other CPU was working on the tcpcb while
> the faulted one was working on the panic. So, this all looks
> like a use after free, which conflicts with new allocation.
>
> Comparing stable/10 and head, I see two changes that could
> affect that:
>
> - callout_async_drain
> - switch to READ lock for inp info in tcp timers
>
> That's why you are in To, Julien and Hans :)
>
> We continue investigating, and I will keep you updated.
> However, any help is welcome. I can share cores.

There’s also the change to no longer mark the zones NO_FREE.

In theory, I was convinced at the time that it should not be an issue anymore. If I overlooked something, or if follow-up timer changes invalidated assumptions, then that could also be trouble.

That said, I have not been able to get any related panics or log entries lately (but I am currently slightly behind head with my branch).

We should get the problem fixed, however, and not try to “paint over” it again.

/bz