Date: Thu, 16 Jun 2016 21:53:19 -0700
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: jch@FreeBSD.org, hselasky@FreeBSD.org
Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject: panic with tcp timers
Message-ID: <20160617045319.GE1076@FreeBSD.org>

Hi!

At Netflix we are observing a race in TCP timers on head. The problem is a regression: it doesn't happen on stable/10. The panic usually happens after several hours at 55 Gbit/s of traffic.

What happens is that tcp_timer_keep finds t_tcpcb being NULL. Some coredumps show the tcpcb already initialized, with a non-NULL t_tcpcb and in TCPS_ESTABLISHED state, which means that another CPU was working on the tcpcb while the faulting one was processing the panic. So this all looks like a use after free that collides with a new allocation.

Comparing stable/10 and head, I see two changes that could affect this:

- callout_async_drain
- the switch to a READ lock for inp info in the TCP timers

That's why you are in To, Julien and Hans :)

We continue investigating, and I will keep you updated. However, any help is welcome. I can share cores.

-- 
Totus tuus, Glebius.