Date: Thu, 16 Jun 2016 21:53:19 -0700
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: jch@FreeBSD.org, hselasky@FreeBSD.org
Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject: panic with tcp timers
Message-ID: <20160617045319.GE1076@FreeBSD.org>
Hi!

At Netflix we are observing a race in TCP timers with head. The problem is a regression; it doesn't happen on stable/10. The panic usually happens after several hours at 55 Gbit/s of traffic.

What happens is that tcp_timer_keep finds t_tcpcb to be NULL. Some coredumps have the tcpcb already initialized, with a non-NULL t_tcpcb and in TCPS_ESTABLISHED state. This means that another CPU was working on the tcpcb while the faulting one was handling the panic. So this all looks like a use after free that conflicts with a new allocation.

Comparing stable/10 and head, I see two changes that could affect this:

- callout_async_drain
- switch to a READ lock for inp info in tcp timers

That's why you are in To, Julien and Hans :)

We continue investigating, and I will keep you updated. However, any help is welcome. I can share cores.

--
Totus tuus, Glebius.