Date: Fri, 17 Jun 2016 13:10:57 +0200
From: Hans Petter Selasky <hps@selasky.org>
To: Gleb Smirnoff <glebius@FreeBSD.org>, jch@FreeBSD.org
Cc: rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject: Re: panic with tcp timers
Message-ID: <b2443a62-cd50-fca4-9436-aeb03f278bd7@selasky.org>
In-Reply-To: <20160617045319.GE1076@FreeBSD.org>
References: <20160617045319.GE1076@FreeBSD.org>

On 06/17/16 06:53, Gleb Smirnoff wrote:
>    Hi!
>
>    At Netflix we are observing a race in TCP timers with head.
> The problem is a regression, that doesn't happen on stable/10.
> The panic usually happens after several hours at 55 Gbit/s of
> traffic.
>
> What happens is that tcp_timer_keep finds t_tcpcb being
> NULL. Some coredumps have tcpcb already initialized,
> with non-NULL t_tcpcb and in TCPS_ESTABLISHED state. Which
> means that other CPU was working on the tcpcb while
> the faulted one was working on the panic. So, this all looks
> like a use after free, which conflicts with new allocation.
>
> Comparing stable/10 and head, I see two changes that could
> affect that:
>
> - callout_async_drain
> - switch to READ lock for inp info in tcp timers
>
> That's why you are in To, Julien and Hans :)
>
> We continue investigating, and I will keep you updated.
> However, any help is welcome. I can share cores.
>

Hi,

I have projects/hps_head around, which is not far behind 11-current
and has a completely different callout implementation. If you can
reproduce the issue separately, it might be worth a try to rule out
the callout stack.

--HPS