FreeBSD Mail Archives

Date:      Fri, 17 Jun 2016 13:10:57 +0200
From:      Hans Petter Selasky <hps@selasky.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>, jch@FreeBSD.org
Cc:        rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject:   Re: panic with tcp timers
Message-ID:  <b2443a62-cd50-fca4-9436-aeb03f278bd7@selasky.org>
In-Reply-To: <20160617045319.GE1076@FreeBSD.org>
References:  <20160617045319.GE1076@FreeBSD.org>

On 06/17/16 06:53, Gleb Smirnoff wrote:
>   Hi!
>
>   At Netflix we are observing a race in TCP timers with head.
> The problem is a regression that doesn't happen on stable/10.
> The panic usually happens after several hours at 55 Gbit/s of
> traffic.
>
> What happens is that tcp_timer_keep finds t_tcpcb to be
> NULL. Some coredumps show the tcpcb already initialized,
> with non-NULL t_tcpcb and in TCPS_ESTABLISHED state, which
> means that another CPU was working on the tcpcb while
> the faulting one was handling the panic. So this all looks
> like a use after free that collides with a new allocation.
>
> Comparing stable/10 and head, I see two changes that could
> affect that:
>
> - callout_async_drain
> - switch to READ lock for inp info in tcp timers
>
> That's why you are in To, Julien and Hans :)
>
> We continue investigating, and I will keep you updated.
> However, any help is welcome. I can share cores.
>

Hi,

I still have projects/hps_head around. It is not far behind
11-current and has a completely different callout implementation. If
you can reproduce the issue separately, it might be worth a try there
to rule out the callout stack.

--HPS

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?b2443a62-cd50-fca4-9436-aeb03f278bd7>
