Date:      Fri, 17 Jun 2016 11:27:39 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>, hselasky@FreeBSD.org
Cc:        rrs@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject:   Re: panic with tcp timers
Message-ID:  <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org>
In-Reply-To: <20160617045319.GE1076@FreeBSD.org>
References:  <20160617045319.GE1076@FreeBSD.org>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--FGNRPDI6hUU9D3ssP7mvM3Up4quRr6JfV
Content-Type: multipart/mixed; boundary="iNgJ25Kd6kg6dTtPv0T8RCFla3TDH3fl9"

--iNgJ25Kd6kg6dTtPv0T8RCFla3TDH3fl9
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


 Hi Gleb,

On 6/17/16 6:53 AM, Gleb Smirnoff wrote:
>   At Netflix we are observing a race in TCP timers with head.
> The problem is a regression that doesn't happen on stable/10.
> The panic usually happens after several hours at 55 Gbit/s of
> traffic.
>
> What happens is that tcp_timer_keep finds t_tcpcb being
> NULL. Some coredumps have the tcpcb already initialized,
> with non-NULL t_tcpcb and in TCPS_ESTABLISHED state, which
> means that another CPU was working on the tcpcb while
> the faulted one was working on the panic. So this all looks
> like a use after free that conflicts with a new allocation.
>
> Comparing stable/10 and head, I see two changes that could
> affect that:
>
> - callout_async_drain
> - switch to READ lock for inp info in tcp timers
>=20
> That's why you are in To, Julien and Hans :)
>
> We continue investigating, and I will keep you updated.
> However, any help is welcome. I can share cores.

 Thanks for sharing.  Let me run our TCP tests on a recent version of
HEAD to see if by chance I can reproduce it.  If I cannot reproduce
it, I will ask for a debug kernel and cores and see if I can help.

 A few notes here:

 - About two months ago I tested HEAD with callout_async_drain() in the
TCP timers using our TCP QA test suite and saw no kernel panic.  That
said, I did not let the tests run for several hours.

 - At Verisign we run 10 with the "READ lock for inp info in tcp timers"
change applied.  Again, that does not mean this change has no impact
here.
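
 For readers following the thread, the failure pattern in the quoted
report (a timer handler touching a control block that was freed and
then handed out again by the allocator) can be illustrated with a toy
user-space model.  This is only a sketch under stated assumptions: it
is not FreeBSD code, it does not model locks or the real callout API,
and every name in it (ToyPool, discard, timer_handler, run, the
"CLOSED" label) is hypothetical.  It does not claim to identify the
actual cause of the panic.

```python
class ToyPool:
    """Hypothetical fixed-size allocator that reuses freed slots."""

    def __init__(self, size):
        self.slots = [None] * size
        self.free_list = list(range(size))

    def alloc(self, state):
        # The most recently freed slot is handed out first.
        idx = self.free_list.pop()
        self.slots[idx] = {
            "state": state,
            "timer_running": False,
            "free_pending": False,
        }
        return idx

    def free(self, idx):
        self.slots[idx] = None
        self.free_list.append(idx)


def discard(pool, idx, drain_first):
    """Owner side: free the block, or defer while the handler runs."""
    cb = pool.slots[idx]
    if drain_first and cb["timer_running"]:
        cb["free_pending"] = True      # handler will free it when done
        return "deferred"
    pool.free(idx)
    return "freed"


def timer_handler(pool, idx, drain_first):
    """Timer side: returns (state seen by the handler, slot was reused)."""
    pool.slots[idx]["timer_running"] = True
    # Simulated interleaving: while the handler runs, another CPU
    # discards the block and then sets up a new connection.
    discard(pool, idx, drain_first)
    new_idx = pool.alloc("ESTABLISHED")
    # The handler resumes and dereferences the slot it started with.
    cb = pool.slots[idx]
    observed = cb["state"]
    cb["timer_running"] = False
    if cb["free_pending"]:
        pool.free(idx)                 # deferred free happens here
    return observed, new_idx == idx


def run(drain_first):
    pool = ToyPool(2)
    idx = pool.alloc("CLOSED")         # block whose timer is about to fire
    return timer_handler(pool, idx, drain_first)


if __name__ == "__main__":
    print("free immediately:", run(drain_first=False))
    print("defer until drained:", run(drain_first=True))
```

 Without the deferral the handler ends up looking at the new
connection's block, which resembles a core showing an initialized
tcpcb in TCPS_ESTABLISHED state.  With the deferral the handler still
sees its own block, and the slot is released only after it finishes.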

 My 2 cents.

--
Julien


--iNgJ25Kd6kg6dTtPv0T8RCFla3TDH3fl9--

--FGNRPDI6hUU9D3ssP7mvM3Up4quRr6JfV--


