Date:      Thu, 14 Jul 2016 19:01:11 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>, rrs@FreeBSD.org
Cc:        hselasky@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
Subject:   Re: panic with tcp timers
Message-ID:  <dbb33989-538a-69e8-7243-26c554da266c@freebsd.org>
In-Reply-To: <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>
References:  <20160617045319.GE1076@FreeBSD.org> <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org> <20160620073917.GI1076@FreeBSD.org> <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>


 Hi,

On 6/20/16 11:55 AM, Julien Charbon wrote:
> On 6/20/16 9:39 AM, Gleb Smirnoff wrote:
>> On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
>> J> > Comparing stable/10 and head, I see two changes that could
>> J> > affect that:
>> J> >
>> J> > - callout_async_drain
>> J> > - switch to READ lock for inp info in tcp timers
>> J> >
>> J> > That's why you are in To, Julien and Hans :)
>> J> >
>> J> > We continue investigating, and I will keep you updated.
>> J> > However, any help is welcome. I can share cores.
>>
>> Now, spending some time with cores and adding a bunch of
>> extra CTRs, I have a sequence of events that lead to the
>> panic. In short, the bug is in the callout system. It seems
>> to be not relevant to the callout_async_drain, at least for
>> now. The transition to READ lock unmasked the problem, that's
>> why NetflixBSD 10 doesn't panic.
>>
>> The panic requires heavy contention on the TCP info lock.
>>
>> [CPU 1] the callout fires, tcp_timer_keep entered
>> [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
>> [CPU 2] schedules the callout
>> [CPU 2] tcp_discardcb called
>> [CPU 2] callout successfully canceled
>> [CPU 2] tcpcb freed
>> [CPU 1] unblocks... panic
>>
>> When the lock was WLOCK, all contenders were resumed in a
>> sequence they came to the lock. Now, that they are readers,
>> once the lock is released, readers are resumed in a "random"
>> order, and this allows tcp_discardcb to go before the old
>> running callout, and this unmasks the panic.
>
>  Highly interesting.  I should be able to reproduce that (will be useful
> for testing the corresponding fix).
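
 The interleaving quoted above can be replayed in a small userspace C
sketch. This is a toy model under stated assumptions, not the kernel's
callout(9) code: toy_callout, toy_tcpcb, toy_callout_stop() and
replay_race() are hypothetical names, and the report_running switch only
illustrates a stop routine that refuses to report success while the
handler is still running; it is not claimed to match any actual fix.
Nothing is really freed, the model only records that the handler would
resume on a freed tcpcb.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; these are NOT the kernel's real types. */
struct toy_callout {
    bool pending;   /* an invocation is scheduled for the future */
    bool running;   /* the handler has been entered and has not returned */
};

struct toy_tcpcb {
    bool freed;                 /* set instead of actually freeing memory */
    struct toy_callout keep;    /* models the keep timer callout */
};

/*
 * Cancel a future invocation.  Returns 1 when the caller is told the
 * callout was cancelled, 0 otherwise.  With report_running set
 * (hypothetical behaviour), a handler that is still running makes the
 * call return 0 even though the pending invocation was removed.
 */
static int toy_callout_stop(struct toy_callout *c, bool report_running)
{
    bool was_pending = c->pending;

    c->pending = false;
    if (report_running && c->running)
        return 0;
    return was_pending ? 1 : 0;
}

/*
 * Replay the quoted sequence of events on one thread.  Returns true when
 * the handler would resume on a control block that was already freed.
 */
static bool replay_race(bool report_running)
{
    struct toy_tcpcb tp = {
        .freed = false,
        .keep = { .pending = true, .running = false },
    };
    bool use_after_free;

    /* [CPU 1] the callout fires, the handler is entered and blocks. */
    tp.keep.pending = false;
    tp.keep.running = true;

    /* [CPU 2] schedules the callout again. */
    tp.keep.pending = true;

    /* [CPU 2] discard path: free only if the stop reported success. */
    if (toy_callout_stop(&tp.keep, report_running) == 1)
        tp.freed = true;

    /* [CPU 1] unblocks and touches the control block. */
    use_after_free = tp.keep.running && tp.freed;
    tp.keep.running = false;

    return use_after_free;
}
```

 With these helpers, replay_race(false) returns true (the handler resumes
on a freed block, as in the panic), while replay_race(true) returns false
because the discard path never marks the block as freed.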

 Finally, I was able to reproduce it (without glebius' fix).  The trick
was to drastically lower the TCP keepalive timer values:

$ sysctl -a | grep tcp.keep
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.keepinit: 75000
net.inet.tcp.keepcnt: 8
$ sudo bash -c "sysctl net.inet.tcp.keepidle=10 && sysctl net.inet.tcp.keepintvl=50 && sysctl net.inet.tcp.keepinit=10"
Password:
net.inet.tcp.keepidle: 7200000 -> 10
net.inet.tcp.keepintvl: 75000 -> 50
net.inet.tcp.keepinit: 75000 -> 10

 Note: This will almost certainly close all your ssh connections to the
tested server.

 Now I will test in order:

#1. glebius fix
https://svnweb.freebsd.org/base?view=revision&revision=302350

#2. rrs extra fix
https://reviews.freebsd.org/D7135

#3. rrs TCP Timer cleanup
https://reviews.freebsd.org/D7136

 My panic for reference:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 28
[root@atlas-dl360-4 ~]# instruction pointer     = 0x20:0xffffffff80c346f1
stack pointer           = 0x28:0xfffffe1f29b848b0
frame pointer           = 0x28:0xfffffe1f29b848e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (4))
trap number             = 9
panic: general protection fault
cpuid = 10
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1f29b844a0
vpanic() at vpanic+0x182/frame 0xfffffe1f29b84520
panic() at panic+0x43/frame 0xfffffe1f29b84580
trap_fatal() at trap_fatal+0x351/frame 0xfffffe1f29b845e0
trap() at trap+0x820/frame 0xfffffe1f29b847f0
calltrap() at calltrap+0x8/frame 0xfffffe1f29b847f0
--- trap 0x9, rip = 0xffffffff80c346f1, rsp = 0xfffffe1f29b848c0, rbp = 0xfffffe1f29b848e0 ---
tcp_timer_keep() at tcp_timer_keep+0x51/frame 0xfffffe1f29b848e0
softclock_call_cc() at softclock_call_cc+0x19c/frame 0xfffffe1f29b849c0
softclock() at softclock+0x47/frame 0xfffffe1f29b849e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x96/frame 0xfffffe1f29b84a20
ithread_loop() at ithread_loop+0xa6/frame 0xfffffe1f29b84a70
fork_exit() at fork_exit+0x84/frame 0xfffffe1f29b84ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1f29b84ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

--
Julien


