Subject: Re: Kernel panics in tcp_twclose
To: Palle Girgensohn
Cc: freebsd-net@freebsd.org
From: Julien Charbon
Date: Mon, 28 Sep 2015 10:07:53 +0200

Hi Palle,

On 25/09/15 16:14, Palle Girgensohn wrote:
>> On 24 Sep 2015, at 11:39, Palle Girgensohn wrote:
>>> On 24 Sep 2015, at 09:57, Julien Charbon wrote:
>>> On 24/09/15 09:03, Julien Charbon wrote:
>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>> On 24 Sep 2015, at 07:51, Palle Girgensohn wrote:
>>>>>>> On 24 Sep 2015, at 00:05, Palle Girgensohn wrote:
>>>>>>>> On 23 Sep 2015, at 20:32, Julien Charbon wrote:
>>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>>> Kernels and userland are updated to 10.2-p3 with the
>>>>>>> patch removing the suspicious KASSERT. dtrace is running
>>>>>>> continuously, redirecting to a log file.
>>>>> Just had a crash. Unfortunately, the kernel was stuck at the
>>>>> db> prompt, and the remote keyboard was unresponsive (HP ILO,
>>>>> not impressed). So I had to reset the power and never got a
>>>>> core dump...
>>>>>
>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>> cpuid = 0
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>> KDB: enter: panic
>>>>> [ thread pid 12 tid 100043 ]
>>>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why
>>>>> db>
>>>>
>>>> Thanks a lot for this backtrace. This is what I expected: when
>>>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can
>>>> be called one extra time, which leads to:
>>>>
>>>> tcp_tw_2msl_stop: inp should not be released here
>>>>
>>>> Let me try to come up with a tentative fix for this case.
>>>
>>> See attached my tentative patch for this case. It is only a
>>> first attempt, as I am still waiting for feedback from -net on
>>> what the rule should be here.
>>>
>>> By the way:
>>>
>>> - I see nothing specific to VIMAGE here
>>>
>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls
>>> modified or introduced recently in 10.2 that could explain why
>>> this issue only appears now?
>>
>> Running a machine with the patch now (it just crashed and rebooted
>> with the new kernel).
>>
>> Hoping it will have a "soothing" effect... ;-)
>>
>> dtrace is running as before. No output yet, though.
>
> First off, loud cheers and a big *thank you* to Julien for helping us
> get our systems to stop crashing. This really means a lot to us!
> Thank you!

Glad to see your system is more stable now. You are welcome, and
thanks to you for reporting this issue so accurately. We got lucky
that it took /only/ three different kernel panics to get a good
overview. This part of the code is quite tricky, as you have three
entangled layers that each try to clean up their things the right
way: socket, inp and tcptw.

> Dtrace still shows nothing.

I will try to provide you with a more generic DTrace script; it seems
the current one is too specific.
--
Julien