From owner-freebsd-stable@freebsd.org Tue Sep 20 20:59:20 2016 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 361BABE2915 for ; Tue, 20 Sep 2016 20:59:20 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wm0-f44.google.com (mail-wm0-f44.google.com [74.125.82.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BF710CBD for ; Tue, 20 Sep 2016 20:59:19 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by mail-wm0-f44.google.com with SMTP id b130so56662612wmc.0 for ; Tue, 20 Sep 2016 13:59:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=TBhkwAZMxRQJJwZXaVq/7qX6IjfEvv5sa4DycmEMWV4=; b=F4ZJC4QOK4XJRGSoO2Vh4XJePXfci0kOqglFkGwXQ7IVtgm6LPWZI48/rfwKPH8RGa WevirCxalh4uqKXZ532724ptnli/l2RWP2jlDy3OWWKkjlwRhDpix4Mj70QkWaHBIA79 i3mMr0XotIRbwj7cjn2PImXnONd+h6fzRx/FyRc5LrdVWN41Da4DQ4+wm68ZobfhtYf+ wdybjyAhHWAxyCz7rzKylt5+A74sFTqFpLu58h+xapQ3b6wmCffTu9TVR0528sz7Fkw3 uUyG1gY5lxr5Sd3Dq+zi0YnfpQPnqO/mpVmhix6C3uf6Q2i9cY8iU6p4a0YhopmKtgR7 7R/g== X-Gm-Message-State: AE9vXwMqZ6Os4Lt+lvJ0Ii0WK1lHh/DVHAQFNZJhGz6Vemm2dPTwwG82biNjGdFILAMVrg== X-Received: by 10.28.181.145 with SMTP id e139mr4654358wmf.114.1474401634944; Tue, 20 Sep 2016 13:00:34 -0700 (PDT) Received: from [192.168.0.12] (217-162-163-184.dynamic.hispeed.ch. [217.162.163.184]) by smtp.gmail.com with ESMTPSA id o2sm29998864wjo.3.2016.09.20.13.00.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 20 Sep 2016 13:00:33 -0700 (PDT) Subject: Re: 11.0 stuck on high network load To: Slawa Olhovchenkov References: <20160904215739.GC22212@zxy.spb.ru> <20160905014612.GA42393@strugglingcoder.info> <20160914213503.GJ2840@zxy.spb.ru> <20160915085938.GN38409@kib.kiev.ua> <20160915090633.GS2840@zxy.spb.ru> <20160916181839.GC2960@zxy.spb.ru> <20160916183053.GL9397@strugglingcoder.info> <20160916190330.GG2840@zxy.spb.ru> <78cbcdc9-f565-1046-c157-2ddd8fcccc62@freebsd.org> <20160919204328.GN2840@zxy.spb.ru> Cc: Konstantin Belousov , freebsd-stable@FreeBSD.org, hiren panchasara From: Julien Charbon Message-ID: <8ba75d6e-4f01-895e-0aed-53c6c6692cb9@freebsd.org> Date: Tue, 20 Sep 2016 22:00:25 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160919204328.GN2840@zxy.spb.ru> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="UfwU9GI729W6F2q2nLQ6jAwJJ1tANQG4J" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Sep 2016 20:59:20 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --UfwU9GI729W6F2q2nLQ6jAwJJ1tANQG4J Content-Type: multipart/mixed; boundary="v9b3hUSrNDIrqVvbMuxrNDHf6TvMtHHi3"; protected-headers="v1" From: Julien Charbon To: Slawa Olhovchenkov Cc: Konstantin Belousov , freebsd-stable@FreeBSD.org, hiren panchasara Message-ID: <8ba75d6e-4f01-895e-0aed-53c6c6692cb9@freebsd.org> Subject: Re: 11.0 stuck on high network load References: <20160904215739.GC22212@zxy.spb.ru> <20160905014612.GA42393@strugglingcoder.info> <20160914213503.GJ2840@zxy.spb.ru> <20160915085938.GN38409@kib.kiev.ua> <20160915090633.GS2840@zxy.spb.ru> <20160916181839.GC2960@zxy.spb.ru> <20160916183053.GL9397@strugglingcoder.info> <20160916190330.GG2840@zxy.spb.ru> <78cbcdc9-f565-1046-c157-2ddd8fcccc62@freebsd.org> <20160919204328.GN2840@zxy.spb.ru> In-Reply-To: <20160919204328.GN2840@zxy.spb.ru> --v9b3hUSrNDIrqVvbMuxrNDHf6TvMtHHi3 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Slawa, On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: > On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: >> >>> @ CPU_CLK_UNHALTED_CORE [4653445 samples] >>> >>> 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel >>> 100.0% [2413083] __rw_wlock_hard >>> 100.0% [2413083] tcp_tw_2msl_scan >>> 99.99% [2412958] pfslowtimo >>> 100.0% [2412958] softclock_call_cc >>> 100.0% [2412958] softclock >>> 100.0% [2412958] intr_event_execute_handlers >>> 100.0% [2412958] ithread_loop >>> 100.0% [2412958] fork_exit >>> 00.01% [125] tcp_twstart >>> 100.0% [125] tcp_do_segment >>> 100.0% [125] tcp_input >>> 100.0% [125] ip_input >>> 100.0% [125] swi_net >>> 100.0% [125] intr_event_execute_handlers >>> 100.0% [125] ithread_loop >>> 100.0% [125] fork_exit >> >> The only write lock tcp_tw_2msl_scan() tries to get is a >> INP_WLOCK(inp). Thus here, tcp_tw_2msl_scan() seems to be stuck >> spinning on INP_WLOCK (or pfslowtimo() is going crazy and calls >> tcp_tw_2msl_scan() at high rate but this will be quite unexpected). >> >> Thus my hypothesis is that something is holding the INP_WLOCK and not= >> releasing it, and tcp_tw_2msl_scan() is spinning on it. >> >> If you can, could you compile the kernel with below options: >> >> options DDB # Support DDB. >> options DEADLKRES # Enable the deadlock resolver >> options INVARIANTS # Enable calls of extra sanity >> checking >> options INVARIANT_SUPPORT # Extra sanity checks of intern= al >> structures, required by INVARIANTS >> options WITNESS # Enable checks to detect >> deadlocks and cycles >> options WITNESS_SKIPSPIN # Don't run witness on spinlock= s >> for speed >=20 > Currently this host run with 100% CPU load (on all cores), i.e. > enabling WITNESS will be significant drop performance. > Can I use only some subset of options? >=20 > Also, I can some troubles to DDB enter in this case. > May be kgdb will be success (not tryed yet)? If these kernel options will certainly slow down your kernel, they also might found the root cause of your issue before reaching the point where you have 100% cpu load on all cores (thanks to INVARIANTS). I would suggest: #1. Try above kernel options at least once, and see what you can get. #2. If #1 is a total failure try below patch: It won't solve anything, it just makes tcp_tw_2msl_scan() less greedy when there is contention on the INP write lock. If it makes the debugging more feasible, continue to #3. diff --git a/sys/netinet/tcp_timewait.c b/sys/netinet/tcp_timewait.c index a8b78f9..4206ea3 100644 --- a/sys/netinet/tcp_timewait.c +++ b/sys/netinet/tcp_timewait.c @@ -701,34 +701,42 @@ tcp_tw_2msl_scan(int reuse) in_pcbref(inp); TW_RUNLOCK(V_tw_lock); +retry: if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { - INP_WLOCK(inp); - tw =3D intotw(inp); - if (in_pcbrele_wlocked(inp)) { - KASSERT(tw =3D=3D NULL, ("%s: held last i= np " - "reference but tw not NULL", __func__= )); - INP_INFO_RUNLOCK(&V_tcbinfo); - continue; - } + if (INP_TRY_WLOCK(inp)) { + tw =3D intotw(inp); + if (in_pcbrele_wlocked(inp)) { + KASSERT(tw =3D=3D NULL, ("%s: hel= d last inp " + "reference but tw not NULL", __func__)); + INP_INFO_RUNLOCK(&V_tcbinfo); + continue; + } - if (tw =3D=3D NULL) { - /* tcp_twclose() has already been called = */ - INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); - continue; - } + if (tw =3D=3D NULL) { + /* tcp_twclose() has already been called */ + INP_WUNLOCK(inp); + INP_INFO_RUNLOCK(&V_tcbinfo); + continue; + } - tcp_twclose(tw, reuse); - INP_INFO_RUNLOCK(&V_tcbinfo); - if (reuse) - return tw; + tcp_twclose(tw, reuse); + INP_INFO_RUNLOCK(&V_tcbinfo); + if (reuse) + return tw; + } else { + INP_INFO_RUNLOCK(&V_tcbinfo); + goto retry; + } } else { /* INP_INFO lock is busy, continue later. */ - INP_WLOCK(inp); - if (!in_pcbrele_wlocked(inp)) - INP_WUNLOCK(inp); - break; + if (INP_TRY_WLOCK(inp)) { + if (!in_pcbrele_wlocked(inp)) + INP_WUNLOCK(inp); + break; + } else { + goto retry; + } } } #3. Once the issue is reproduced, launch ddb and run the below commands= : show pcpu show allpcpu show locks show alllocks show lockchain show allchains show all trace My 2 cents. -- Julien --v9b3hUSrNDIrqVvbMuxrNDHf6TvMtHHi3-- --UfwU9GI729W6F2q2nLQ6jAwJJ1tANQG4J Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJX4ZVgAAoJEKVlQ5Je6dhxXvIIANqvnsweFVWMO+fG4EHI1tZW UeXOwM5/Amyox9uVrVkaTlBDft8hEDVAgwDyQKWdEOqV3FJek9/edXkbg31nuVHJ LtQQZvgMgN2gFQs+42U6XTKXAk0XsVmFQaPqi99m97AThXOLvKkI4kS0DJZL6tvU drXFt5NT0zfQLjqo/WWbNNqBXYkBQ6NI+rQ5bpZIBPEmvPPXSm1RUQd7pkJOdnYa mZHXB7J/9avca6pPomNsm6bnyrsmvdg6ecOTaQRq4qBn6sEjb1bYIOzfz5Tc6hzk Y7+JwRUtN9+sN7QumghAupyTuET4kvIOLfjGDT0HTF+8cAXOMG18Yj4jdPX74GE= =BBIM -----END PGP SIGNATURE----- --UfwU9GI729W6F2q2nLQ6jAwJJ1tANQG4J--