From: Warner Losh
Date: Mon, 5 Sep 2016 11:50:28 -0600
Subject: Re: 11.0 stuck on high network load
To: Slawa Olhovchenkov
Cc: hiren panchasara, FreeBSD-STABLE Mailing List
In-Reply-To: <20160905164656.GG34394@zxy.spb.ru>
References: <20160904215739.GC22212@zxy.spb.ru> <20160905014612.GA42393@strugglingcoder.info> <20160905074348.GE34394@zxy.spb.ru> <20160905164656.GG34394@zxy.spb.ru>

On Mon, Sep 5, 2016 at 10:46 AM, Slawa Olhovchenkov wrote:
> On Mon, Sep 05, 2016 at 10:14:59AM -0600, Warner Losh wrote:
>
>> On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov wrote:
>> > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
>> >
>> >> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
>> >> > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
>> >> > Under high network load, and maybe some additional conditions, the
>> >> > system goes into an unresponsive state -- no reaction to network or
>> >> > console (USB IPMI emulation). INVARIANTS gives too high overhead.
>> >> > Is there some way to debug this?
>> >>
>> >> Can you panic it from console to get to db> to get backtrace and other
>> >> info when it goes unresponsive?
>> >
>> > no
>> > no reaction
>>
>> So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
>> get you to the debugger?
>
> Didn't try (and didn't know about this).
> Can you explain?

The BMC sends the NMI to the CPU.

> Does FreeBSD catch the NMI by default and enter the debugger?

Yes.

> How does this interact with the USB stack (I am worried the USB keyboard
> may be locked)?

I've just done serial console, so I'm not sure. I think that it works...

>> I've seen this at Netflix on one variant of our flash offload box with
>> an Intel e5-2697v2 running with the Chelsio driver. We're working
>> around it by having fewer receive threads than CPUs in the system. The
>> only way the boxes would come back was with watchdog. The load was
>> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
>> unresponsive as well. This is on our FreeBSD-10 stable based fork.
>> From my debugging, we go from totally fine as far as I can tell from
>> ps, etc in the moments leading to the hang to being totally wedged. It
>> seems a very sudden-onset condition. Sound at all familiar?
>>
>> Warner
>
> Not sure.
> This is a less powerful box and can serve only 20Gbit, using an Intel
> card (lagg 2x10G). A day ago I was running 10-STABLE on this box w/o such
> an issue. (I don't remember clearly; maybe some months ago this box crashed
> from this issue -- at that time I didn't have any idea about the crash.)

OK.

> Maybe the hang is caused by some bad (too big) memory request from nginx
> (attempting to parse some malformed files). Or by frequent nginx core dumps
> (from these malformed files).

OK. We're using nginx too, with our modified sendfile.

> 11.0 on two different, more powerful boxes serves from 40 to 55Gbit w/o
> hanging. But w/o malformed files (i.e. w/o bogus memory requests and w/o
> nginx crashes).

Not sure about correlation. In our case it seems like a timing issue
between too many threads. The same hardware can handle 1x40G no problem...

Warner