From: Slawa Olhovchenkov
To: Warner Losh
Cc: hiren panchasara, FreeBSD-STABLE Mailing List
Date: Mon, 5 Sep 2016 19:46:56 +0300
Subject: Re: 11.0 stuck on high network load

On Mon, Sep 05, 2016 at 10:14:59AM -0600, Warner Losh wrote:
> On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov wrote:
> > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> >
> >> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> >> > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> >> > Under high network load, and maybe some additional condition, the
> >> > system goes into an unresponsive state: no reaction to the network
> >> > or the console (USB IPMI emulation). INVARIANTS adds too much
> >> > overhead. Is there some way to debug this?
> >>
> >> Can you panic it from console to get to db> to get backtrace and other
> >> info when it goes unresponsive?
> >
> > no
> > no reaction
>
> So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
> get you to the debugger?

I haven't tried it (and didn't know about it). Can you explain a bit more?
Does FreeBSD catch the NMI and enter the debugger by default? How does this
interact with the USB stack (I am worried the USB keyboard may be locked
up)?

> I've seen this at Netflix on one variant of our flash offload box with
> an Intel E5-2697v2 running with the Chelsio driver. We're working
> around it by having fewer receive threads than CPUs in the system. The
> only way the boxes would come back was with watchdog. The load was
> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
> unresponsive as well. This is on our FreeBSD-10 stable based fork.
> From my debugging, we go from totally fine as far as I can tell from
> ps, etc in the moments leading to the hang to being totally wedged. It
> seems a very sudden-onset condition. Sound at all familiar?
>
> Warner

Not sure. This is a less powerful box that can serve only 20 Gbit, using
an Intel card (lagg 2x10G). Until a day ago I was running 10-STABLE on
this box without such issues. (I don't remember clearly; maybe some months
ago this box crashed from this same issue, but at that time I had no idea
what caused the crash.)

Maybe the hang is caused by some bad (too large) memory request from nginx
(an attempt to parse some malformed files), or by frequent nginx core
dumps (from these malformed files).

11.0 on two other, more powerful boxes served from 40 to 55 Gbit without
hangs. But those had no malformed files (i.e. no bogus memory requests and
no nginx crashes).

Not sure about the correlation.