From: Slawa Olhovchenkov
To: Warner Losh
Cc: hiren panchasara, FreeBSD-STABLE Mailing List
Date: Wed, 7 Sep 2016 00:03:50 +0300
Subject: Re: 11.0 stuck on high network load
Message-ID: <20160906210349.GL34394@zxy.spb.ru>
References: <20160904215739.GC22212@zxy.spb.ru> <20160905014612.GA42393@strugglingcoder.info> <20160905074348.GE34394@zxy.spb.ru>

On Mon, Sep 05, 2016 at 10:14:59AM -0600, Warner Losh wrote:
> On Mon, Sep 5, 2016 at 1:43 AM, Slawa Olhovchenkov wrote:
> > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> >
> >> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> >> > I am trying to use 11.0 on a dual E5-2620 (no X2APIC).
> >> > Under high network load, and maybe some additional conditions, the
> >> > system goes into an unresponsive state -- no reaction to network or
> >> > console (USB IPMI emulation). INVARIANTS gives too much overhead. Is
> >> > there some way to debug this?
> >>
> >> Can you panic it from console to get to db> to get backtrace and other
> >> info when it goes unresponsive?
> >
> > no
> > no reaction
>
> So the canonical 'ipmitool chassis power diag' doesn't send an NMI to
> get you to the debugger?

This Supermicro MB doesn't respond to ipmitool over LAN :(
Neither chassis power diag nor SOL works.

> I've seen this at Netflix on one variant of our flash offload box with
> an Intel e5-2697v2 running with the Chelsio driver. We're working
> around it by having fewer receive threads than CPUs in the system. The
> only way the boxes would come back was with watchdog. The load was
> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
> unresponsive as well. This is on our FreeBSD-10 stable based fork.
>
> From my debugging, we go from totally fine as far as I can tell from
> ps, etc in the moments leading to the hang to being totally wedged. It
> seems a very sudden-onset condition. Sound at all familiar?
>
> Warner