Date: Fri, 16 Sep 2016 21:18:39 +0300
From: Slawa Olhovchenkov
To: Konstantin Belousov
Cc: freebsd-stable@FreeBSD.org
Subject: Re: 11.0 stuck on high network load
Message-ID: <20160916181839.GC2960@zxy.spb.ru>
In-Reply-To: <20160915090633.GS2840@zxy.spb.ru>

On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote:

> On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote:
>
> > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote:
> > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> > >
> > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > > > I am try using 11.0 on Dual E5-2620 (no X2APIC).
> > > > > Under high network load and may be additional conditional system go to
> > > > > unresponsive state -- no reaction to network and console (USB IPMI
> > > > > emulation). INVARIANTS give too high overhead. Is this exist some way to
> > > > > debug this?
> > > >
> > > > Can you panic it from console to get to db> to get backtrace and other
> > > > info when it goes unresponsive?
> > >
> > > ipmi console don't respond (chassis power diag don't react)
> > > login on sol console stuck on *tcp.
> >
> > Is 'login' you reference is the ipmi client state, or you mean login(1)
> > on the wedged host ?
>
> on the wedged host
>
> > If BMC stops responding simultaneously with the host, I would suspect
> > the hardware platform issues instead of a software problem. Do you have
> > dedicated LAN port for BMC ?
>
> Yes.
> But BMC emulate USB keyboard and this is may be lock inside USB
> system.
> "ipmi console don't respond" must be read as "ipmi console running and
> attached but system don't react to keypress on this console".
> at the same moment system responds to `enter` on ipmi sol console, but
> after enter `root` stuck in login in the '*tcp' state (I think this is
> NIS related).

~^B doesn't break to the debugger.
But I can log in to the sol console.
The whole system works tooooo slooooow.
Some cores are displayed by top as 100% idle.
Some cores are displayed by top as 100% interrupt.

last pid: 16631;  load averages:  9.00,  9.08,  9.35    up 0+06:26:34  19:53:13
832 processes: 22 running, 765 sleeping, 42 waiting, 3 lock
CPU 0:   0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 1:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:   0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 5:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 6:   0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 8:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 9:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 1564M Active, 5169M Inact, 114G Wired, 4548M Free
ARC: 107G Total, 98G MFU, 9476M MRU, 160K Anon, 150M Header, 254M Other
Swap: 32G Total, 32G Free

Now I am trying to

# pmcstat -c 0 -S CPU_CLK_UNHALTED_CORE -l 10 -O sample0.out
load: 10.32  cmd: pmcstat 16878 [runnable] 4632.42r 0.00u 0.00s 0% 2940k

I am still waiting
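
To see at a glance which cores are pinned in interrupt context, the per-CPU lines of the top output above can be summarized with a short script. What follows is only a minimal sketch in Python: the names `parse_cpu_states` and `busy_in` and the 99% threshold are illustrative assumptions, not part of any FreeBSD tool, and the sample text is copied from the top output in this message.

```python
import re

# Per-CPU lines as shown by top(1), for example:
# "CPU 0: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle"
CPU_LINE = re.compile(r"CPU\s+(\d+):\s*(.*)")
FIELD = re.compile(r"([\d.]+)%\s+(\w+)")


def parse_cpu_states(text):
    """Return {cpu_id: {state_name: percent}} for each per-CPU line in text."""
    states = {}
    for line in text.splitlines():
        m = CPU_LINE.match(line.strip())
        if m:
            states[int(m.group(1))] = {
                name: float(pct) for pct, name in FIELD.findall(m.group(2))
            }
    return states


def busy_in(states, state="interrupt", threshold=99.0):
    """List CPU ids spending at least `threshold` percent in `state`.

    The 99% default is an arbitrary cut-off chosen for this sketch.
    """
    return sorted(
        cpu for cpu, s in states.items() if s.get(state, 0.0) >= threshold
    )


# Per-CPU lines copied from the top output in this message.
SAMPLE = """\
CPU 0: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
CPU 4: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 6: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU 7: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU 8: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 9: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 10: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
CPU 11: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
"""

if __name__ == "__main__":
    cpu_states = parse_cpu_states(SAMPLE)
    print("interrupt-bound CPUs:", busy_in(cpu_states, "interrupt"))
    print("idle CPUs:", busy_in(cpu_states, "idle"))
```

Run against the sample, this reports CPUs 0, 4, 6, 7 and 10 as interrupt-bound, which is the set a per-CPU pmcstat run would need to cover.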