Date: Mon, 5 Sep 2016 21:02:36 +0300
From: Slawa Olhovchenkov
To: Warner Losh
Cc: hiren panchasara, FreeBSD-STABLE Mailing List
Subject: Re: 11.0 stuck on high network load
Message-ID: <20160905180236.GH34394@zxy.spb.ru>

On Mon, Sep 05, 2016 at 11:50:28AM -0600, Warner Losh wrote:
> > How does that interact with the USB stack? (I'm wary that the USB
> > keyboard may be locked up as well.)
>
> I've just done serial console, so I'm not sure. I think that it works...

IPMI or hardware?
> >> I've seen this at Netflix on one variant of our flash offload box with
> >> an Intel E5-2697 v2 running with the Chelsio driver. We're working
> >> around it by having fewer receive threads than CPUs in the system. The
> >> only way the boxes would come back was with watchdog. The load was
> >> streaming video > ~36Gbps out 4 lagged 10G ports. Console is totally
> >> unresponsive as well. This is on our FreeBSD-10 stable based fork.
> >> From my debugging, we go from totally fine as far as I can tell from
> >> ps, etc., in the moments leading to the hang to being totally wedged. It
> >> seems a very sudden-onset condition. Sound at all familiar?
> >>
> >> Warner
> >
> > Not sure.
> > This is a less powerful box and can serve only 20 Gbit/s, using an Intel
> > card (lagg 2x10G). A day ago I was running 10-STABLE on this box without
> > such issues. (I don't remember clearly; maybe a month or so ago this box
> > crashed with this issue, but at the time I had no idea what caused it.)
>
> OK.
>
> > Maybe the hang is caused by some bad (too large) memory request from
> > nginx (attempting to parse some malformed files), or by frequent nginx
> > core dumps (from these malformed files).
>
> OK. We're using nginx too, with our modified sendfile.

I don't use sendfile, and I use ZFS.

> > 11.0 on two other, more powerful boxes serves 40 to 55 Gbit/s without
> > hangs. But without the malformed files (i.e. without bogus memory
> > requests and without nginx crashes). I'm not sure about the correlation.
>
> In our case it seems like a timing issue between too many threads. The
> same hardware can handle 1x40G no problem...

I have already reconfigured the NIC so that the total number of receive
threads is half the total number of CPU cores. I don't see a high number
of AIO tasks immediately before the hang (around 100).
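The workaround both posters describe (fewer receive threads than CPUs; here, half the cores in total) is simple arithmetic once the lagg is taken into account, because NIC queue tunables are usually per port. Below is a minimal Python sketch, assuming the "half" total is split evenly across the lagg member ports. The helper names are hypothetical, and the tunable name `hw.ix.num_queues` is an assumption for the ixgbe driver on 10.x/11.x; check ixgbe(4) (or cxgbe(4) for Chelsio) for your release before using it.

```python
import os


def rx_queues_per_port(ncpu: int, nports: int, fraction: float = 0.5) -> int:
    """Split a total receive-queue budget (a fraction of the CPU count)
    evenly across the lagg member ports, keeping at least one per port.

    The default of one half mirrors the rule of thumb from the thread;
    it is a workaround heuristic, not a driver requirement.
    """
    if ncpu < 1 or nports < 1:
        raise ValueError("ncpu and nports must be positive")
    total = max(1, int(ncpu * fraction))  # total receive threads for the box
    return max(1, total // nports)        # per-port share, never zero


def loader_conf_line(tunable: str, nqueues: int) -> str:
    """Format a /boot/loader.conf style entry: name="value"."""
    return f'{tunable}="{nqueues}"'


if __name__ == "__main__":
    ncpu = os.cpu_count() or 1
    # ASSUMPTION: hw.ix.num_queues is the per-port queue tunable of the
    # ixgbe driver on FreeBSD 10.x/11.x; verify against ixgbe(4).
    print(loader_conf_line("hw.ix.num_queues", rx_queues_per_port(ncpu, nports=2)))
```

For example, a 24-core box with a 2x10G lagg gets 12 receive threads in total, i.e. 6 per port.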