From owner-freebsd-net@FreeBSD.ORG Thu Jul 11 14:52:31 2013
Date: Thu, 11 Jul 2013 18:52:29 +0400
From: Gleb Smirnoff
To: Luigi Rizzo
Cc: freebsd-net@freebsd.org, Andre Oppermann, Andriy Gapon
Subject: Re: Listen queue overflow: N already in queue awaiting acceptance
Message-ID: <20130711145229.GB8839@glebius.int.ru>
References: <51DE591E.7040405@FreeBSD.org> <51DE5C8C.3090404@freebsd.org>
 <20130711133504.GB67810@FreeBSD.org> <51DEC10B.3080409@freebsd.org>

On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
L> >> IMO, this should be a single counter accessible via sysctl, with no
L> >> printf().
L> >> Those who need details on whether this is a micro-burst or a
L> >> persistent condition can run monitoring software that draws plots.
L> >
L> >
L> > A single counter wouldn't tell you anything because it misses which
L> > socket/accept queue is affected by the overflow. The inpcb pointer
L> > can be cross-referenced with netstat -a.
L> >
L> > Andriy, for example, would never have found out about this problem other
L> > than by receiving vague user complaints about aborted connection attempts.
L> > Maybe after spending many hours searching for the cause he would have
L> > inferred from endless scrolling in Wireshark that something wasn't
L> > right, and blamed syncache first. Only later would it emerge that he's
L> > either receiving too many connections or his application is too slow
L> > dealing with incoming connections.
L> >
L> > If you can recommend suitable, general, sysadmin-friendly monitoring
L> > software that will point out this problem, I'm all ears.
L>
L> The problem with these non-throttled messages is that they often
L> cause thrashing -- you become slightly slow, messages start being
L> generated and your system becomes a lot slower, making it hard
L> to recover.
L>
L> What I usually do is throttle (in the kernel) and count the number of
L> messages suppressed. Something like this (in a macro):
L>
L> 	static int ctr, last_tick;
L> 	if (ticks - last_tick > suppression_delay) {
L> 		printf("got this error ... (%d times)\n", ... , ctr);
L> 		ctr = 0;
L> 		last_tick = ticks;
L> 	} else {
L> 		ctr++;
L> 	}
L>
L> The errors may not be exactly the same, and the counter is race-prone
L> (you can make it atomic if you really feel like it), but the whole point
L> is to get the idea that something is very wrong, not the exact count
L> or pointer.

BTW, there is a ready-made function for that: ppsratecheck(), which is
already used to suppress some error messages.

-- 
Totus tuus, Glebius.
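For illustration, here is a userland sketch of the throttle-and-count pattern from Luigi's macro. It assumes the kernel's `ticks` counter is simulated by a `now` argument. `struct throttle` and `throttled_report()` are hypothetical names, not FreeBSD APIs. The sketch prints at most one message per `delay` ticks and reports how many messages were dropped in between. It differs from the macro in two small ways: a `started` flag guarantees that the very first message is printed even when `ticks` is still near zero, and the interval check uses unsigned subtraction so that it stays correct when the tick counter wraps.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical throttle state, one instance per message site
 * (the equivalent of the static variables in Luigi's macro).
 */
struct throttle {
	int last_tick;   /* tick at which the last message was printed */
	int suppressed;  /* messages dropped since then */
	int started;     /* 0 until the first message has been printed */
};

/*
 * Print `what` at most once per `delay` ticks; `now` stands in for the
 * kernel's ticks counter.  Returns 1 if the message was printed and
 * 0 if it was suppressed (and counted).
 */
static int
throttled_report(struct throttle *t, int now, int delay, const char *what)
{
	assert(t != NULL && delay >= 0);

	/* Unsigned subtraction stays correct when the tick counter wraps. */
	if (!t->started ||
	    (unsigned int)now - (unsigned int)t->last_tick > (unsigned int)delay) {
		printf("%s (%d suppressed)\n", what, t->suppressed);
		t->suppressed = 0;
		t->last_tick = now;
		t->started = 1;
		return 1;
	}
	t->suppressed++;
	return 0;
}
```

Like the macro, the sketch's counters are not atomic, so concurrent callers can lose counts. Inside the kernel, Gleb's suggestion would replace the hand-rolled check with ppsratecheck(&lasttime, &curpps, maxpps). Its ratecheck(9) manual page describes it as returning nonzero while the number of events in the current one-second interval does not exceed maxpps.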