From: Luigi Rizzo
To: Gleb Smirnoff
Cc: freebsd-net@freebsd.org, Andre Oppermann, Andriy Gapon
Date: Thu, 11 Jul 2013 17:06:10 +0200
Subject: Re: Listen queue overflow: N already in queue awaiting acceptance

On Thu, Jul 11, 2013 at 4:52 PM, Gleb Smirnoff wrote:

> On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
> L> >> IMO, this should be a single counter accessible via sysctl, with no
> L> >> printf(). Those who need details on whether this is a micro-burst or
> L> >> a persistent condition can run monitoring software that draws plots.
> L> >
> L> >
> L> > The single counter wouldn't tell you anything because it misses which
> L> > socket/accept queue is affected by the overflow. The inpcb pointer
> L> > can be cross-referenced with netstat -a.
> L> >
> L> > Andriy for example would never have found out about this problem other
> L> > than by receiving vague user complaints about aborted connection attempts.
> L> > Maybe after spending many hours searching for the cause he might have
> L> > inferred from endless scrolling in Wireshark that something wasn't
> L> > right and blamed syncache first. Only later would it emerge that he's
> L> > either receiving too many connections or his application is too slow
> L> > dealing with incoming connections.
> L> >
> L> > If you can recommend a suitable and general sysadmin-friendly monitoring
> L> > software that will point out this problem I'm all ears.
> L>
> L> the problem with these non-throttled messages is that they often
> L> cause thrashing -- you become slightly slow, messages start being
> L> generated and your system becomes a lot slower, making it hard
> L> to recover.
> L>
> L> What i usually do is throttle (in the kernel) and count the number of
> L> messages suppressed. Something like this (in a macro):
> L>
> L>     static int ctr, last_tick;
> L>     if (ticks - last_tick > suppression_delay) {
> L>         printf("got this error ... (%d times)\n", ... , ctr);
> L>         ctr = 0;
> L>         last_tick = ticks;
> L>     } else {
> L>         ctr++;
> L>     }
> L>
> L> the errors may not be exactly the same, the counter is race-prone
> L> (you can make it atomic if you really feel like it) but the whole point is
> L> to get the idea that something is very wrong, not the exact count
> L> or pointer
>
> btw, there is a ready function for that: ppsratecheck(), already utilized
> for suppressing some error messages.

yes, i think i saw it before. To me, the convenience of the macro
is that it can also wrap the declaration of the static variables
and the printf. I basically have macros like this
(see sys/dev/netmap/netmap_kern.h)

    RD(max_pps, "printf format ", arguments....) // rate-limited printf
    ND(same arguments as above) // compiles to no-op

so i can quickly add the messages or disable them by simply changing
the macro name.

FWIW the macro in netmap_kern.h does not have the counter of suppressed
messages (I just thought about it, but i should probably add it as a
feature)

cheers
luigi

> --
> Totus tuus, Glebius.

--
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------
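
For reference, the throttle-and-count idea quoted above (a macro that wraps the static state and the printf, and reports how many messages were suppressed) could look roughly like the sketch below. It is a userspace illustration only, not the code in netmap_kern.h and not ppsratecheck(): the names rl_state, rl_check and RL_PRINTF are hypothetical, the time is passed in by the caller instead of being read from the kernel's ticks, and like the quoted snippet it makes no attempt to be thread-safe.

```c
#include <stdio.h>

/* Per-call-site state (hypothetical names, for illustration only). */
struct rl_state {
    long last_tick;   /* time of the last message actually printed */
    int  suppressed;  /* messages dropped since that print */
};

/*
 * Return 1 if the caller should print now, 0 if the message is
 * suppressed. On a 1 return, *dropped holds the number of messages
 * suppressed since the previous print and the counter is reset.
 * Racy by design, as in the quoted snippet.
 */
static int
rl_check(struct rl_state *st, long now, long delay, int *dropped)
{
    if (now - st->last_tick > delay) {
        *dropped = st->suppressed;
        st->suppressed = 0;
        st->last_tick = now;
        return 1;
    }
    st->suppressed++;
    return 0;
}

/*
 * Wrapper that hides the static state and the printf at the call site.
 * fmt must be a string literal and at least one argument must follow it.
 */
#define RL_PRINTF(now, delay, fmt, ...) do {                        \
    static struct rl_state rl_st_;                                  \
    int rl_dropped_;                                                \
    if (rl_check(&rl_st_, (now), (delay), &rl_dropped_))            \
        printf(fmt " (%d suppressed)\n", __VA_ARGS__, rl_dropped_); \
} while (0)
```

A call site would then read something like RL_PRINTF(now, 10, "queue overflow on %p", ptr), where now and ptr are whatever clock value and pointer the caller has at hand. Keeping the decision in a small function and only the declaration plus printf in the macro makes the logic easy to check in isolation.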