From owner-freebsd-net@FreeBSD.ORG Thu Jul 11 14:49:28 2013
From: Luigi Rizzo
To: Andre Oppermann
Cc: freebsd-net@freebsd.org, Andriy Gapon
Date: Thu, 11 Jul 2013 16:49:25 +0200
Subject: Re: Listen queue overflow: N already in queue awaiting acceptance
In-Reply-To: <51DEC10B.3080409@freebsd.org>
References: <51DE591E.7040405@FreeBSD.org> <51DE5C8C.3090404@freebsd.org>
 <20130711133504.GB67810@FreeBSD.org> <51DEC10B.3080409@freebsd.org>

On Thu, Jul 11, 2013 at 4:28 PM, Andre Oppermann wrote:
> On 11.07.2013 15:35, Gleb Smirnoff wrote:
>>
>> On Thu, Jul 11, 2013 at 09:19:40AM +0200, Andre Oppermann wrote:
>> A> On 11.07.2013 09:05, Andriy Gapon wrote:
>> A> > kernel: sonewconn: pcb 0xfffffe0047db3930: Listen queue overflow: 193 already in queue awaiting acceptance
>> A> > last message repeated 113 times
>> A> > last message repeated 518 times
>> A> > last message repeated 2413 times
>> A> > last message repeated 2041 times
>> A> > last message repeated 1741 times
>> A> > last message repeated 1543 times
>> A> > last message repeated 1283 times
>> A> > last message repeated 1178 times
>> A> > last message repeated 1020 times
>> A> > ...
>> A> >
>> A> > What do these messages mean?
>> A>
>> A> That your server process is lagging behind in accepting new connections
>> A> and quite a number of them get aborted due to a backlogged listen queue.
>> A>
>> A> Making the accept queue longer doesn't help, it's user-space that can't
>> A> keep up with the rate of new incoming connections.
>> A>
>> A> You can either reduce the rate of new incoming connections, optimize
>> A> your server process to accept more connections in the same time, or get
>> A> a beefier machine.
>> A>
>> A> > Is it really that important to be printed?
>> A>
>> A> The log messages are at DEBUG level.  People probably want to know about
>> A> their server not keeping up and throwing incoming connection attempts
>> A> away.
>> A>
>> A> > Finally, why is it not throttled?
>> A>
>> A> The frequency it happens with is important to determine if this is only
>> A> a temporary spike (micro-burst) or persistent condition.
>>
>> IMO, this should be a single counter accessible via sysctl, with no
>> printf().
>> Those, who need details on whether this is micro-burst or
>> persistent condition, can run monitoring software that draws plots.
>
> The single counter wouldn't tell you anything because it misses which
> socket/accept queue is affected by the overflow.  The inpcb pointer
> can be cross-referenced with netstat -a.
>
> Andriy for example would never have found out about this problem other
> than receiving vague user complaints about aborted connection attempts.
> Maybe after spending many hours searching for the cause he may have
> inferred from endless scrolling in Wireshark that something wasn't
> right and blamed syncache first.  Only later it would emerge that he's
> either receiving too many connections or his application is too slow
> dealing with incoming connections.
>
> If you can recommend a suitable and general sysadmin friendly monitoring
> software that will point out this problem I'm all ears.

the problem with these non-throttled messages is that they often
cause thrashing -- you become slightly slow, messages start being
generated and your system becomes a lot slower, making it hard
to recover.

What i usually do is throttle (in the kernel) and count the number
of messages suppressed. Something like this (in a macro):

    static int ctr, last_tick;

    if (ticks - last_tick > suppression_delay) {
        /* quiet period is over: report, including the suppressed count */
        printf("got this error ... (%d times)\n", ... , ctr);
        ctr = 0;
        last_tick = ticks;
    } else {
        /* too soon after the last report: just count it */
        ctr++;
    }

the errors may not be exactly the same, the counter is race-prone
(you can make it atomic if you really feel like it) but the whole
point is to get the idea that something is very wrong, not the
exact count or pointer

cheers
luigi

> --
> Andre

-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------
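
[Editor's sketch] The throttle-and-count idea in the message above can be tried outside the kernel. The following is only an illustration under stated assumptions: struct log_throttle and throttled_log() are hypothetical names that do not exist in FreeBSD, the caller passes `now` in place of the kernel's global `ticks`, and the message text is a placeholder. Like the original fragment, it is not safe against concurrent callers.

```c
#include <stdio.h>

/* Hypothetical per-message-site throttle state (not a FreeBSD type). */
struct log_throttle {
	unsigned int last_tick;   /* tick value when we last printed */
	int suppressed;           /* messages swallowed since then */
};

/*
 * Print msg at most once per `delay` ticks and count what was dropped.
 * `now` stands in for the kernel's `ticks` counter (an assumption made
 * so the sketch runs in user space).  Returns 1 if the message was
 * printed, 0 if it was suppressed.
 */
int
throttled_log(struct log_throttle *t, unsigned int now, int delay,
    const char *msg)
{
	/* Unsigned subtraction keeps the difference sane across wraparound. */
	if ((int)(now - t->last_tick) > delay) {
		printf("%s (%d suppressed since last report)\n",
		    msg, t->suppressed);
		t->suppressed = 0;
		t->last_tick = now;
		return (1);
	}
	t->suppressed++;
	return (0);
}
```

A caller would keep one struct log_throttle per message site, initialised to zero, and call throttled_log() wherever the unthrottled printf() used to be. Keeping the state in a struct rather than in function-local statics is a choice made for this sketch so that several sites can be throttled independently.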