From: "Jeff Soule"
Date: Fri, 21 Sep 2007 15:23:45 -0700
To: freebsd-bugs@freebsd.org
Subject: tcp listen problem

We are seeing an intermittent problem with FreeBSD 6.2 and our custom web server application, where incoming connections are sometimes not passed to our application to be accepted. It is as if the listen queue is "clogged" somehow, and all incoming connections on that listening socket are blocked from reaching our application. The clogged state lasts anywhere from a few minutes to over 30 minutes; then (if we wait it out) the application picks up and runs as if nothing had gone wrong. When it picks up, the pending requests are accepted by our application but fail with an error because they have already timed out on the client, while new connections are accepted and work fine. Other applications, and other ip:port pairs in our application, all continue to work fine while the listening socket for a particular ip:port is clogged.

Our short-term fix is to watch for incoming connections being accepted and, if none arrive within a 2 second period, to connect to ourselves and check that this call completes. If it does not, we kill the instance and restart. Restarting the application fixes the problem immediately (except that the connections in the queue at the time of the restart are lost and get errors). The problem is that the short-term fix reduces our uptime from 100% to 99.5%, and this is simply not an acceptable level of service for our customers; we have to fix this.

Internal details on what we are doing:

* using select for polled I/O, with all I/O requests coming out of a single thread
* using threads for incoming requests in a single process (this is because it is a database application, and we need all threads to access the database cache)

We've checked a tcpdump of incoming calls and can't see anything funny about the calls that clog the listen queue; they look fine to us.
So it doesn't look like an attack per se. The incidence seems to be random: we might go 4-5 days without any, then get 10 in one day close together, or get one every now and then.

Any help would be much appreciated, and we would be happy to hire someone on a consulting basis to help resolve this.
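
A minimal sketch of the short-term check described above, written in Python for brevity. It is not the application's actual code: the names probe_listener and AcceptWatchdog, the "PING" payload, and the idea that the server answers the probe with at least one byte are all assumptions made for illustration. Only the 2 second idle period comes from the description. Note that a plain TCP connect can complete in the kernel even when the application never calls accept, so this sketch waits for a reply rather than just for the connect.

```python
import socket
import time


def probe_listener(host, port, timeout=2.0, payload=b"PING\n"):
    """Connect to our own listener and wait for any reply byte.

    Returns True if the application accepted the connection and answered
    within `timeout` seconds, False otherwise. The payload is hypothetical;
    a real server would need a request it actually answers.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            # An empty read means the peer closed without answering.
            return len(conn.recv(1)) > 0
    except OSError:
        # Refused, reset, or timed out (socket.timeout is an OSError subclass).
        return False


class AcceptWatchdog:
    """Tracks the time of the last accepted connection (hypothetical helper)."""

    def __init__(self, host, port, idle_limit=2.0, clock=time.monotonic):
        self.host = host
        self.port = port
        self.idle_limit = idle_limit
        self.clock = clock
        self.last_accept = clock()

    def note_accept(self):
        # Call this from the accept path each time a connection is accepted.
        self.last_accept = self.clock()

    def should_restart(self):
        # Only probe after a quiet period; restart if the probe gets no answer.
        if self.clock() - self.last_accept < self.idle_limit:
            return False
        return not probe_listener(self.host, self.port, self.idle_limit)
```

The accept path would call note_accept() on every accepted connection, and a periodic timer would call should_restart() and kill the instance when it returns True. The probe should run outside the single I/O thread, since it blocks for up to the timeout.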