To: stable@freebsd.org
Subject: Advice on an odd networking problem
From: Pete French
Date: Wed, 05 Nov 2014 18:23:31 +0000

I have some puzzling behaviour here - it looks very much like I am running out of network resources of some kind, but I can't find out what, so I am wondering if anyone has any ideas.

All machines are running FreeBSD 9.2-STABLE r265427 - which is from the start of May (probably around Heartbleed time!).

We have 5 machines running webservers - apache24 serving CGI scripts, plus nginx being used to drive uwsgi with some django/python based code.
These are load balanced by pound on a machine which faces the internet.

This all works as expected, except that if I modify the CGI scripts running inside Apache so they make some https calls to the nginx server on 127.0.0.1, then pound stops being able to connect to Apache for a proportion of its calls - it returns 503s. The effect on the calls which fail is as if the webserver is not listening anymore. But this only applies to a fraction of the calls - most get through. If I disable the functionality which makes the internal call to 127.0.0.1 then the problem goes away.

It looks to me like I am running out of some network resource somehow, but the load is very, very low, and I can't see any obvious parameters hitting their limits. Nothing out of the ordinary is logged on the webservers; the only symptom is the load balancer not being able to connect.

Does anyone have any ideas where to look for a solution? It is puzzling the hell out of me!

-pete.