From owner-freebsd-net@FreeBSD.ORG  Sat Nov 10 09:13:20 2007
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B1EA816A468
	for <net@freebsd.org>; Sat, 10 Nov 2007 09:13:20 +0000 (UTC)
	(envelope-from mattjreimer@gmail.com)
Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 68DE913C4B5
	for <net@freebsd.org>; Sat, 10 Nov 2007 09:13:20 +0000 (UTC)
	(envelope-from mattjreimer@gmail.com)
Received: by py-out-1112.google.com with SMTP id u77so138491pyb
	for <net@freebsd.org>; Sat, 10 Nov 2007 01:13:11 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta;
	h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	bh=R4LiWfTo+A1f8ADYr6khO9N9wR2Fz9xV9vu0F7G4Hzs=;
	b=ravl5lta4rm7Y7SAbNY8C15p+LP/UGzxO+l7Q27HXCSEj9ZncgyT9mQsK5osHLxjlRbxFwMjWF99quAUTRPxfZKXvNt9f0+AsVuIRoEhZ4l04OEWsfCINtXMwaSd4l80l4VnysP40tEXZhgekosxfdFGkDWhrOaMOMFCR3RUUXo=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta;
	h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=bWjzJOWgy6fYxPOe3QyvxbIIaTOomWpwDVXvn1YWr4xq6YQM0Aogye9UOw0TJAH979FoaTpYtp3SXPnzRJ7vK2ss+AkzU0w+ckfKG3gmy3n5Tp+yMVyFnmWhxgZQTQV7a/0rTzyGZSgDXrFmrOnipZ7cfB9K+Zq2hDP28QIIpsU=
Received: by 10.35.69.11 with SMTP id w11mr3147948pyk.1194685991663;
	Sat, 10 Nov 2007 01:13:11 -0800 (PST)
Received: by 10.35.67.16 with HTTP; Sat, 10 Nov 2007 01:13:11 -0800 (PST)
Message-ID: <f383264b0711100113w2ef66739se59775819c9a34d3@mail.gmail.com>
Date: Sat, 10 Nov 2007 01:13:11 -0800
From: "Matt Reimer" <mattjreimer@gmail.com>
To: "Mike Silbersack" <silby@silby.com>
In-Reply-To: <20071110020333.I46803@odysseus.silby.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <f383264b0711091609n81875b6v444055960ab0fd96@mail.gmail.com>
	<20071109213846.O46803@odysseus.silby.com>
	<f383264b0711092323p5148300fu3c0883135f8fb01b@mail.gmail.com>
	<20071110020333.I46803@odysseus.silby.com>
Cc: net@freebsd.org
Subject: Re: Should syncache.count ever be negative?
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 10 Nov 2007 09:13:20 -0000

On Nov 10, 2007 12:13 AM, Mike Silbersack <silby@silby.com> wrote:
>
> On Fri, 9 Nov 2007, Matt Reimer wrote:
>
> > I first noticed this problem running ab; then to simplify I used
> > netrate/http[d]. What's strange is that it seems fine over the local
> > network (~15800 requests/sec), but it slowed down dramatically (~150
> > req/sec) when tested from another network 20 ms away. Running systat
> > -tcp and nload I saw that there was an almost complete stall with only
> > a handful of packets being sent (probably my ssh packets) for a few
> > seconds or sometimes even up to 60 seconds or so.
>
> I think most benchmarking tools end up stalling if all of their threads
> stall, that may be why the rate falls off after the misbehavior you
> describe below begins.

Ok. FWIW, I'm seeing the same behavior with tools/netrate/http as I am with ab.

> > Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> > [66.230.193.105]:80; syncache_socket: Socket create failed due to
> > limits or memory shortage
> > Nov  9 19:02:34 wordpress1 kernel: TCP: [207.210.67.2]:64851 to
> > [66.230.193.105]:80 tcpflags 0x10<ACK>; tcp_input: Listen socket:
> > Socket allocation failed due to limits or memory shortage, sending RST
>
> Turns out you'll generally get both of those error messages together, from
> my reading of the code.
>
> Since you eliminated memory shortage in the socket zone, the next thing to
> check is the length of the listen queues.  If the listen queue is backing
> up because the application isn't accepting fast enough, the errors above
> should happen.  "netstat -Lan" should show you what's going on there.
> Upping the specified listen queue length in your webserver _may_ be all
> that is necessary.  Try fiddling with that and watching how much they're
> filling up during testing.

I ran "netstat -Lan" every second while running this test and the
output never changed from the following, whether before or after the
stall:

Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        66.230.193.105.80
tcp4  0/0/10         127.0.0.1.25
tcp4  0/0/128        *.22
tcp4  0/0/128        *.199


> The fact that you see the same port repeatedly may indicate that the
> syncache isn't destroying the syncache entries when you get the socket
> creation failure.  Take a look at "netstat -n" and look for SYN_RECEIVED
> entries - if they're sticking around for more than a few seconds, this is
> probably what's happening.  (This entire paragraph is speculation, but
> worth investigating.)

During the stall the sockets are all in TIME_WAIT. More relevant info:

kern.ipc.maxsockets: 12328
kern.ipc.numopensockets: 46

net.inet.ip.portrange.randomtime: 45
net.inet.ip.portrange.randomcps: 10
net.inet.ip.portrange.randomized: 1
net.inet.ip.portrange.reservedlow: 0
net.inet.ip.portrange.reservedhigh: 1023
net.inet.ip.portrange.hilast: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.first: 30000
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.lowfirst: 1023

net.inet.tcp.finwait2_timeout: 60000
net.inet.tcp.fast_finwait2_recycle: 0

[root@wordpress1 /sys/dev]# netstat -m
513/5382/5895 mbufs in use (current/cache/total)
511/3341/3852/25600 mbuf clusters in use (current/cache/total/max)
1/1663 mbuf+clusters out of packet secondary zone in use (current/cache)
0/488/488/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
1150K/9979K/11129K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
17 requests for I/O initiated by sendfile
0 calls to protocol drain routines

> > I don't know if it's relevant, but accf_http is loaded on wordpress1.
>
> That may be relevant - accepting filtering changes how the listen queues
> are used.  Try going back to non-accept filtering for now.

It still stalls. This time I noticed that tcptw shows 0 free:

socket:                   696,    12330,       14,      126,    10749,        0
unpcb:                    248,    12330,        5,       70,       75,        0
ipq:                       56,      819,        0,        0,        0,        0
udpcb:                    280,    12334,        2,       40,      184,        0
inpcb:                    280,    12334,     2485,      105,    10489,        0
tcpcb:                    688,    12330,        7,       73,    10489,        0
tcptw:                     88,     2478,     2478,        0,     2478,     7231
syncache:                 112,    15378,        1,       65,     9713,        0
hostcache:                136,    15372,        0,        0,        0,        0
tcpreass:                  40,     1680,        0,        0,        0,        0
sackhole:                  32,        0,        0,        0,        0,        0

But even while tcptw shows 0 free, I can still blast 15800 req/s from
another RELENG_7 box to this one during the stall. So I don't know if
that means anything.

> FWIW, my crazy theory of the moment is this:  We have some bug that
> happens when the listen queues overflow in 7.0, and your test is strenuous
> enough to hit the listen queue overflow condition, leading to total
> collapse.  I'll have to cobble together a test program to see what happens
> in the listen queue overflow case.

When I use ab I'm telling it to use a max of 100 simultaneous
connections (ab -c 100 -n 50000 http://66.230.193.105/). Wouldn't that
be well under the limit?

> Thanks for the quick feedback,

Thank *you*.

Matt