From owner-freebsd-hackers  Tue Nov 10 11:59:03 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Date: Tue, 10 Nov 1998 11:58:19 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199811101958.LAA14084@apollo.backplane.com>
To: Steven Yang <syang@directhit.com>
Cc: "'dg@root.com'" <dg@root.com>, Mike Smith <mike@smith.net.au>,
        Steven Yang <syang@directhit.com>,
        "'Open Systems Networking'" <opsys@mail.webspan.net>,
        "'freebsd-hackers@freebsd.org'" <freebsd-hackers@FreeBSD.ORG>
Subject: Re: RE: FW: Can't get rid of my mbufs. 
References:  <839A86AB6CE4D111A52200104B938D430B066B@MOE>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:Refresher: I started this thread two weeks ago and haven't had time to
:reply until now.  I'm using FreeBSD 2.2.5 (I previously stated 2.2.6,
:but I was wrong) with Apache 1.2.4 using FastCGI.  A typical server
:response is about 22K of text, and under heavy load (20+
:requests/second), my mbufs (as seen through netstat -m) keep increasing
:until the server reboots itself (when I have > ~10000 mbufs).  It appears
:that all of the requests are getting valid replies (we check the
:returned web page for a string), even at loads around 100
:requests/second.  We do not do reverse-DNS.  My original question and
:one of the replies is attached at the bottom of this email.
:
:The higher the load, the faster the mbufs increase.  Under low load, the
:problem does not arise and new mbufs are not allocated.  I was requested
:to give you guys the output of "netstat -n", as shown below.  Big
:questions: do I have an mbuf leak?  Is it possibly my fault?  Could it
:be my version of FastCGI?  Will upgrading my OS to 2.2.7 solve the
:problem?  Will upgrading Apache solve the problem?

    There are three issues that I can think of.  I'm not sure 2.2.5 has the
    sysctls to fix them (I think it does), but 2.2.7 certainly does.

    The mbufs are almost certainly tied up by stale connections that aren't
    going away.  This typically occurs because TCP keepalives are not turned
    on for Apache's sockets, so dead connections are never detected.  You
    can fix this by turning on keepalives and reducing the keepalive idle
    test interval:

	sysctl -w net.inet.tcp.keepidle=1800
	sysctl -w net.inet.tcp.keepintvl=150
	sysctl -w net.inet.tcp.keepinit=150
	sysctl -w net.inet.tcp.always_keepalive=1

	NOTE: you must restart the web server after making these changes so
	that its sockets pick up the new defaults.

    The second issue could be that your default tcp window sizes are too
    large.  The defaults are actually reasonable... 16K:

	sysctl -a | fgrep tcp		(look for tcp.sendspace, tcp.recvspace)

    Check to make sure that Apache is not overriding the default window size
    to something huge.

    I don't understand why your netstat shows so few connections... are you
    sure Apache was running at 20 hits/sec at the time you ran the netstat?

	netstat -tn | fgrep tcp
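
    A rough sketch of a per-state tally of that output, which makes a pile
    of stale connections easier to spot than eyeballing the raw list.  The
    function name tcp_state_counts is made up for illustration, and it
    assumes the connection state is the last column of each tcp line:

```shell
#!/bin/sh
# Hypothetical helper (not part of any FreeBSD tool): reads netstat
# output on stdin and prints "COUNT STATE" for each TCP state seen,
# most frequent first.  Assumes the state is the last field on each
# line whose first field starts with "tcp".
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { n[$NF]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}
```

    Usage would be something like "netstat -n | tcp_state_counts".  A count
    that keeps growing between runs while the load stays flat would point
    at connections that are not being torn down.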

    The third issue is the number of allocated protocol control blocks 
    in the netstat below... looks like a kernel bug to me, but not one
    I've ever seen before.  I would immediately upgrade the machine to
    2.2.7.  If this is your problem, I'll bet 2.2.7 will fix it.

    Also do a 'ps ax' and look for hung CGIs, and try killing the server
    entirely (and anything else that was run from the server) and see if
    the space gets reclaimed.  I'm thinking pipes, possibly, but dunno if
    pipes use network mbufs.

:> > # netstat -m
:> > 4449 mbufs in use:
:> >         4437 mbufs allocated to data
:> >         1 mbufs allocated to packet headers
:> >         7 mbufs allocated to protocol control blocks
:> >         4 mbufs allocated to socket names and addresses
:> > 4263/4314 mbuf clusters in use
:> > 9184 Kbytes allocated to network (98% in use)
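
    As a cross-check on how the percentage in that last line relates to the
    counts above it, here is a small arithmetic sketch.  It assumes 128-byte
    mbufs and 2048-byte clusters; those sizes are an assumption on my part,
    chosen because they reproduce the figures in both netstat -m outputs in
    this message.  The function name mbuf_pct is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate the "% in use" figure from the counts
# printed by netstat -m.
#   $1 = mbufs in use, $2 = clusters in use, $3 = Kbytes allocated
# ASSUMPTION: 128 bytes per mbuf and 2048 bytes per cluster.
mbuf_pct() {
    mbufs=$1
    clusters=$2
    kbytes=$3
    echo $(( (mbufs * 128 + clusters * 2048) * 100 / (kbytes * 1024) ))
}

mbuf_pct 4449 4263 9184    # the quoted output: prints 98
mbuf_pct 3201 1206 5768    # the 2.2.7 box's output in this message: prints 48
```

    Under those assumptions nearly all of the memory in the quoted output is
    sitting in data clusters rather than in headers or control blocks.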

    Here's one of our servers (doing around 30 hits/sec at the moment).  Note
    that the in-use percentage is 48%, which is typical.  If you regularly
    see in-use percentages above 80% it's almost certainly due to a
    stale-socket problem, which in turn is usually due to keepalives being
    turned off and blown sockets building up.  This box is running
    (roughly) 2.2.7.

shell3:/home/dillon# netstat -m
3201 mbufs in use:
        1470 mbufs allocated to data
        1513 mbufs allocated to packet headers
        211 mbufs allocated to protocol control blocks
        7 mbufs allocated to socket names and addresses
1206/2684 mbuf clusters in use
5768 Kbytes allocated to network (48% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
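
    The 80% rule of thumb above could be watched for with something like
    the following sketch.  The function names are made up, and it assumes
    the "(NN% in use)" wording shown in the netstat -m output here:

```shell
#!/bin/sh
# Hypothetical helper: pull the NN out of the
# "... Kbytes allocated to network (NN% in use)" line on stdin.
mbuf_in_use_pct() {
    sed -n 's/.*allocated to network (\([0-9][0-9]*\)% in use).*/\1/p'
}

# Hypothetical helper: read netstat -m output on stdin and warn when the
# in-use percentage is above a threshold (default 80).
# Returns 0 if at or below the threshold, 1 if above, 2 if not found.
check_mbuf_usage() {
    threshold=${1:-80}
    pct=$(mbuf_in_use_pct)
    if [ -z "$pct" ]; then
        echo "could not find in-use percentage" >&2
        return 2
    fi
    if [ "$pct" -gt "$threshold" ]; then
        echo "WARNING: ${pct}% of network memory in use (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: ${pct}% of network memory in use"
}
```

    Run as "netstat -m | check_mbuf_usage" from cron or by hand; a single
    high reading means little, it is the sustained ones that matter.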

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    
