From: higgsr@rpi.edu
To: freebsd-net@FreeBSD.org
Cc: jeff@expertcity.com
Reply-To: higgsr@rpi.edu
Date: Fri, 03 Jan 2003 01:07:29 EST
Subject: Re: when are mbuf clusters released?

Mbufs are data structures in the kernel. They contain information relating to data to be sent or received. Mbufs on my -STABLE system are 256 bytes, and mbuf clusters are 2048 bytes. I believe there are four different types of mbufs, and they are usually chained together, depending on the amount of data.

The different mbuf types:

1.) internal data, packet header
2.) internal data, no packet header
3.) external data (mbuf cluster), packet header
4.) external data (mbuf cluster), no packet header

All mbufs have an mbuf header (20 bytes) that occupies the first portion of the mbuf. This mbuf header describes the type of mbuf and holds the pointers that link mbufs into a chain and link chains together, etc. All mbuf chains have a packet header (24 bytes) in the first mbuf of the chain.

A type 1 mbuf has an mbuf header, a packet header, and then room for the data in transit (20 + 24 + 212 = 256 bytes). A type 2 mbuf has an mbuf header and then room for the data in transit (20 + 236 = 256 bytes). A type 3 mbuf has an mbuf header, a packet header, an extended structure, and unused space (20 + 24 + 16 + 196 = 256 bytes). A type 4 mbuf has an mbuf header, a gap of wasted space where the packet header would sit, an extended structure, and more unused space (20 + 24 + 16 + 196 = 256 bytes). The gap exists because of the unions in the mbuf structure.
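To make the layout concrete, here is a rough sketch in C, modeled loosely on the old sys/mbuf.h. The field names and the 24/16-byte structure sizes are simplifications chosen to match the arithmetic above, not the kernel's exact definitions.

/*
 * Rough sketch of the mbuf layout described above, loosely modeled on
 * the classic sys/mbuf.h.  Field names and structure sizes are
 * simplified to match the arithmetic in the text.
 */
#include <stdio.h>

#define MSIZE      256                  /* total size of one mbuf */
#define MCLBYTES   2048                 /* size of one mbuf cluster */
#define MHDRSIZE   20                   /* mbuf header, per the text */
#define PKTHDRSIZE 24                   /* packet header, per the text */

#define MLEN      (MSIZE - MHDRSIZE)    /* 236: data room, type 2 */
#define MHLEN     (MLEN - PKTHDRSIZE)   /* 212: data room, type 1 */
#define MINCLSIZE (MHLEN + 1)           /* 213: switch to a cluster */

struct m_hdr {                          /* present in every mbuf */
        struct mbuf *mh_next;           /* next mbuf in this chain */
        struct mbuf *mh_nextpkt;        /* next chain (next packet) */
        char        *mh_data;           /* start of the valid data */
        int          mh_len;            /* bytes of data in this mbuf */
        short        mh_type;           /* MT_DATA, MT_HEADER, ... */
        short        mh_flags;          /* M_PKTHDR, M_EXT, ... */
};

struct pkthdr {                         /* first mbuf of a chain only */
        void *rcvif;                    /* receiving interface */
        int   len;                      /* total length of the packet */
        char  pad[PKTHDRSIZE - sizeof(void *) - sizeof(int)];
};

struct m_ext {                          /* only when M_EXT is set */
        char    *ext_buf;               /* the 2048-byte cluster */
        void   (*ext_free)(char *);     /* free routine, if any */
        unsigned ext_size;              /* size of the external buffer */
};

struct mbuf {
        struct m_hdr m_hdr;
        union {
                struct {
                        /*
                         * A type 4 mbuf also reaches MH_ext through MH,
                         * so the MH_pkthdr slot in front of it is the
                         * wasted gap mentioned above.
                         */
                        struct pkthdr MH_pkthdr;        /* types 1, 3 */
                        union {
                                struct m_ext MH_ext;    /* types 3, 4 */
                                char MH_databuf[MHLEN]; /* type 1 */
                        } MH_dat;
                } MH;
                char M_databuf[MLEN];                   /* type 2 */
        } M_dat;
};

int
main(void)
{
        /* On the i386 systems of the era this prints 236 212 213 256. */
        printf("MLEN=%d MHLEN=%d MINCLSIZE=%d sizeof(mbuf)=%zu\n",
            MLEN, MHLEN, MINCLSIZE, sizeof(struct mbuf));
        return (0);
}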
When the data in transit is small (less than MINCLSIZE = 213 bytes), a single mbuf is used.**** When the data in transit is larger (>= MINCLSIZE = 213), mbufs of types 3 and 4 are used; the extended structure in those mbufs identifies the mbuf cluster holding the data in transit. So if you send a lot of large messages, your system will use lots of mbuf clusters.

I do not know why your mbuf clusters are not being returned to the free list. In the case of fragmentation, the IP code should be returning the mbufs and mbuf clusters to the free lists; when a datagram cannot be fully reconstructed, its fragment list should be dropped.

I know I haven't answered all of your questions, but maybe I have given you more insight.

Ray

**** Not necessarily a single mbuf. For example, as the mbuf gets passed down the stack, a new mbuf is added in front of it in order to prepend the headers from the other layers (e.g. Ethernet header, IP header, UDP header).

On Thu, 02 Jan 2003 12:33:47 -0800 Jeff Behl wrote:
> Thanks for the info. Could you explain how mbuf clusters and mbufs are
> related? i'd like to better understand how we can run out of one and
> not the other. also, is there an upper value for mbuf clusters that we
> should be wary of? again, the tuning page is quite vague on this.
> 64000 seems to not do the trick, so i was thinking of 2X this. we have
> plenty of memory (1GB, and the machine only runs apache).
>
> for those interested, we think we've found why we're getting lots of
> connections in FIN_WAIT_1 state... it appears to be some sort of odd/bad
> interaction between IE and flash (we think). these machines serve pop-up
> ads (sorry!), and we're guessing that when a user with a slower
> connection gets one of these pop-ups and kills the window before the
> flash file is done downloading, IE leaves the socket open... sorta, and
> here's where it gets interesting. it leaves the socket open but closes the
> TCP receive window (sets it to 0)... or is the plugin doing this? apache is
> done sending data and considers the connection closed, so its client
> timeout feature never comes into play. but there is still data in the
> sendq, including the FIN, we believe. BSD obeys the spec (unfortunately)
> and keeps probing to see if the window has opened so it can transmit the
> last of the data. this goes on indefinitely! so we get gobs of
> connections stuck in fin_wait_1. interestingly, we have a solaris
> machine serving the same purpose and it does not have the problem. it
> seems to not follow the rfc and silently closes the connection after a
> number of tries when a socket is in fin_wait_1. this seems more
> reasonable to me. this seems (as i've read from other posts as well)
> quite an opportunity for a DoS attack... just keep advertising a window
> of 0. a single client could easily tie everything up in fin_wait_1...
>
> anyone think of a workaround (besides not serving pop-ups :)
>
> jeff
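One possible application-level workaround, offered only as a hedged sketch (it is not something Apache does for you, and it is rude to well-behaved clients): an abortive close. Setting SO_LINGER with a zero timeout before close() makes the stack discard the send queue and emit an RST instead of a FIN, so the socket never sits in FIN_WAIT_1 probing a zero window.

/*
 * Hedged sketch of an abortive close.  With SO_LINGER set to
 * {on, 0 seconds}, close() drops anything left in the send queue and
 * sends an RST instead of a FIN, so the socket skips the FIN_WAIT
 * states entirely.  The cost: a slow but honest client loses the tail
 * of the response, so this only makes sense where that is acceptable.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close a finished connection abortively; returns -1 on error. */
int
abortive_close(int fd)
{
        struct linger l;

        l.l_onoff = 1;          /* enable SO_LINGER */
        l.l_linger = 0;         /* zero timeout: RST, drop the send queue */
        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == -1)
                return (-1);
        return (close(fd));
}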
> Mike Silbersack wrote:
> > On Mon, 30 Dec 2002, Jeff Behl wrote:
> >
> >> 5066/52544/256000 mbufs in use (current/peak/max):
> >> 5031/50612/64000 mbuf clusters in use (current/peak/max)
> >>
> >> is there some strange interaction going on between apache2 and bsd?
> >> killing apache caused the mbuf clusters to start draining, but only
> >> slowly. will clusters still be allocated in FIN_WAIT_? states?
> >> TIME_WAIT?
> >
> > Before I answer your question, let me explain how clusters are
> > allocated. The first number above shows how many are in use at the
> > moment. The second number shows how many have been used, and are
> > currently allocated. The third is the limit you have set.
> >
> > What this means is that once an mbuf (or cluster) has been allocated,
> > it is never truly freed, only returned to the free list. As a result,
> > after your spike in mbuf usage, you never really get the memory back.
> > However, this may be OK if you have plenty of RAM.
> >
> >> This machine was serving a couple hundred connections a second...
> >> which doesn't seem like it should have taxed it much (p3 1.2 GHz).
> >> CPU util was low.
> >>
> >> Any help appreciated.
> >>
> >> Jeff
> >
> > Now, on to why the value spiked. Yes, connections in FIN_WAIT* states
> > still hang on to mbuf clusters relating to the data they have been
> > asked to send. There was a DoS script going around which intentionally
> > stuck many sockets on a server in the FIN_WAIT_2 state until enough
> > had been stuck to cause mbuf cluster exhaustion. To determine if this
> > is the case, just run netstat -n and look at the sendq values; if you
> > see high sendq values on a lot of sockets, this may be your answer.
> >
> > The other possibility is that you're being hit with lots of IP
> > fragments... currently, the IP reassembly code allows too many
> > unassembled packets to sit around. There's no way to inspect the IP
> > reassembly queue actively, but you could use netstat -s to see
> > "fragments received" - if the number is high, then it's likely
> > something is up.
> >
> > Mike "Silby" Silbersack
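Following up on Mike's suggestion, here is a quick, hedged sketch of how you might count sockets stuck in FIN_WAIT_1 with data still queued. It shells out to netstat and assumes the usual column layout (Proto, Recv-Q, Send-Q, local address, foreign address, state); adjust the parsing if your output differs.

/*
 * Hedged sketch: count TCP sockets in FIN_WAIT_1 that still have data
 * queued to send, per Mike's netstat -n suggestion.  Assumes the usual
 * netstat column layout: Proto Recv-Q Send-Q Local-Address
 * Foreign-Address (state).
 */
#include <stdio.h>
#include <string.h>

int
main(void)
{
        FILE *fp;
        char line[512], proto[16], local[64], foreign[64], state[32];
        int recvq, sendq, stuck = 0;

        if ((fp = popen("netstat -n -p tcp", "r")) == NULL) {
                perror("popen");
                return (1);
        }
        while (fgets(line, sizeof(line), fp) != NULL) {
                if (sscanf(line, "%15s %d %d %63s %63s %31s",
                    proto, &recvq, &sendq, local, foreign, state) != 6)
                        continue;       /* header or unparsable line */
                if (strcmp(state, "FIN_WAIT_1") == 0 && sendq > 0) {
                        stuck++;
                        printf("%s sendq=%d\n", foreign, sendq);
                }
        }
        pclose(fp);
        printf("%d sockets stuck in FIN_WAIT_1 with data queued\n", stuck);
        return (0);
}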