Date: Mon, 24 Jun 2002 14:26:33 -0700
From: Terry Lambert
To: Patrick Thomas
Cc: Nielsen, hackers@freebsd.org
Subject: Re: (jail) problem and a (possible) solution ?

Patrick Thomas wrote:
> I made an initial change to the kernel of reducing maxusers from 512 to
> 256 - you said that 3gig is right on the border of needing extra KVA or
> not, so I thought maybe this unnecessarily high maxusers might be pushing
> me "over the top".  However, as long as I was changing the kernel, I also
> added DDB.
>
> The bad news is, it crashed again.  The good news is, I dropped to the
> debugger and got the wait channel info you wanted with `ps`.  Here are the
> last four columns of ps output for the first two pages of processes
> (roughly 900 procs were running at the time of the halt, so of course I
> can't give you them all, especially since I am copying by hand)
>
>   3  select  c0335140  local
>   3  select  c0335140  trivial-rewrite
>   3  select  c0335140  cleanup
>   3  select  c0335140  smtpd
>   3  select  c0335140  imapd
>   2                    httpd
>   2                    httpd
>   3  sbwait  e5ff6a8c  httpd
>   3  lockf   c89b7d40  httpd
>   3  sbwait  e5fc8d0c  httpd
>   2                    httpd
>   3  select  c0335140  top
>   3  accept  e5fc9ef6  httpd
>   3  select  c0335140  imapd
>   3  select  c0335140  couriertls
>   3  select  c0335140  imapd
>   2                    couriertls
>   3  ttyin   c74aa630  bash
>   3  select  c0335140  sshd
>   3  select  c0335140  tt++
>
> So there it all is.  Does this confirm your feeling that I need to
> increase KVA?  Or does it show you that one of the one or two other low
> probability problems is occurring?

Matt Dillon is right that there's nothing conclusive in the information
you've posted.  However... it provides room for additional speculation.

--

The number of "select" waits is reasonable.  The "sbwait" makes me
somewhat worried.

It's obvious that you are running a large number of httpd's; the sbwait
in this case could reasonably be assumed to be waits in "sendfile" for a
change in so->so_snd.sb_cc.  If that's the case, then it may be that you
are simply running out of mbufs and deadlocking.  This can happen if you
have enough data in the pipe that you cannot receive more data (e.g. the
m_pullup() in tcp_input() could fail before other things would fail).

If this is too much assumption, you can walk the entry off the process
and see whether it's the address of the sb_cc for so_snd or for so_rcv
for the process in question.
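If you want a quick one-shot sanity check before watching it over time,
something like the following would show how much headroom the cluster
map has.  This is only a sketch; it assumes the stock 4.x read-only
sysctl kern.ipc.nmbclusters (which reflects the NMBCLUSTERS limit):

#!/bin/sh
# Sketch: compare the configured mbuf cluster ceiling against the usage
# counters netstat reports.  kern.ipc.nmbclusters is assumed to be the
# stock read-only sysctl for the NMBCLUSTERS limit on 4.x.
sysctl kern.ipc.nmbclusters
netstat -m | head -5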
The way to cross-check this would be to run a continuous "netstat -m",
e.g.:

#!/bin/sh
while true
do
	netstat -m
	sleep 1
done

When the lockup comes, the interesting numbers are:

# netstat -m
3/64/5696 mbufs in use (current/peak/max):             <-- #3
        3 mbufs allocated to data
0/40/1424 mbuf clusters in use (current/peak/max)      <-- #2
96 Kbytes allocated to network (2% of mb_map in use)
0 requests for memory denied                           <-- #1
0 requests for memory delayed
0 calls to protocol drain routines

If there are a lot of denials, then you are out of mbuf memory and/or
mbuf clusters (sendfile tends to eat clusters for breakfast; it's one of
the reasons I dislike it immensely; the other is that the standards for
the majority of wire protocols where you'd use it require CRLF
termination, and UNIX text files have only LF termination).

The current vs. peak vs. max numbers will tell you how close to resource
saturation you are.  The ratio of clusters to mbufs will (effectively)
tell you whether you need to worry about adjusting the ratio because of
sendfile.

The "lockf" could (maybe) be a deadlock, but if it were, everyone would
be seeing it; it's incredibly doubtful, as long as the "ps" output you
posted was at all accurate.

Basically, if you have any denials, or if the number of mbuf clusters
gets really large, then you could have a problem.

It would also be interesting to see the output of:

# sysctl -a | grep tcp | grep space
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536

A standard "netstat" would also tell you the contents of the "Recv-Q
Send-Q" columns.  If they are non-zero, then you can basically tell how
much memory is being consumed by network traffic in and out.

I guess the best way to deal with this would be to drop the size of the
send or receive queues until they don't consume all your memory.  In
general, the size of these queues is supposed to be a *maximum*, not a
*mean*, so the number of sockets possible, times the maximum total of
both, will often exceed the amount of available mbuf space.

An interesting attack that is moderately effective on FreeBSD boxes is
to send packets with a very large size and withhold one of the fragments
(e.g. the second one) to prevent fragment reassembly, and therefore
saturate the reassembly queue.  The Linux UDP NFS client code does this
unintentionally, but you could believe that someone might be doing it
intentionally as well, and it would also work against TCP.  It's
doubtful that you are being hit by a FreeBSD-targeted attack, however.

-- Terry
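A minimal sketch of the queue-size reduction described above, assuming
the stock net.inet.tcp.sendspace and net.inet.tcp.recvspace sysctls from
the output shown earlier; the halved values are purely illustrative, not
tuned recommendations:

#!/bin/sh
# Illustrative only: shrink the per-socket TCP buffer ceilings so that
# several hundred concurrent sockets cannot outgrow the available mbuf
# space.  The values are examples, not recommendations.
sysctl -w net.inet.tcp.sendspace=16384
sysctl -w net.inet.tcp.recvspace=32768
# confirm the new values
sysctl net.inet.tcp.sendspace net.inet.tcp.recvspace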