From owner-freebsd-questions@FreeBSD.ORG  Sat Oct 11 21:06:55 2008
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 930C71065696
	for <questions@freebsd.org>; Sat, 11 Oct 2008 21:06:55 +0000 (UTC)
	(envelope-from mksmith@adhost.com)
Received: from mail-in05.adhost.com (mail-in05.adhost.com [216.211.128.135])
	by mx1.freebsd.org (Postfix) with ESMTP id 788978FC13
	for <questions@freebsd.org>; Sat, 11 Oct 2008 21:06:55 +0000 (UTC)
	(envelope-from mksmith@adhost.com)
Received: from ad-exh01.adhost.lan (exchange.adhost.com [216.211.143.69])
	by mail-in05.adhost.com (Postfix) with ESMTP id 62A04164822;
	Sat, 11 Oct 2008 14:06:54 -0700 (PDT)
	(envelope-from mksmith@adhost.com)
Received: from 10.142.3.89 ([10.142.3.89]) by ad-exh01.adhost.lan
	([10.142.0.20]) with Microsoft Exchange Server HTTP-DAV ; 
	Sat, 11 Oct 2008 21:06:53 +0000
User-Agent: Microsoft-Entourage/12.12.0.080729
Date: Sat, 11 Oct 2008 14:06:49 -0700
From: "Michael K. Smith" <mksmith@adhost.com>
To: Jeremy Chadwick <koitsu@FreeBSD.org>
Message-ID: <C5166379.1DDB9%mksmith@adhost.com>
Thread-Topic: FreeBSD as PF/Router/Firewall dying on the vine
Thread-Index: Ackr5UasHfZNb+sFPk+TNaQjSgIVPQ==
In-Reply-To: <20081007043009.GA38719@icarus.home.lan>
Mime-version: 1.0
Content-type: text/plain;
	charset="US-ASCII"
Content-transfer-encoding: 7bit
Cc: questions@freebsd.org
Subject: Re: FreeBSD as PF/Router/Firewall dying on the vine
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 11 Oct 2008 21:06:55 -0000

Hello Jeremy:


On 10/6/08 9:30 PM, "Jeremy Chadwick" <koitsu@FreeBSD.org> wrote:

> On Mon, Oct 06, 2008 at 06:08:50PM -0700, Michael K. Smith - Adhost wrote:
>> Hello All:
>> 
>> We have a load-balanced pair of PF boxes sitting in front of a whole bunch
>> of servers doing all manner of things!  It's been working great up until
>> today when it, well, didn't.  Here's what I see in top -S.
>> 
>>   PID USERNAME       THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>    14 root             1 -44 -163     0K     8K CPU1   0  44:21 88.18% swi1: net
>>    11 root             1 171   52     0K     8K RUN    0  24:58 53.32% idle: cpu0
>>    10 root             1 171   52     0K     8K RUN    1  17:44 35.50% idle: cpu1
>>    24 root             1 -68 -187     0K     8K *Giant 0   5:30 11.62% irq16: em2 uhci3
>>    23 root             1 -68 -187     0K     8K WAIT   0   1:27  3.08% irq25: em1
>>    25 root             1 -68 -187     0K     8K WAIT   1   1:16  2.64% irq17: em3
>> 
>> This is 6.3 with Intel 1000 Fiber and Copper interfaces, all using the 'em'
>> driver.  Also, there are 15 VLANs configured on one of the NICs for subnet
>> separation.
>> 
>> If anyone has any ideas I'm all ears.  My google-fu is coming up empty with
>> the swi1: net 
> 
> Can you explain what the problem is?

Sorry it took so long to reply.  We actually got the issue resolved, but I
wanted to make sure our fix really worked before writing back.  Here are the
problem and the solution.

The symptom was significant packet loss and connectivity issues to and
through the PF server.  Even pinging the loopback address on the server
itself was returning round-trip times of 4 ms.

The cause was a very busy NFS server with clients on the same VLAN, but on
a different subnet.  We had a VLAN interface on em1 that had two address
ranges attached, 10.255.0.0/16 and 10.212.6.0/16.  The NFS server was on
10.255 and the clients were on 10.212.

Even though they were on the same VLAN, they weren't directly ARP'able, so
all traffic between them (400 - 600 Mb/sec) had to be routed by the PF
server.  When we moved the clients on to the same subnet as the server,
everything stabilized.
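
For anyone who finds this thread later, here is a rough Python sketch of the
on-link decision that caused this.  It only models the "is the destination in
my subnet?" check, not a real IP stack.  The host and gateway addresses are
made up for illustration, and next_hop() is a hypothetical helper, not
anything from our config.

```python
import ipaddress

# The two ranges on the em1 VLAN interface, as given above.
# strict=False masks off the host bits, so "10.212.6.0/16" is
# treated as the network 10.212.0.0/16.
SERVER_NET = ipaddress.ip_network("10.255.0.0/16")
CLIENT_NET = ipaddress.ip_network("10.212.6.0/16", strict=False)


def next_hop(src_net, dst_ip, gateway):
    """Return the address a host in src_net hands a packet for dst_ip to.

    If dst_ip is inside the sender's own subnet, the sender ARPs for it
    directly.  Otherwise the packet goes to the gateway, even when both
    hosts share the same VLAN.
    """
    dst = ipaddress.ip_address(dst_ip)
    if dst in src_net:
        return dst
    return ipaddress.ip_address(gateway)


# Hypothetical addresses, for illustration only.
nfs_server = "10.255.0.10"
pf_on_client_net = "10.212.0.1"
pf_on_server_net = "10.255.0.1"

# Before: client lives in 10.212/16, so NFS traffic goes via the PF box.
print(next_hop(CLIENT_NET, nfs_server, pf_on_client_net))

# After: client moved into 10.255/16, so it reaches the server directly.
print(next_hop(SERVER_NET, nfs_server, pf_on_server_net))
```

The first call returns the gateway address and the second returns the NFS
server's own address, which is the difference between every packet crossing
the firewall and none of them doing so.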

I think this was an issue of bad design on my part.

Regards,

Mike