From: "Michael K. Smith"
To: Jeremy Chadwick
Cc: questions@freebsd.org
Date: Sat, 11 Oct 2008 14:06:49 -0700
Subject: Re: FreeBSD as PF/Router/Firewall dying on the vine
In-Reply-To: <20081007043009.GA38719@icarus.home.lan>

Hello Jeremy:

On 10/6/08 9:30 PM, "Jeremy Chadwick" wrote:

> On Mon, Oct 06, 2008 at 06:08:50PM -0700, Michael K. Smith - Adhost wrote:
>> Hello All:
>>
>> We have a load balanced pair of PF boxes sitting in front of a whole bunch of
>> server doing all manner of things!  It's been working great up until today
>> when it, well, didn't.  Here's what I see in top -S.
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>    14 root        1 -44 -163     0K     8K CPU1   0  44:21 88.18% swi1: net
>>    11 root        1 171   52     0K     8K RUN    0  24:58 53.32% idle: cpu0
>>    10 root        1 171   52     0K     8K RUN    1  17:44 35.50% idle: cpu1
>>    24 root        1 -68 -187     0K     8K *Giant 0   5:30 11.62% irq16: em2 uhci3
>>    23 root        1 -68 -187     0K     8K WAIT   0   1:27  3.08% irq25: em1
>>    25 root        1 -68 -187     0K     8K WAIT   1   1:16  2.64% irq17: em3
>>
>> This is 6.3 with Intel 1000 Fiber and Copper interfaces, all using the 'em'
>> driver.  Also, there are 15 VLAN's configured on one of the NIC's for subnet
>> separation.
>>
>> If anyone has any ideas I'm all ears.  My google-fu is coming up empty with
>> the swi1: net
>
> Can you explain what the problem is?

Sorry it took so long to reply.  We actually got the issue resolved, but I
wanted to make sure our fix actually worked.  Here are the problem and the
solution.

The symptom was significant packet loss and connectivity issues to and
through the PF server.  Even pinging the loopback address on the server
itself was returning 4 ms times.

The cause was a very busy NFS server with clients on the same VLAN, but on a
different subnet.  We had a VLAN interface on em1 with two address ranges
attached, 10.255.0.0/16 and 10.212.6.0/16.  The NFS server was on the 10.255
range and the clients were on the 10.212 range.  Even though they were on the
same VLAN, they weren't directly ARP'able, so all traffic between them
(400 - 600 Mb/sec) had to be routed through the PF server.

When we moved the clients onto the same subnet as the NFS server, everything
stabilized.  I think this was an issue of bad design on my part.

Regards,

Mike
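
To illustrate the subnet issue described above, here is a minimal sketch
using Python's standard ipaddress module.  The two prefixes come from the
message; the individual host addresses are hypothetical and chosen only for
illustration.  It shows why a client in one range cannot reach the NFS server
directly even on a shared VLAN, and why moving the client into the server's
range takes the router out of the path.

```python
import ipaddress

# Prefixes taken from the message. strict=False tolerates the host bits
# set in "10.212.6.0/16" and normalises it to 10.212.0.0/16.
SERVER_NET = ipaddress.ip_network("10.255.0.0/16")
CLIENT_NET = ipaddress.ip_network("10.212.6.0/16", strict=False)

# Hypothetical host addresses, for illustration only.
NFS_SERVER = ipaddress.ip_address("10.255.0.10")
OLD_CLIENT = ipaddress.ip_address("10.212.6.20")
MOVED_CLIENT = ipaddress.ip_address("10.255.0.20")


def needs_router(src_net, dst):
    """Return True when dst lies outside the sender's own subnet.

    In that case the sender will not ARP for dst; it hands the packet
    to its gateway, even if both hosts share a VLAN.
    """
    return dst not in src_net


def describe(src, src_net, dst):
    """Build a one-line description of the path from src to dst."""
    path = "via the router" if needs_router(src_net, dst) else "directly (ARP)"
    return f"{src} -> {dst}: {path}"


if __name__ == "__main__":
    # Before the fix: client and server are in different subnets.
    print(describe(OLD_CLIENT, CLIENT_NET, NFS_SERVER))
    # After the fix: the client has an address in the server's subnet.
    print(describe(MOVED_CLIENT, SERVER_NET, NFS_SERVER))
```

The check only models the sender's on-link decision; it says nothing about
PF rules or interrupt load on the router itself.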