From owner-freebsd-current@FreeBSD.ORG  Mon Nov  2 21:48:34 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D11EE106568B
	for <freebsd-current@freebsd.org>; Mon,  2 Nov 2009 21:48:34 +0000 (UTC)
	(envelope-from weldon@excelsusphoto.com)
Received: from mx0.excelsus.net (emmett.excelsus.com [74.93.113.252])
	by mx1.freebsd.org (Postfix) with ESMTP id 7AFAC8FC17
	for <freebsd-current@freebsd.org>; Mon,  2 Nov 2009 21:48:33 +0000 (UTC)
Received: (qmail 89846 invoked by uid 89); 2 Nov 2009 21:48:32 -0000
Received: from unknown (HELO localhost) (127.0.0.1)
	by localhost.excelsus.com with SMTP; 2 Nov 2009 21:48:32 -0000
Date: Mon, 2 Nov 2009 16:48:31 -0500 (EST)
From: Weldon S Godfrey 3 <weldon@excelsusphoto.com>
X-X-Sender: weldon@emmett.excelsus.com
To: freebsd-current@freebsd.org
In-Reply-To: <alpine.BSF.2.00.0911021608590.80499@emmett.excelsus.com>
Message-ID: <alpine.BSF.2.00.0911021648100.80499@emmett.excelsus.com>
References: <alpine.BSF.2.00.0911020747560.80499@emmett.excelsus.com>
	<alpine.BSF.2.00.0911021608590.80499@emmett.excelsus.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Mon, 02 Nov 2009 22:00:41 +0000
Subject: Re: FreeBSD 8.0 - network stack crashes?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Nov 2009 21:48:34 -0000


If memory serves me right, sometime around 4:11pm, Weldon S Godfrey 3 told me:

>
>
> If memory serves me right, sometime around 10:52am, Weldon S Godfrey 3 told 
> me:
>
>> 
>> Up until yesterday, we had been running FreeBSD-CURRENT of 12/08.  A 
>> couple of months ago we started to see some very odd network behavior. 
>> Something happens to the stack that causes processes accessing the network 
>> to just hang.  After the problem happens, usually (but not always), you 
>> can't ssh in.  You can never ssh or telnet out, and nothing can access the 
>> NFS shares on the server.  You can ping everything from the server.  You 
>> can't even do a route add, and you can't ssh out even if you use just the 
>> IP address (although pinging hostnames that are not cached or in the hosts 
>> table does resolve).  When you try to ssh out or do a route add from the 
>> box, the process just hangs.  You can't Ctrl-C it at all; it hangs 
>> forever.  There is nothing in dmesg or messages to indicate an issue.  I 
>> have tried bringing the interfaces down and up.  In CURRENT-12/08, that 
>> may allow things to work for about 30 seconds.
>> 
>> We upgraded to 8.0-RC2 yesterday and, at first, the problem appeared to 
>> happen a lot more often.  We assumed that was related to the increase in 
>> network performance.  At least in 8.0-RC2, I did see a large number of 
>> input errors with netstat -in on the heavily loaded interface before it 
>> started locking up.  I have replaced the ethernet cable and moved ports. 
>> The Catalyst 3650 never records any errors.  The problem recurred within 
>> about 5 minutes once our load kicked in this morning.
>> 
>> 
>> One change in this upgrade: we switched from NFS v2 to v3.  When we 
>> downgraded to the previous OS, we stayed at v3.  The problem was just about 
>> as bad with v3 on the 12/08 OS.
>> 
>> We went back to RC2 with NFS v2 and it appeared to stabilize to a degree.
>> It ran for about an hour and a half and then the issue came up again.
>> 
>> We are currently back on the 12/08 version using NFS v2 and watching things.
>> 
>> We are using a Dell PowerEdge 2950-iii.  The problem happens both with the 
>> onboard NICs using the bce driver and with an Intel card using the em driver.
>> 
>> I am hunting down any MTU/duplex/speed problems that could cause it (haven't 
>> found any so far).  Of course, problems on the network shouldn't 
>> (ideally) freak out the network stack on the server.  I don't know how to 
>> troubleshoot this further on the server since nothing shows up in the 
>> logs and there are no panics, cores, etc.
>> 
>> Any help is appreciated.
>> 
>
>
> I have swapped out the computer, switch, ethernet card, and 3ware card.  We 
> are running 8.0-CURRENT 12/08, which is what we were using with far fewer 
> issues.  No help.
>
> If it happens again, I am going to try a netif restart and a routing 
> restart, although I believe I tried that at the beginning and it did not help.
>

BTW, doing a netif / routing restart doesn't help.
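
For keeping an eye on the input errors that netstat -in showed before the
lockups, a rough sketch along these lines might help.  It assumes the output
starts with a header row containing an Ierrs column, which may differ between
releases, and the function names are made up for illustration only.

```python
import subprocess


def parse_ierrs(netstat_output):
    """Return {interface: highest Ierrs value seen} from `netstat -in` style text.

    Assumption: the first non-blank line is a header row that names an
    "Ierrs" column, and the counter columns to its right are always present.
    """
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    if "Ierrs" not in header:
        raise ValueError("no Ierrs column found in header")
    # Count from the right: the Address column can be blank on some rows,
    # which shifts the left-hand columns when splitting on whitespace.
    offset = header.index("Ierrs") - len(header)
    errors = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < -offset:
            continue
        value = fields[offset]
        if not value.isdigit():
            # Rows that show "-" instead of a counter are skipped.
            continue
        name = fields[0]
        errors[name] = max(errors.get(name, 0), int(value))
    return errors


def interfaces_with_errors(netstat_output, threshold=0):
    """Return only the interfaces whose Ierrs count exceeds the threshold."""
    counts = parse_ierrs(netstat_output)
    return {name: n for name, n in counts.items() if n > threshold}


def read_netstat():
    """Run `netstat -in` and return its text (requires netstat on the PATH)."""
    result = subprocess.run(
        ["netstat", "-in"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling interfaces_with_errors(read_netstat()) from cron or a loop and logging
the result would at least show whether the error counter climbs before the
hang, or only after it.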