From owner-freebsd-current@FreeBSD.ORG Mon Oct 13 09:28:42 2003
To: Sam Leffler
In-Reply-To: Message from Sam Leffler of "Sun, 12 Oct 2003 11:56:53 PDT." <200310121156.53425.sam@errno.com>
Date: Mon, 13 Oct 2003 09:28:38 -0700
From: "Kevin Oberman"
Message-Id: <20031013162838.73FE25D07@ptavv.es.net>
cc: current@freebsd.org
cc: Andre Guibert de Bruet
Subject: Re: What's up with the IP stack?
List-Id: Discussions about the use of FreeBSD-current

> From: Sam Leffler
> Date: Sun, 12 Oct 2003 11:56:53 -0700
> Sender: owner-freebsd-current@freebsd.org
>
> On Sunday 12 October 2003 11:03 am, Andre Guibert de Bruet wrote:
> > On Sun, 12 Oct 2003, Josef Karthauser wrote:
> > > On Sun, Oct 12, 2003 at 02:48:01PM +0200, Soren Schmidt wrote:
> > > > It seems Josef Karthauser wrote:
> > > > > I've just built and installed a new kernel, the first since Aug 6th.
> > > > > There appears to be a problem with the IP stack. What happens is
> > > > > that everything is fine for a few hours, and then the IP stack stops
> > > > > working. I can no longer ping anything on the local network, my
> > > > > default route drops out (which is probably dhclient's doing).
> > > > > Perhaps it is ARP that is broken, it's hard to tell. All I know is
> > > > > that I need to reboot to make it work again.
> > > > >
> > > > > Is anyone else experiencing this kind of problem?
> > > >
> > > > Do you have dummynet included in the kernel ?
> > > > That has been broken for me since sam's latest commit as a backout
> > > > of ip_dummynet.c fixes the problem for me...
> > >
> > > No, I've not got dummynet in there. My current kernel config is:
> >
> > I experienced this a week ago. I found that ifconfig'ing the interface
> > down and back up again "fixed" the problem. I've since reverted to a
> > kernel compiled on September 25th.
>
> It would be good to know more details; I still don't have much to go on. Try
> to identify, for example, if the problem is specific to a particular
> device/interface or feature you're using (e.g dummynet). If you have ddb in
> your system, then when the system gets into a bad state break into the
> debugger and look for threads that are blocked on locks. If you have witness
> in your kernel then show locks would also be useful. If you don't have
> witness in your system then rebuild your kernel with it.
>
> The most recent round of changes were to lock the routing table. These went
> in 10/3 and were extensive. They could easily be the problem but w/o more
> info I can't really help.

Just a few more data points. I am seeing the problem on my ThinkPad T30
only on the wireless interface. I have never seen it when connected by
10/100 via fxp0. When I see this I can reach some LAN hosts, but not
others. I can always seem to reach the access point. I can usually, but
not always, reach most other systems on the LAN, but not the gateway
router, a Sonic Wall firewall. I have logged onto another system and
then connected to the firewall, so it looks like the physical path is OK.
The problem is intermittent and I have only scattered data. I've been
seeing it since about the beginning of October. I was blaming it on
hardware, but now that I see these reports, maybe it's not. (I just
replaced my Apple Airport AP with a D-Link, so there is something to
suspect.)

In my case things just start working again. The pause can vary from a
few seconds to about 10 minutes. netstat -rnf inet and arp -a output
both look to be fine.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net    Phone: +1 510 486-8634