From owner-freebsd-questions Thu Dec 13 14: 5:37 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from po4.wam.umd.edu (po4.wam.umd.edu [128.8.10.166])
    by hub.freebsd.org (Postfix) with ESMTP id 8E2DB37B405
    for ; Thu, 13 Dec 2001 14:05:30 -0800 (PST)
Received: from rac4.wam.umd.edu (IDENT:root@rac4.wam.umd.edu [128.8.10.144])
    by po4.wam.umd.edu (8.9.3/8.9.3) with ESMTP id RAA12474;
    Thu, 13 Dec 2001 17:05:22 -0500 (EST)
Received: from rac4.wam.umd.edu (IDENT:sendmail@localhost [127.0.0.1])
    by rac4.wam.umd.edu (8.9.3/8.9.3) with SMTP id RAA06565;
    Thu, 13 Dec 2001 17:05:22 -0500 (EST)
Received: from localhost (culverk@localhost)
    by rac4.wam.umd.edu (8.9.3/8.9.3) with ESMTP id RAA06558;
    Thu, 13 Dec 2001 17:05:22 -0500 (EST)
X-Authentication-Warning: rac4.wam.umd.edu: culverk owned process doing -bs
Date: Thu, 13 Dec 2001 17:05:22 -0500 (EST)
From: Kenneth Wayne Culver
To: Ryan Thompson
Cc: Anthony Atkielski, FreeBSD Questions
Subject: Re: Uptime not so good after all -- why does my net connection go dead?
In-Reply-To: <20011213122631.L94416-100000@catalyst.sasknow.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

This whole thing basically sounds to me like some form of
misconfiguration, either of the router, or of the FreeBSD machine, or
both. I've had uptimes greater than 2 years, and then the machine only
went down because of a power failure.

Ken

On Thu, 13 Dec 2001, Ryan Thompson wrote:

> Anthony Atkielski wrote to FreeBSD Questions:
>
> > I thought my FreeBSD system was going to stay up forever, based on
> > what I had heard,
>
> Yes, and it should, barring hardware problems, pilot error, or
> extended power outage, or managerial downtime.
>
> > but I had to boot it today.
> > For the umpteenth time, the OS
> > abruptly and silently decided to stop communicating with my
> > router. It had no trouble talking to the other PC on my LAN, but
> > it absolutely would not talk to the router. As far as I could
> > tell, it would not respond to traffic from the router, nor would
> > it send traffic to the router.
>
> To give you a more detailed response, we'll need to see what's
> actually going on with FreeBSD. You're reporting, for the most part,
> application-level symptoms. ICMP echo requests (ping) in this case
> aren't much different. If the problem is with your LAN, you need to go
> to the link layer...
>
> From the router, AND the NT machine, try arp lookups for the FreeBSD
> machine's public IP address. Do you get the same MAC address as is
> shown by the output of ifconfig(8) in FreeBSD? If not, then perhaps
> your router has claimed the IP, or the IP was assigned to another
> machine, etc, and you need to pinpoint that. This sort of thing can
> happen behind your back.
>
> On the FreeBSD box, put your NIC in promiscuous mode and start
> analyzing frames. What actually gets sent out on the wire? Is the
> machine seeing the IP packets, but not actually passing them up to the
> transport layer? Or maybe it just isn't sending anything out?
>
> I assume your IP address and netmask are set correctly with
> ifconfig(8)? Does the router agree with you in terms of netmask?
>
> The output of `netstat -rn` would be extremely helpful. The output and
> network config of the router would also be helpful.
>
>
> Some things you can do:
>
> Try plugging your FreeBSD machine directly into a port on your router,
> and unplugging everything else (except your uplink :-). If THIS works,
> then another device on the wire is misbehaving.
>
>
> > - It's not the FreeBSD machine's NIC; the NIC continues to talk to the NT
> > machine, and I can also make it work with the router by adding a new IP
> > address to the interface ("ifconfig xl0 xxx.xxx.xxx.xxx alias").
>
> This suggests that something is wrong with ARP and/or the routing
> tables on the FreeBSD machine or the router.
>
>
> > Nothing seemed to make the problem go away, so after two weeks of
> > continuous uptime, I finally bit the bullet and rebooted the
> > machine. The problem was gone when the machine came back up. I
> > did not power-cycle the hardware.
>
> I'd hardly be "biting the bullet" after 2 weeks:
> $ uptime
> 12:41PM up 261 days, 9:56, 3 users, load averages: 2.37, 2.46, 2.42
> $ uname -a
> FreeBSD ren.sasknow.com 3.5-STABLE FreeBSD 3.5-STABLE #0: Sun Mar 25
> 22:28:19 CST 2001 hutenosa@ren.sasknow.com:/usr/src/sys/compile/REN i386
>
> After 10 months or so, I think twice about rebooting. In this time,
> this machine has survived two power failures, several brownouts, one
> particularly memorable surge, a dead CPU fan, experimental code which
> resulted in a fork bomb that filled up the proc table, exhausted the
> swap space, and killed just about everything that was running on the
> machine, not to mention the abuse it takes from all of our web clients
> :-) And, 261 days isn't anywhere near the potential a properly
> maintained FreeBSD system can achieve, but it definitely shows it is
> sustainable.
>
> 10 months ago, the system was taken down to be moved to a different
> room and be connected to a different UPS. I had a kernel upgrade ready
> for that. Total downtime < 5 min. If not for the "managerial
> decisions" I have made, this system probably wouldn't have been down
> for the past 4 years (when it was installed).
>
> FWIW, you most likely did NOT have to reboot the FreeBSD machine :-)
> There are plenty of problems that can be "solved" by a reboot, but the
> vast majority of those can be solved WITHOUT a reboot if you know what
> to fix. That is how many UNIX systems stay operational for several
> months or even a few years.
>
>
> > This means that the NT machine still holds the record for uptime
> > by a very handsome margin (several weeks).
> >
> > I'd like to know exactly what is happening inside FreeBSD when it
> > decides to consign this particular IP address to the Twilight Zone
> > for one particular destination/source (the router).
>
> Sure, send answers to the questions I've posed, and we'll be able to
> get much closer to an explanation.
>
>
> > Obviously, this is a mission-critical issue, as no production
> > system can afford to be completely deprived of external network
> > connectivity.
> >
> > I used to have this problem a lot more until I discovered that the
> > router was sending out DHCP and RIP traffic to the LAN. I turned
> > that off and the problem _seemed_ to go away. Unfortunately, it
> > looks like it simply became less frequent instead. Once in two
> > weeks is still completely unacceptable, however.
>
> Which is exactly why you'll have to fix it! :-)
>
>
>
> Hope this helps,
> - Ryan
>
> --
>   Ryan Thompson
>   Network Administrator, Accounts
>
>   SaskNow Technologies - http://www.sasknow.com
>   #106-380 3120 8th St E - Saskatoon, SK - S7H 0W2
>
>         Tel: 306-664-3600   Fax: 306-664-1161   Saskatoon
>   Toll-Free: 877-727-5669     (877-SASKNOW)     North America
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
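The ARP check suggested in the quoted message (ask the router and the NT
machine which MAC address they hold for the FreeBSD box, then compare it
with what ifconfig(8) reports) could be scripted roughly as follows. This
is only a sketch and not something from the thread: the function names
are made up for illustration, and it assumes the ARP listing prints one
"IP ... MAC" entry per line (colon- or hyphen-separated hex) and that
ifconfig prints the hardware address after the word "ether".

```python
import re

# One MAC address: six groups of 1-2 hex digits separated by ':' or '-'.
# FreeBSD's arp may drop leading zeros (0:50:da:...); NT prints 00-50-da-...
MAC = r"[0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5}"


def normalize_mac(mac):
    """Return the MAC lower-cased, zero-padded and colon-separated."""
    return ":".join(part.lower().zfill(2) for part in re.split(r"[:-]", mac))


def mac_from_ifconfig(ifconfig_output):
    """Return the address following 'ether' in ifconfig output, or None."""
    match = re.search(r"\bether\s+(" + MAC + r")", ifconfig_output)
    return normalize_mac(match.group(1)) if match else None


def mac_from_arp(arp_output, ip):
    """Return the MAC listed on the ARP line for exactly this IP, or None."""
    ip_pattern = r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])"
    mac_pattern = r"(?<![0-9A-Fa-f:-])(" + MAC + r")(?![0-9A-Fa-f:-])"
    for line in arp_output.splitlines():
        if re.search(ip_pattern, line):
            match = re.search(mac_pattern, line)
            if match:
                return normalize_mac(match.group(1))
    return None


def same_host(arp_output, ifconfig_output, ip):
    """True only if the ARP entry for ip matches the local NIC's MAC."""
    seen = mac_from_arp(arp_output, ip)
    return seen is not None and seen == mac_from_ifconfig(ifconfig_output)
```

Feed it the text captured from the ARP table on the router or the NT
machine and from "ifconfig xl0" on the FreeBSD box. A False result would
point toward another device answering for that IP address, which is the
case the quoted message says to pinpoint.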