From owner-freebsd-questions Thu Apr 27 14:22:56 2000
Delivered-To: freebsd-questions@freebsd.org
Received: from gw.Adl.USSR.net (digita1.lnk.telstra.net [139.130.137.85])
        by hub.freebsd.org (Postfix) with ESMTP id 6913537BB3E
        for ; Thu, 27 Apr 2000 14:22:48 -0700 (PDT)
        (envelope-from wabit@adl.ussr.net)
Received: from localhost (wabit@localhost)
        by gw.Adl.USSR.net (8.10.1/8.10.1) with ESMTP id e3RLMYp09890;
        Fri, 28 Apr 2000 06:52:35 +0930 (CST)
Date: Fri, 28 Apr 2000 06:52:32 +0930 (CST)
From: james
To: Eric Withabee
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: Network interface hanging on 3.3-RELEASE system
In-Reply-To: <20000427210353.79863.qmail@hotmail.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I had this problem when my CPU fan was buggered, might be worth looking at
that?

regards
james

On Thu, 27 Apr 2000, Eric Withabee wrote:

> Date: Thu, 27 Apr 2000 17:03:53 EDT
> From: Eric Withabee
> To: freebsd-questions@FreeBSD.ORG
> Subject: Network interface hanging on 3.3-RELEASE system
>
> Hello.
>
> I'm experiencing some strange problems with a 3.3-RELEASE system. It runs
> fine for a few days, then it starts getting a continually increasing number
> of TCP connections stuck in the TIME_WAIT state. The number of connections
> keeps building until it reaches a total of about 4000 TCP connections, then
> the server simply stops responding to any requests from the network. The
> time from when the connections start building up to when the server hangs
> varies from under half an hour to a few hours. Again, once the buildup
> starts, the number of connections in the TIME_WAIT state only increases.
>
> I've been trying to diagnose the problem, but haven't had much luck. I'm
> not sure whether it's due to a bug or not, so I'm posting the question here
> instead of to freebsd-bugs.
>
> The problem started as soon as I took the system live.
> It replaced another FreeBSD system, and took over all its duties. It's
> primarily acting as a mail server (Sendmail 8.9.3 and QPopper 2.53) and a
> web server (Apache 1.3.9). It's also running MySQL 9.33. The server it
> replaced was a 133MHz Pentium, and the new server is a 233MHz Pentium II.
> The old server did not experience this problem -- in fact, it was
> extremely stable.
>
> I originally thought that it might be the NIC card, a 3Com 3C905B, or the
> "xl" driver, so I replaced it with a Linksys LNE100TX ("mx" driver). This
> seemed to help somewhat, as the duration between occurrences increased from
> a few hours to a few days. However, it continues to occur, and I'm
> wondering if the improvement when I switched the NIC card was just a
> coincidence. Although, since I made the switch, the problem has never
> occurred as quickly as it did with the 3Com card. We've had very good luck
> with 3Com NICs in the past, but this was the first time we'd used a 3C905B
> and the "xl" driver.
>
> The time between occurrences varies significantly. Sometimes, the system
> will run for over a week, while other times it will run for less than a day.
>
> Just in case the problem was related to the number of mbufs, I bumped up the
> default settings so that it has a maximum of 4096 mbuf clusters. It didn't
> help. The system seems to peak at around 300 mbufs until the problem
> occurs.
>
> I decided to see whether it might be a DOS attack, even though that doesn't
> really make sense, because the problem started as soon as I took the system
> live. At the time the problem is occurring, the connections in the
> TIME_WAIT state don't originate primarily from one IP address. I suppose
> this doesn't rule out a distributed DOS attack, but I think that's pretty
> unlikely.
>
> Here's some specifics about the system:
>
> ASUS P3B-F motherboard
> Intel 233MHz PII
> 128MB RAM
> 2 Western Digital Expert 9.1GB 7200 RPM drives
> Mirrored via an Arco DupliDisk (Bay Mount)
> Linksys EtherFast 10/100 NIC (LNE100TX)
> Adaptec 2940UW SCSI Adapter
> HP SureStore T20i Travan Tape Drive
> Full-tower case with lots of fans
>
> In the meantime, while I've been trying to figure this out, I've got a
> cron'ed script that checks the number of connections and reboots the
> server if it gets to a stage that indicates that the server has passed the
> point of no return. Before it reboots it, it sends me an e-mail message
> giving the output from a "netstat -n", a "netstat -m" (I just added this
> today), and a "ps -ax". It's an ugly hack, but it's keeping me from getting
> paged at 3:00AM.
>
> Does anyone have any thoughts? Thanks for taking the time to read all this.
>
> Eric
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
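
[Editor's sketch, not part of the original thread.] The two checks Eric
describes, counting connections stuck in TIME_WAIT and seeing whether they
come mostly from one address, could look roughly like the Python below. It is
only a sketch under assumptions: it expects BSD-style "netstat -n" text in
which the foreign address is the fifth column and the port follows the last
dot (for example 198.51.100.7.1042). The function names and the default
threshold of 3000 are hypothetical and are not taken from Eric's script.

```python
from collections import Counter


def time_wait_by_host(netstat_output):
    """Count TCP connections in TIME_WAIT per remote host.

    Assumes each connection line has the columns:
    proto, recv-q, send-q, local address, foreign address, state.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP sockets.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "TIME_WAIT":
            continue
        # Assumed BSD format: the port is appended after the final dot.
        host, _, _port = fields[4].rpartition(".")
        counts[host] += 1
    return counts


def should_alert(counts, threshold=3000):
    """Return True once the total TIME_WAIT count reaches the threshold.

    The default is a hypothetical value chosen to sit below the roughly
    4000 connections at which the server is reported to hang.
    """
    return sum(counts.values()) >= threshold
```

The text passed in would be the captured output of "netstat -n" (for example
via subprocess.run with capture_output=True and text=True), and
counts.most_common(5) then shows whether a handful of hosts dominate.
Keeping the parsing separate from the alert decision means the same counts
can go into the e-mail report before any reboot is triggered.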