Date: Fri, 28 Jul 2000 18:02:16 EDT
From: "Eric Withabee" <ericwithabee@hotmail.com>
To: freebsd-questions@freebsd.org
Subject: Network interface hanging on 3.3-RELEASE system
Message-ID: <20000728220216.16352.qmail@hotmail.com>
Hello.

I posted this message a while back, and it got a few responses, but nothing that really helped solve the problem, so I thought I'd try throwing it out one last time. The responses I got the first time suggested that it may be an overheating processor. However, the processor in the system in question is very adequately cooled. Also, it seems strange that an overheating processor would only affect the TCP/IP code, while all other applications continue to run fine.

Anyway, if you have time to read it all, here's the original message with the detailed description of the problem:

I'm experiencing some strange problems with a 3.3-RELEASE system. It runs fine for a few days, then it starts accumulating a continually increasing number of TCP connections stuck in the TIME_WAIT state. The number keeps building until it reaches a total of about 4000 TCP connections, then the server simply stops responding to any requests from the network. The time from when the connections start building up to when the server hangs varies from under half an hour to a few hours. Again, once the buildup starts, the number of connections in the TIME_WAIT state only increases.

I've been trying to diagnose the problem, but haven't had much luck. I'm not sure whether it's due to a bug or not, so I'm posting the question here instead of to freebsd-bugs.

The problem started as soon as I took the system live. It replaced another FreeBSD system, and took over all its duties. It's primarily acting as a mail server (Sendmail 8.9.3 and QPopper 2.53) and a web server (Apache 1.3.9). It's also running MySQL 9.33. The server it replaced was a 133MHz Pentium, and the new server is a 233MHz Pentium II. The old server did not experience this problem -- in fact, it was extremely stable.

I originally thought that it might be the NIC, a 3Com 3C905B, or the "xl" driver, so I replaced it with a Linksys LNE100TX ("mx" driver).
This seemed to help somewhat, as the time between occurrences increased from a few hours to a few days. However, it continues to occur, and I'm wondering if the improvement when I switched the NIC was just a coincidence. That said, since I made the switch, the problem has never occurred as quickly as it did with the 3Com card. We've had very good luck with 3Com NICs in the past, but this was the first time we'd used a 3C905B and the "xl" driver.

The time between occurrences varies significantly. Sometimes the system will run for over a week, while other times it will run for less than a day.

Just in case the problem was related to the number of mbufs, I bumped up the default settings so that it has a maximum of 4096 mbuf clusters. It didn't help. The system seems to peak at around 300 mbufs until the problem occurs.

I decided to see whether it might be a DoS attack, even though that doesn't really make sense, because the problem started as soon as I took the system live. At the time the problem is occurring, the connections in the TIME_WAIT state don't originate primarily from one IP address. I suppose this doesn't rule out a distributed DoS attack, but I think that's pretty unlikely.

Here are some specifics about the system:

    ASUS P3B-F motherboard
    Intel 233MHz PII
    128MB RAM
    2 Western Digital Expert 9.1GB 7200 RPM drives
      Mirrored via an Arco DupliDisk (Bay Mount)
    Linksys EtherFast 10/100 NIC (LNE100TX)
    Adaptec 2940UW SCSI Adapter
    HP SureStore T20i Travan Tape Drive
    Full-tower case with lots of fans

In the meantime, while I've been trying to figure this out, I've set up a cron script that checks the number of connections and reboots the server if the count indicates that the server has passed the point of no return. Before rebooting, it sends me an e-mail message with the output of "netstat -n", "netstat -m" (I just added this today), and "ps -ax". It's an ugly hack, but it's keeping me from getting paged at 3:00 AM.
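For anyone wanting to reproduce the kind of check described above, here is a minimal sketch in Python. It is not the actual cron script, which was not posted. The function names and the 3000-connection threshold are assumptions for illustration only, and the parsing assumes the usual "netstat -n" layout in which TCP lines start with "tcp" and carry the connection state in the sixth column.

```python
import subprocess
from collections import Counter

# Hypothetical threshold: the post reports hangs at about 4000 connections,
# so a watchdog would need to trigger somewhat below that.
TIME_WAIT_LIMIT = 3000


def count_tcp_states(netstat_output):
    """Count TCP connections per state in the text printed by `netstat -n`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state.
        # Header lines and UDP lines (no state column) are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


def past_point_of_no_return(states, limit=TIME_WAIT_LIMIT):
    """Return True once the TIME_WAIT count reaches the chosen limit."""
    return states["TIME_WAIT"] >= limit


def check_once():
    """Run `netstat -n` once and report whether the limit was reached."""
    result = subprocess.run(
        ["netstat", "-n"], capture_output=True, text=True, check=True
    )
    states = count_tcp_states(result.stdout)
    return states, past_point_of_no_return(states)
```

A cron job could call check_once() every few minutes and, when the second return value is true, mail the diagnostic output before taking any recovery action. Logging the per-state counts over time would also show how quickly the TIME_WAIT buildup progresses once it starts.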
Does anyone have any thoughts?

Thanks for taking the time to read all this.

Eric