From: dennis@etinc.com (dennis)
To: hackers@freebsd.org
Date: Sun, 18 Feb 1996 13:36:35 -0500
Message-Id: <199602181836.NAA07941@etinc.com>
Subject: Re: Web server locks up... but not quite. (?)

>>   This sort of thing has happened before with other 2.1.0-R machines
>> here, but tonight was the first time I was able to get to the console
>> of one before someone else rebooted it.
>>
>>   Our web server is a P90 with 64 megabytes of RAM, running Apache
>> 1.0.2.  For no discernable reason, it stopped working tonight.
>> "Stopped working" in that no TCP services were available, NFS clients
>> that mounted a filesystem served from it hung in disk wait and no
>> rwhod packets were being broadcast.
>>
>>   You could telnet to various ports on it (indicating that inetd was
>> still bound to those ports), but none of the services normally
>> attached to those ports would run, including internal ones like
>> chargen or daytime (indicating that inetd was blocked in some way).
>> It wasn't fielding RPC requests either.  The login prompt was still
>> displayed on all the virtual consoles (I was still able to switch
>> between them), but there was no response from the keyboard, as if the
>> getty's had died off.  The only sign of life was that it was returning
>> pings from another machine.
>>
>>   There were no telltale messages on the console, nor in the syslog.
>> This server gets 250,000 to 300,000 hits per day.  While it is
>> running, it does not appear to be under any excessive load.  There are
>> typically 40 to 60 httpd's running.  It exports a 4-gigabyte
>> filesystem containing access logs to client machines so our customers
>> can produce statistical reports.  It also mounts 26 gigabytes of home
>> directories from a central NFS server.
>>
>>   Since there is no indication as to the source of the hang, is
>> there anything I can run periodically from cron to help track down the
>> problem?  I can start tracking load averages, swap space usage, the
>> output of vmstat, netstat, iostat and nfsstat if that will help.  Any
>> suggestions?
>
>I've seen similar hangs occasionally under both 2.0.5R and 2.1.0R and one
>additional "thing" I've noticed is that processes that are completely
>in-core appear to keep running (i.e. I had a "vmstat 1" running for a few
>weeks and when the box I am thinking of locked up, the vmstat 1 was still
>scrolling output, the box was ping-able, but any services that were not
>entirely in-core or required other disk accesses were not available).
>There is something to the "in-core" business because I have seen the same
>box both continue to broadcast rwho and NOT broadcast rwho, presumably
>determined by whether or not it was in-core..

The more I read about this, the more I think it's gotta be memory
allocation failures: no new processes get started, but old ones and
kernel stuff keep on ticking. Is there a logging function for these,
or would logging attempts fail as well?

dennis
----------------------------------------------------------------------------
Emerging Technologies, Inc.      http://www.etinc.com

Synchronous PC Cards and Routers For Discriminating Tastes. 56k to T1 and
beyond. Frame Relay, PPP, HDLC, and X.25 for BSD/OS, FreeBSD and LINUX.
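
On the question of what to run periodically, and whether logging would itself fail: below is a minimal sketch, not a tested diagnostic, assuming a Python interpreter is available on the box. It stays resident (in the spirit of the "vmstat 1" that kept scrolling) rather than being started fresh by cron, and it treats a failure to start a child process as data worth recording, since that is what a memory allocation failure would presumably look like from user space. The names `sample`, `format_record`, `run_forever`, `COMMANDS` and the log path are hypothetical, and the choice of `netstat -m` and `pstat -s` for mbuf and swap usage is my assumption, not something anyone in the thread has suggested.

```python
import subprocess
import sys
import time


def sample(commands, timeout=10.0):
    """Run each (name, argv) once; return a list of (name, status, output)."""
    results = []
    for name, argv in commands:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
            results.append((name, "exit=%d" % proc.returncode, proc.stdout))
        except subprocess.TimeoutExpired:
            # Best effort only: a child stuck in disk wait may not be killable,
            # in which case this call can itself block.
            results.append((name, "timeout", ""))
        except OSError as exc:
            # The child could not be started at all (ENOMEM, EAGAIN, ENOENT...).
            # During a hang this is the interesting case, so keep the reason.
            results.append((name, "oserror: %s" % exc, ""))
    return results


def format_record(results, now=None):
    """Render one timestamped block of text from the output of sample()."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    lines = ["=== %s ===" % stamp]
    for name, status, output in results:
        lines.append("[%s] %s" % (name, status))
        lines.extend("  " + line for line in output.splitlines())
    return "\n".join(lines) + "\n"


# Assumed command set; adjust to whatever the system actually provides.
COMMANDS = [
    ("vmstat", ["vmstat"]),
    ("mbufs", ["netstat", "-m"]),
    ("swap", ["pstat", "-s"]),
    ("iostat", ["iostat"]),
    ("nfsstat", ["nfsstat"]),
]


def run_forever(log_path, interval=60):
    """Append one record per interval; also echo to stderr in case disk I/O hangs."""
    while True:
        record = format_record(sample(COMMANDS))
        sys.stderr.write(record)
        with open(log_path, "a") as log:
            log.write(record)
        time.sleep(interval)
```

Calling `run_forever("/var/log/hang-watch.log")` from a console session would start it. If the hang blocks disk writes, the append to the log file will stall too, which is why each record is echoed to stderr first; the last record on the screen would then show which commands could still be started and which could not.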