From owner-freebsd-hackers Fri Feb 16 22:39:45 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id WAA24189 for hackers-outgoing; Fri, 16 Feb 1996 22:39:45 -0800 (PST)
Received: from brasil.moneng.mei.com (brasil.moneng.mei.com [151.186.109.160]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id WAA24179 for ; Fri, 16 Feb 1996 22:39:21 -0800 (PST)
Received: (from jgreco@localhost) by brasil.moneng.mei.com (8.7.Beta.1/8.7.Beta.1) id AAA06538; Sat, 17 Feb 1996 00:38:02 -0600
From: Joe Greco
Message-Id: <199602170638.AAA06538@brasil.moneng.mei.com>
Subject: Re: Web server locks up... but not quite. (?)
To: taob@io.org (Brian Tao)
Date: Sat, 17 Feb 1996 00:38:01 -0600 (CST)
Cc: freebsd-hackers@freebsd.org
In-Reply-To: from "Brian Tao" at Feb 16, 96 09:52:00 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-hackers@freebsd.org
Precedence: bulk

> This sort of thing has happened before with other 2.1.0-R machines
> here, but tonight was the first time I was able to get to the console
> of one before someone else rebooted it.
>
> Our web server is a P90 with 64 megabytes of RAM, running Apache
> 1.0.2. For no discernable reason, it stopped working tonight.
> "Stopped working" in that no TCP services were available, NFS clients
> that mounted a filesystem served from it hung in disk wait and no
> rwhod packets were being broadcast.
>
> You could telnet to various ports on it (indicating that inetd was
> still bound to those ports), but none of the services normally
> attached to those ports would run, including internal ones like
> chargen or daytime (indicating that inetd was blocked in some way).
> It wasn't fielding RPC requests either. The login prompt was still
> displayed on all the virtual consoles (I was still able to switch
> between them), but there was no response from the keyboard, as if the
> getty's had died off. The only sign of life was that it was returning
> pings from another machine.
>
> There were no telltale messages on the console, nor in the syslog.
> This server gets 250,000 to 300,000 hits per day. While it is
> running, it does not appear to be under any excessive load. There are
> typically 40 to 60 httpd's running. It exports a 4-gigabyte
> filesystem containing access logs to client machines so our customers
> can produce statistical reports. It also mounts 26 gigabytes of home
> directories from a central NFS server.
>
> Since there is no indication as to the source of the hang, is
> there anything I can run periodically from cron to help track down the
> problem? I can start tracking load averages, swap space usage, the
> output of vmstat, netstat, iostat and nfsstat if that will help. Any
> suggestions?

I've seen similar hangs occasionally under both 2.0.5R and 2.1.0R, and one
additional "thing" I've noticed is that processes that are completely
in-core appear to keep running. For example, I had a "vmstat 1" running
for a few weeks, and when the box I am thinking of locked up, the
"vmstat 1" was still scrolling output and the box was pingable, but any
services that were not entirely in-core or that required other disk
accesses were unavailable.

There is something to the "in-core" business, because I have seen the
same box both continue to broadcast rwho and NOT broadcast rwho,
presumably depending on whether or not rwhod was in-core at the time.

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator                         jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI                     414/546-7968
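On the cron question quoted above, a periodic collector could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not something proposed in the thread: the script, its helper names (`run_one`, `snapshot`, `DEFAULT_COMMANDS`), the use of `pstat -s` for swap usage, and the FreeBSD-style `-c count -w wait` flags for `vmstat` and `iostat` are all hypothetical choices. Each command runs under a timeout so that one slow command does not hold up the rest of the snapshot.

```python
#!/usr/bin/env python3
"""Hypothetical cron collector: one timestamped snapshot of system stats.

The command list and flags are illustrative assumptions (FreeBSD-style),
not a tested configuration.
"""
import subprocess
import sys
import time

# Assumed commands covering the items listed in the question.
DEFAULT_COMMANDS = (
    ["uptime"],                           # load averages
    ["pstat", "-s"],                      # swap space usage (assumed FreeBSD flag)
    ["vmstat", "-c", "2", "-w", "1"],     # two samples, one second apart
    ["iostat", "-c", "2", "-w", "1"],
    ["netstat", "-m"],                    # mbuf usage
    ["nfsstat"],
)


def run_one(cmd, timeout=20):
    """Run one command and return its output, or a short note on failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
        return result.stdout + result.stderr
    except FileNotFoundError:
        return "(command not found)\n"
    except subprocess.TimeoutExpired:
        # The child is killed on timeout. A process stuck in disk wait may
        # not exit even when killed, so this cannot guard against every hang.
        return f"(timed out after {timeout}s)\n"


def snapshot(commands=DEFAULT_COMMANDS, timeout=20):
    """Build one report: a timestamp header, then each command's output."""
    parts = [time.strftime("=== %Y-%m-%d %H:%M:%S ===\n")]
    for cmd in commands:
        parts.append(f"--- {' '.join(cmd)} ---\n")
        parts.append(run_one(cmd, timeout))
    return "".join(parts)


if __name__ == "__main__":
    report = snapshot()
    if len(sys.argv) > 1:
        # Append to the log file named on the command line.
        with open(sys.argv[1], "a") as log:
            log.write(report)
    else:
        sys.stdout.write(report)
```

It could be run every few minutes from a crontab entry such as `*/5 * * * * /usr/local/bin/python3 /usr/local/sbin/sysstats.py /var/log/sysstats.log` (all paths hypothetical). Given the observation above that only fully in-core processes keep running during these hangs, the collector will likely stop along with the disk, so the most useful data is probably the last snapshot written before the gap in the log.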