From owner-freebsd-hackers  Sun Feb 18 10:34:06 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id KAA26615
          for hackers-outgoing; Sun, 18 Feb 1996 10:34:06 -0800 (PST)
Received: from etinc.com (etinc.com [165.254.13.209])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id KAA26600
          for <hackers@freebsd.org>; Sun, 18 Feb 1996 10:34:02 -0800 (PST)
Received: from dialup-usr11.etinc.com (dialup-usr11.etinc.com [204.141.95.132]) by etinc.com (8.6.12/8.6.9) with SMTP id NAA07941 for <hackers@freebsd.org>; Sun, 18 Feb 1996 13:36:35 -0500
Date: Sun, 18 Feb 1996 13:36:35 -0500
Message-Id: <199602181836.NAA07941@etinc.com>
X-Sender: dennis@etinc.com
X-Mailer: Windows Eudora Version 2.0.3
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: hackers@freebsd.org
From: dennis@etinc.com (dennis)
Subject: Re: Web server locks up... but not quite. (?)
Sender: owner-hackers@freebsd.org
Precedence: bulk

>>     This sort of thing has happened before with other 2.1.0-R machines
>> here, but tonight was the first time I was able to get to the console
>> of one before someone else rebooted it.
>> 
>>     Our web server is a P90 with 64 megabytes of RAM, running Apache
>> 1.0.2.  For no discernible reason, it stopped working tonight.
>> "Stopped working" in that no TCP services were available, NFS clients
>> that mounted a filesystem served from it hung in disk wait and no
>> rwhod packets were being broadcast.
>> 
>>     You could telnet to various ports on it (indicating that inetd was
>> still bound to those ports), but none of the services normally
>> attached to those ports would run, including internal ones like
>> chargen or daytime (indicating that inetd was blocked in some way).
>> It wasn't fielding RPC requests either.  The login prompt was still
>> displayed on all the virtual consoles (I was still able to switch
>> between them), but there was no response from the keyboard, as if the
>> gettys had died off.  The only sign of life was that it was returning
>> pings from another machine.
>> 
>>     There were no telltale messages on the console, nor in the syslog.
>> This server gets 250,000 to 300,000 hits per day.  While it is
>> running, it does not appear to be under any excessive load.  There are
>> typically 40 to 60 httpd processes running.  It exports a 4-gigabyte
>> filesystem containing access logs to client machines so our customers
>> can produce statistical reports.  It also mounts 26 gigabytes of home
>> directories from a central NFS server.
>> 
>>     Since there is no indication as to the source of the hang, is
>> there anything I can run periodically from cron to help track down the
>> problem?  I can start tracking load averages, swap space usage, the
>> output of vmstat, netstat, iostat and nfsstat if that will help.  Any
>> suggestions?
>
>I've seen similar hangs occasionally under both 2.0.5R and 2.1.0R and one
>additional "thing" I've noticed is that processes that are completely
>in-core appear to keep running (i.e. I had a "vmstat 1" running for a few
>weeks and when the box I am thinking of locked up, the vmstat 1 was still
>scrolling output, the box was ping-able, but any services that were not
>entirely in-core or required other disk accesses were not available).
>There is something to the "in-core" business because I have seen the same
>box both continue to broadcast rwho and NOT broadcast rwho, presumably
>determined by whether or not it was in-core.
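On the question of what to run from cron: a minimal sketch, assuming a modern Python 3 is available on the box (it was not part of 2.1.0-R). It runs the tools named above once, writes one timestamped block to a log, and fsyncs so the data survives a reboot. The flags (`pstat -s` for swap, `netstat -m` for mbufs) and the log path are assumptions, not anything from this thread.

```python
import datetime
import os
import subprocess
import sys

# Tools suggested in the quoted message. Flags are assumptions for a
# FreeBSD-like system: "pstat -s" for swap, "netstat -m" for mbuf usage.
DEFAULT_COMMANDS = [
    ["uptime"],          # load averages
    ["pstat", "-s"],     # swap space usage
    ["vmstat"],
    ["netstat", "-m"],   # network buffer (mbuf) usage
    ["iostat"],
    ["nfsstat"],
]


def snapshot(commands=DEFAULT_COMMANDS, timeout=30):
    """Run each command once and return all output as one timestamped block."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    parts = ["=== %s ===" % stamp]
    for cmd in commands:
        parts.append("--- " + " ".join(cmd))
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing command, or one that exceeds the timeout, is noted
            # in the log instead of aborting the whole snapshot.
            parts.append("(failed: %s)" % exc)
            continue
        parts.append(result.stdout.rstrip())
        if result.returncode != 0:
            parts.append("(exit %d) %s" % (result.returncode,
                                           result.stderr.rstrip()))
    return "\n".join(parts) + "\n"


def append_snapshot(path, commands=DEFAULT_COMMANDS):
    """Append one snapshot to path and force it to disk before returning."""
    with open(path, "a") as log:
        log.write(snapshot(commands))
        log.flush()
        os.fsync(log.fileno())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Log path comes from the command line, e.g. a local (non-NFS) file.
    append_snapshot(sys.argv[1])
```

It could be run every few minutes from a crontab entry such as `*/5 * * * * python3 /path/to/sysnap.py /var/log/sysnap.log` (both paths hypothetical). Because cron must start a new process for each run, the last entry in the log roughly timestamps when the machine stopped being able to do that, and the entries just before it show whether swap or mbufs were running out.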

The more I read about this, the more I think it's got to be memory
allocation failures: no new processes start, but the old ones and the
kernel stuff keep on ticking. Is there a logging function for these, or
would logging attempts fail as well?
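One way to test the "no new processes" theory without relying on the local disk or syslog: a resident process that periodically tries fork() (with no exec, so no binary has to be paged in) and reports the result over UDP to another machine, since the hung box still answers pings. This is a hypothetical diagnostic sketched for this thread, not an existing tool; the collector host and port are placeholders, and nothing guarantees the interpreter's own pages stay in core.

```python
import os
import socket
import time


def probe_fork():
    """Fork a child that exits at once; no exec, so nothing is read from disk."""
    try:
        pid = os.fork()
    except OSError as exc:
        # EAGAIN or ENOMEM here would support the allocation-failure theory.
        return "fork-failed errno=%d" % exc.errno
    if pid == 0:
        os._exit(0)          # child: exit immediately, skipping cleanup handlers
    os.waitpid(pid, 0)       # parent: reap the child so zombies don't pile up
    return "fork-ok"


def heartbeat(host, port, interval=10.0, count=None):
    """Send 'hostname unix-time fork-status' datagrams to host:port.

    Runs forever when count is None; otherwise stops after count messages.
    """
    name = socket.gethostname()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sent = 0
        while count is None or sent < count:
            msg = "%s %d %s" % (name, int(time.time()), probe_fork())
            try:
                sock.sendto(msg.encode("utf-8"), (host, port))
            except OSError:
                pass  # a failed send shows up as a gap at the collector
            sent += 1
            time.sleep(interval)
    finally:
        sock.close()
```

Started once at boot, e.g. `heartbeat("collector.example", 9999)` with a hypothetical collector listening on UDP port 9999, the stream tells the two cases apart: "fork-failed" lines point at process or memory exhaustion, while messages that simply stop suggest even this resident process got blocked.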

dennis
----------------------------------------------------------------------------
Emerging Technologies, Inc.      http://www.etinc.com

Synchronous PC Cards and Routers For Discriminating
Tastes. 56k to T1 and beyond. Frame Relay, PPP, HDLC, 
and X.25 for BSD/OS, FreeBSD and LINUX.