From owner-freebsd-hackers  Sat Feb 17 07:44:47 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id HAA15750
          for hackers-outgoing; Sat, 17 Feb 1996 07:44:47 -0800 (PST)
Received: from freebsd.netcom.com (freebsd.netcom.com [198.211.79.3])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id HAA15744
          for <hackers@freebsd.org>; Sat, 17 Feb 1996 07:44:44 -0800 (PST)
Received: by freebsd.netcom.com (8.6.12/SMI-4.1)
	id JAA03588; Sat, 17 Feb 1996 09:48:34 -0600
From: bugs@freebsd.netcom.com (Mark Hittinger)
Message-Id: <199602171548.JAA03588@freebsd.netcom.com>
Subject: Re: Web server locks up... but not quite. (?) (fwd)
To: hackers@freebsd.org
Date: Sat, 17 Feb 1996 09:48:33 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-hackers@freebsd.org
Precedence: bulk

> From: Joe Greco <jgreco@brasil.moneng.mei.com>
> To: taob@io.org (Brian Tao)
> > typically 40 to 60 httpd's running.  It exports a 4-gigabyte
> > filesystem containing access logs to client machines so our customers
> > can produce statistical reports. 

Is the 4 gig drive a Seagate Barracuda?  (yes for me, bt946c)

Do you run IP aliases for 'virtual web sites'?  (yes for me, a bunch)

What Ethernet card do you run on the box?  (3c509 ISA for me)

How large is your swap file?  (256 MB swap file for me)

The reason I ask these questions is that other boxes running the same rev
of FreeBSD do not exhibit the problem at all.  I am trying to find the
common thread.

> I've seen similar hangs occasionally under both 2.0.5R and 2.1.0R and one
> additional "thing" I've noticed is that processes that are completely
> in-core appear to keep running (i.e. I had a "vmstat 1" running for a few
> weeks and when the box I am thinking of locked up, the vmstat 1 was still
> scrolling output, the box was ping-able, but any services that were not
> entirely in-core or required other disk accesses were not available).
> There is something to the "in-core" business because I have seen the same
> box both continue to broadcast rwho and NOT broadcast rwho, presumably
> determined by whether or not it was in-core..

I saw this behavior before 2.0.5, then it went away until about 3 weeks
before 2.1R was cut.

I see the following kinds of processes hang (unkillable) in the "D+"
state as reported by ps: innd, CERN httpd, and ps itself.

ps seems to get hit by this a lot.  "ps -ax" will hang, whereas plain "ps"
will not.  When "ps -ax" hangs, who and top still run OK.
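
For anyone trying to compare notes, here is a rough sketch of how the stuck
processes could be picked out of a saved ps listing.  It assumes a listing
with three columns (PID, STAT, COMMAND) that was captured to a file while
ps still worked, since "ps -ax" itself may hang on an affected box.  The
function name and the sample listing are made up for illustration; they
are not output from any of the machines discussed here.

```python
def find_uninterruptible(ps_output):
    """Return (pid, stat, command) for every process whose STAT starts with 'D'.

    Assumes three whitespace-separated columns: PID, STAT, COMMAND,
    with a single header line at the top.
    """
    hung = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:              # skip the header line
        fields = line.split(None, 2)    # command may contain spaces
        if len(fields) < 3:
            continue                    # ignore malformed or short lines
        pid, stat, command = fields
        if stat.startswith("D"):        # 'D' = uninterruptible (disk) wait
            hung.append((int(pid), stat, command))
    return hung


# Hypothetical sample listing, invented for this example.
SAMPLE = """\
  PID STAT COMMAND
  101 Is   /usr/sbin/cron
  214 D+   ps -ax
  305 D    innd
  412 S    httpd
"""

if __name__ == "__main__":
    for pid, stat, command in find_uninterruptible(SAMPLE):
        print(pid, stat, command)
```

Running this periodically against saved listings would at least show which
process entered the "D" state first, which may help narrow down the common
thread.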

I wonder whether "ps -ax" hangs because it is trying to look at the swap
space of another process that is already hung without my realizing it :-)
That would imply some kind of deadlock on a page-out to the swap space.

Regards,

Mark Hittinger
Netcom/Dallas
bugs@freebsd.netcom.com