From owner-freebsd-hackers Sat Jun 17 00:45:06 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
	id AAA11345 for hackers-outgoing; Sat, 17 Jun 1995 00:45:06 -0700
Received: from aries.ibms.sinica.edu.tw ([140.109.40.248]) by
	freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id AAA11332 for ;
	Sat, 17 Jun 1995 00:44:58 -0700
Received: (from taob@localhost) by aries.ibms.sinica.edu.tw (8.6.11/8.6.9)
	id PAA19967; Sat, 17 Jun 1995 15:44:02 +0800
Date: Sat, 17 Jun 1995 15:44:02 +0800 (CST)
From: Brian Tao
To: Mark Hittinger
cc: hackers@freebsd.org
Subject: re: too many open files
In-Reply-To: <199506161653.MAA02959@ns1.win.net>
Message-ID: 
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: hackers-owner@freebsd.org
Precedence: bulk

On Fri, 16 Jun 1995, Mark Hittinger wrote:
>
> I still wonder if some other parameter is being consumed other than
> file descriptors.  It would be nice if we could get some sort of "lsof"
> port or some system call to spew a list of open file descriptors out
> in a readable dump file.

There's fstat(1), which is what I've been using, but the CPU and disk
load is so high while the benchmark is running that it can take (no
kidding) in excess of 10 minutes for fstat to return.  That usually
means I get one chance to issue the command... the system hangs while
a second attempt is in progress.  :(

> It also sounds like Brian has some sort of deadlock problem.  Maybe NFS is
> creating some sort of lock problem which prevents new processes from
> being created.  Some sort of unresolvable buffer/vm page deadlock condition?

Possibly, but reducing the NFS load by moving the HTTP directory
hierarchy to a local disk (I've got a huge /scratch filesystem, thank
goodness) does not let the system run any longer or faster.

> I wonder if his processes cannot exit and release their resources because
> of some condition like this.  This would just gum things up as more
> processes got created.

This happens with both the NCSA and Apache httpd in pre-fork mode, but
*not* with Apache in standard forking mode (with up to 50 benchmark
clients).  Reducing the load to 10 servers and 10 clients lets at
least Apache run long enough in pre-fork mode to complete an overnight
(10-hour) test.  I'm not sure what else to try, and I don't have any
950412 systems left for comparison.

Oh, I did think of one thing to try tonight before I go home: I now
have a new kernel with OPEN_MAX=1024 and NMBCLUSTERS=1024, just to see
whether that makes any difference over OPEN_MAX=256 and
NMBCLUSTERS=1024.
--
Brian ("Though this be madness, yet there is method in't") Tao
taob@gate.sinica.edu.tw <-- work ........ play --> taob@io.org
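
(For illustration; the sketch below is not part of the original
exchange.)  Short of a full "lsof" port, a process can at least take a
cheap census of its own descriptor table with standard calls, without
fstat(1)'s slow walk through kernel memory under load.  A minimal
sketch, assuming only getdtablesize() and fcntl() as found on
4.4BSD-derived systems:

    /*
     * Count this process's open descriptors by probing every slot
     * with fcntl(F_GETFD): a closed slot fails with EBADF, so only
     * descriptors actually in use are counted.
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
            int fd, nopen = 0;
            int tablesize = getdtablesize();  /* per-process table size */

            for (fd = 0; fd < tablesize; fd++)
                    if (fcntl(fd, F_GETFD) != -1)
                            nopen++;

            printf("%d of %d descriptors in use\n", nopen, tablesize);
            return (0);
    }

Note that this sees only the calling process; a system-wide listing
still means fstat(1) (or the wished-for lsof), which is exactly what
becomes unusable under benchmark load.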
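
On the OPEN_MAX/NMBCLUSTERS experiment: both were compile-time
settings in that era, so each change meant rebuilding the kernel.  A
hypothetical kernel-config fragment (the actual config is not shown in
the message, the quoting syntax for valued options varied between
releases, and OPEN_MAX may instead have been changed by editing
<sys/syslimits.h> directly):

    options         "OPEN_MAX=1024"         # per-process open-file limit
    options         "NMBCLUSTERS=1024"      # mbuf clusters for network buffers

followed by the usual config(8) and make to build the new kernel.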