From owner-freebsd-hackers  Thu Jun 15 06:12:35 1995
Return-Path: hackers-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id GAA14338
          for hackers-outgoing; Thu, 15 Jun 1995 06:12:35 -0700
Received: from aries.ibms.sinica.edu.tw ([140.109.40.248])
          by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id GAA14317
          for <freebsd-hackers@freebsd.org>; Thu, 15 Jun 1995 06:12:20 -0700
Received: (from taob@localhost) by aries.ibms.sinica.edu.tw (8.6.11/8.6.9) id VAA04453; Thu, 15 Jun 1995 21:12:13 +0800
Date: Thu, 15 Jun 1995 21:12:12 +0800 (CST)
From: Brian Tao <taob@gate.sinica.edu.tw>
To: FREEBSD-HACKERS-L <freebsd-hackers@freebsd.org>
Subject: Too many open files in system
Message-ID: <Pine.BSI.3.91.950615204334.4018G-100000@aries>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: hackers-owner@freebsd.org
Precedence: bulk

    Played around with running various CGIs on the NCSA httpd 1.4
server on one of my FreeBSD 2.0.5 machines this past week.  Test
conditions:  50 clients on the local Ethernet making random requests
for HTML and CGI scripts, with a delay thrown in to simulate slow,
lagged connections.  Good news:  the server was handling 15+ requests
per second.  Bad news:  the machine would lock up not more than 45
minutes after the pounding began.  :(

    John Dyson suggested it might have been an NFS-related problem,
but performing the tests both with an NFS-mounted htdocs/ and a local
htdocs/ directory made little difference.  Everything runs fine (if
slowly) for the first little while, then all of a sudden, almost
all disk activity stops.  Existing processes still run (e.g., I can
continue to read mail, or switch screens in iscreen) but new ones will
not start (e.g., I cannot get a login prompt when telnetting in).  I
believe it may also involve the pager, since swapped out processes are
not swapped back in (e.g., quitting the mail reader, but not getting
the shell prompt back).

    The machine is still *running*, but practically useless since it
appears the VM system has pretty much locked up.  Makes it rather
difficult to find more details on the problem.  :(  The only thing I
can do is reboot.  During one of the trials, syslog was going nuts
logging this to disk:

Jun 14 12:53:15 aries syslogd: /var/run/utmp: Too many open files in system
Jun 14 12:53:15 aries last message repeated 3 times
Jun 14 12:53:15 aries /kernel: file: table is full
Jun 14 12:53:15 aries syslogd: /var/run/utmp: Too many open files in system
Jun 14 12:53:15 aries last message repeated 3 times
Jun 14 12:53:15 aries /kernel: file: table is full
[...repeat 18-20 times per second...]

    In all cases, the common problem is "too many open files in
system".  The httpd error log shows CGI scripts failing for the same
reason.  Also, I see this in the error log:

[Wed Jun 14 12:51:19 1995] httpd: could not create IPC pipe
[Wed Jun 14 12:52:51 1995] socket error: accept failed

    The first message is logged while the server is still running;
the second appears only after everything has died, and is repeated in
the log file at a rate of 200+ per second!!!  The server is running
in pre-forking mode, if it makes any difference.
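
    To illustrate the difference between the two failure modes, here
is a rough sketch (in Python, not the actual NCSA httpd code) of an
accept loop that backs off when descriptors run out instead of
retrying hundreds of times per second.  ENFILE is the errno behind
"Too many open files in system" and EMFILE is the per-process
variant; the function name, the delays, and the injectable sleep
parameter are all made up for illustration.

```python
import errno
import time

# Errors that mean "no descriptors available"; an immediate retry cannot succeed.
DESCRIPTOR_EXHAUSTION = {
    errno.ENFILE: "system-wide file table is full",
    errno.EMFILE: "this process has hit its own descriptor limit",
}


def accept_with_backoff(listener, delay=0.05, max_delay=2.0, sleep=time.sleep):
    """Accept one connection, sleeping between attempts when descriptors run out.

    `listener` is any object with a socket-style accept() method.
    Other errors are re-raised unchanged.
    """
    while True:
        try:
            return listener.accept()
        except OSError as exc:
            if exc.errno not in DESCRIPTOR_EXHAUSTION:
                raise
            # Wait, then double the delay up to the cap, so a full file
            # table does not turn into a flood of log entries.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

    Backing off does not free any descriptors by itself; it only
keeps the failing loop from adding to the load while the table is
full.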

    My kernel is compiled with the following options:

options     "NMBCLUSTERS=1024"
options     "CHILD_MAX=128"
options     "OPEN_MAX=256"       <-- does this help?
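
    For context, OPEN_MAX caps descriptors per process and CHILD_MAX
caps processes per user, whereas "file: table is full" refers to the
single system-wide table.  A back-of-the-envelope sketch in Python,
assuming the deliberately extreme (hypothetical) case where every
process uses its full allowance; the function name is made up, and
the getrlimit call only reports the per-process limit of whatever
machine runs the script:

```python
import resource

CHILD_MAX = 128   # from the kernel config above: processes per user
OPEN_MAX = 256    # from the kernel config above: descriptors per process


def worst_case_descriptors(processes, per_process_limit):
    """Upper bound on descriptors if every process used its full allowance."""
    return processes * per_process_limit


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("per-process limit here (soft, hard):", soft, hard)
    print("worst case for one user:",
          worst_case_descriptors(CHILD_MAX, OPEN_MAX))
```

    The product, 128 * 256 = 32768, only bounds what the per-process
settings would permit.  It says nothing about how large the
system-wide table actually is, so raising OPEN_MAX alone would not
be expected to enlarge that table.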

    The one time I was able to get an "fstat | wc -l" to work, it
showed 1150 files open.  This is with no other users logged on, and X
was not running (essentially in dedicated Web server mode).
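
    A line count alone does not show who holds the files.  Below is
a minimal sketch (Python) that groups fstat output by process,
assuming the usual "USER CMD PID FD ..." column layout and assuming
that real descriptors show a numeric FD (optionally followed by "*"),
while entries such as "root", "wd", and "text" are not descriptors.
The function name is hypothetical.

```python
from collections import Counter


def count_by_process(fstat_output):
    """Return a Counter mapping (command, pid) to its open descriptor count."""
    counts = Counter()
    for line in fstat_output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the column header.
        if len(fields) < 4 or fields[0] == "USER":
            continue
        _user, cmd, pid, fd = fields[:4]
        # Only numeric FD entries are counted; "wd", "text", etc. are skipped.
        if fd.rstrip("*").isdigit():
            counts[(cmd, pid)] += 1
    return counts
```

    Feeding it the saved output of fstat and printing
counts.most_common(10) would show whether the open files are spread
across the httpd children or concentrated in a few processes.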
-- 
Brian ("Though this be madness, yet there is method in't") Tao
taob@gate.sinica.edu.tw <-- work ........ play --> taob@io.org