Date:      Wed, 17 Jul 1996 16:19:46 -0700 (PDT)
From:      Thor Clark <thor@tab012.tabula.com>
To:        "T. William Wells" <bill@twwells.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: system hangs? after resetting rtq_reallyold
Message-ID:  <Pine.SOL.3.91.960717153605.4678A-100000@tab012.tabula.com>
In-Reply-To: <4sg66v$9e@twwells.com>



On 16 Jul 1996, T. William Wells wrote:
> In article <Pine.SOL.3.91.960716011021.11491B-100000@tab012.tabula.com>,
> Thor Clark  <thor@tab012.tabula.com> wrote:
> : Am I out of space for new processes, or out of mbufs, or?
> 
> You may also have user processes that grow without limit.  See
> if you can get a systat or other *stat running on a virtual
> console when the problem occurs; it might show interesting
> results.

Thanks for the pointers.
I can't seem to shake this one. I've combed the mailing list and found
3-4 others who have had exactly the same problem, but I haven't seen a
solution anywhere, and no bug has been reported. I'll put as much
information as I can here. Am I missing something simple? My apologies
for the length of this post. Any ideas or tests will be cheerfully tried.

Problem: after a few hours of activity, the system will not start new
processes of any kind. It responds to pings, but that's about all: no
telnet, login, http, etc.
Background and interactive processes that are already running continue
to run, but killing off interactive processes has no effect on the
system. The only recourse is a physical reboot. This now happens every
~3 hours.

No kernel panic occurs, and no kernel messages are ever logged. The system
has never recovered on its own (it was once left down for more than 12 hours).

System: FreeBSD 2.1-RELEASE (from CD), kernel recompiled and installed with

maxusers        128
options         "NMBCLUSTERS=2048"
options         "OPEN_MAX=256"
options         "CHILD_MAX=256"

Hardware: 16 MB RAM, IDE disk, ASUS P55TP4 motherboard, 3C509 Ethernet card



Workload: CERN httpd 3.0, a lot of outgoing mail, a few minor background
processes, and many short-lived, CPU-intensive scripts.

I ran "fstat | wc -l" a few seconds before the last lockup, and it
returned 347. Normally this is about 200-250. Is this high?
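Note that "fstat | wc -l" also counts fstat's header line, and fstat lists
working directories and executables as well as open descriptors, so the
total alone says little about which program is responsible. A per-command
tally is more telling. A minimal Python sketch, assuming fstat's usual
layout with the command name in the second column; count_open_files and
the sample text are hypothetical, not output from this machine:

```python
from collections import Counter
from typing import Tuple


def count_open_files(fstat_text: str) -> Tuple[int, Counter]:
    """Count fstat entries (excluding the header) and tally them per command."""
    lines = [ln for ln in fstat_text.splitlines() if ln.strip()]
    # Skip the "USER CMD PID FD ..." header, which wc -l would also count.
    if lines and lines[0].split()[0] == "USER":
        lines = lines[1:]
    # Assumed layout: USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
    per_cmd = Counter(ln.split()[1] for ln in lines)
    return len(lines), per_cmd


# Hypothetical sample output, for illustration only.
SAMPLE = """\
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     httpd        210   wd /             2 drwxr-xr-x     512  r
root     httpd        210    0 /dev       1234 crw-rw-rw-    null  r
nobody   httpd        311    3 /usr       5678 -rw-r--r--    1024  r
root     sendmail     150    1 /dev       1234 crw-rw-rw-    null  w
"""

total, per_cmd = count_open_files(SAMPLE)
print(total)                   # 4
print(per_cmd.most_common(1))  # [('httpd', 3)]
```

Run against real output captured shortly before a lockup, the top entries
would show whether one program's count is creeping upward.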

I'll try pretty much anything at this point...

Thanks
-Thor Clark    (logs below)

Some data:

(sysctl -a) - excerpt
kern.maxvnodes = 2813
kern.maxproc = 2068
kern.maxfiles = 4136
kern.maxfilesperproc = 4136
kern.maxprocperuid = 2067
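Most of these limits appear to follow from "maxusers 128". Assuming the
kernel derives them as maxproc = 20 + 16 * maxusers, maxfiles = 2 * maxproc
and maxprocperuid = maxproc - 1 (an assumption, though it reproduces the
values above exactly), a quick calculation shows how far below the
ceilings the figures from top and fstat sit:

```python
def derived_limits(maxusers: int) -> dict:
    """Hypothetical reconstruction of table sizing from the maxusers option.

    The formulas are assumed; they match the sysctl values in this post.
    """
    maxproc = 20 + 16 * maxusers
    return {
        "kern.maxproc": maxproc,
        "kern.maxfiles": 2 * maxproc,
        "kern.maxprocperuid": maxproc - 1,
    }


limits = derived_limits(128)
print(limits)  # {'kern.maxproc': 2068, 'kern.maxfiles': 4136, 'kern.maxprocperuid': 2067}

# top showed 51 processes while locked; "fstat | wc -l" returned 347 lines
# (header included) a few seconds before a lockup.
procs, fstat_lines = 51, 347
print(f"processes: {procs / limits['kern.maxproc']:.1%} of kern.maxproc")
# processes: 2.5% of kern.maxproc
print(f"fstat lines: {fstat_lines / limits['kern.maxfiles']:.1%} of kern.maxfiles")
# fstat lines: 8.4% of kern.maxfiles
```

On these numbers alone, neither the process table nor the file table looks
close to its configured limit.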

(top) - while machine is locked (I've never seen a load above 2 when the system is OK)
load averages:   5.70,  3.08,  1.64                       16:56:53
51 processes:  1 running, 45 sleeping, 1 stopped, 4 zombie
Cpu states:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
Memory: 7552K Active, 2668K Inact, 2708K Wired, 1104K Cache, 64K Free
Swap:   82M Total, 63M Free, 24% Inuse  

(vmstat -w 5) - while machine is locked
 procs    memory            page               disks     faults         cpu
  r  b w   avm   fre  flt  re  pi  po  fr   sr f0 w0 w2   in   sy  cs us sy  id
  5  9 0 31328   608    0   0   0   0   0    0  0  0  0  229   13   3  0  0 100
  6 10 0 55660  1484   23   2   9  31  14 1784  0 40  0  902 1808  62  4 12  84
  6 10 0 55660  1476    1   0   0   0   0    0  0  0  0  287 5163  59  8 15  77
  6 10 0 55660  1384    5   1   2   0   4    0  0  2  0  273   38   7  0  1  99
  7 11 0 51820  1020   20   0   5   0  11    0  0  5  0  371   24  10  0  2  97
  7 12 0 51328   904   10   0   2   0   6    0  0  4  2  331   13   6  0  2  98
  7 12 0 44692   904    0   0   0   0   0    0  0  0  0  235   11   3  0  1  99
  7 12 0 40436   904    0   0   0   0   0    0  0  0  0  229   11   3  0  0 100
  7 13 0 35856   908    1   0   0   0   0    0  0  1  0  241   15   4  0  0 100
  7 13 0 31476   860   17   3   6   0  26    0  0 13  0  417   75  20  0  3  97
  7 14 0 35696  1060    4   0   2   0  15    0  0  3  0  272   22   7  0  2  98
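In raw vmstat output like this, the r, b and w fields are printed in narrow
columns, so once the blocked count reaches two digits they run together (a
row beginning "610 0" means r=6, b=10, w=0). A minimal Python sketch for
splitting such rows when logging vmstat output, assuming the 20-column
layout of the header above and that r is a single digit whenever the
fields are fused; parse_vmstat_row is a hypothetical helper name:

```python
# Column order from the vmstat header in this post. The faults "sy"
# (system calls) and cpu "sy" (system time) are renamed to stay unique.
VMSTAT_FIELDS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr",
                 "sr", "f0", "w0", "w2", "in", "sy", "cs", "us", "cpu_sy", "id"]


def parse_vmstat_row(row: str) -> dict:
    """Split one vmstat data row into named integer fields.

    If exactly one field is missing, assume r and b were printed without a
    separating space and split the first token after its first digit
    (i.e. assume r < 10).
    """
    tokens = row.split()
    if len(tokens) == len(VMSTAT_FIELDS) - 1 and len(tokens[0]) >= 2:
        fused = tokens[0]
        tokens = [fused[0], fused[1:]] + tokens[1:]
    if len(tokens) != len(VMSTAT_FIELDS):
        raise ValueError(f"expected {len(VMSTAT_FIELDS)} fields, got {len(tokens)}")
    return dict(zip(VMSTAT_FIELDS, map(int, tokens)))


row = parse_vmstat_row(
    "610 0 55660  1484   23   2   9  31  14 1784  0 40  0  902 1808  62  4 12 84")
print(row["r"], row["b"], row["w"], row["id"])  # 6 10 0 84
```

The single-digit assumption for r matches every row in this post; a row
where both r and b reach two digits would need the raw column widths.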
 
(iostat -w 1) - while machine is locked
 tin tout sps tps msps  sps tps msps  sps tps msps  us ni sy in  id
 355  753   0   0  0.0    0   0  0.0   16   1  0.0   3  0 10  0  87
   0    0   0   0  0.0    0   0  0.0    0   0  0.0   0  0  1  0  99
   0  627   0   0  0.0    0   0  0.0    0   0  0.0   0  0  1  0  99
   0    0   0   0  0.0    0   0  0.0    0   0  0.0   0  0  0  0 100
   0  141   0   0  0.0    0   0  0.0    0   0  0.0   1  0  0  1  98

(The tout figures are the characters written by top.)


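So, as near as I can tell, everything slows down and no new processes are
created. In vmstat, the number of blocked processes (the b column) climbs
from 9 to 14 while the CPU stays 77-100% idle. I don't know what else to try.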



