Date:      Wed, 22 Mar 1995 11:30:03 -0800
From:      Jin Mazumdar <mazumdar@evita.cs.fredonia.edu>
To:        freebsd-bugs
Subject:   kern/268: Machine locks up after an extended period of intense disk use.
Message-ID:  <199503221930.LAA12599@freefall.cdrom.com>
In-Reply-To: Your message of Wed, 22 Mar 1995 14:20:22 -0500 <199503221920.OAA01587@evita.cs.fredonia.edu>


>Number:         268
>Category:       kern
>Synopsis:       Machine locks up after an extended period of intense disk use.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs (FreeBSD bugs mailing list)
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 22 11:30:01 1995
>Originator:     Jin Mazumdar
>Organization:
Jin Mazumdar        

(internet:) mazumdar@cs.fredonia.edu

  >>> Dept. Of Math and C. S.                          <<<
  >>> State University of New York College at Fredonia <<<    
  >>> Fredonia, N.Y. 14063         (716) 673 3459      <<<
 
>Release:        FreeBSD 2.1.0-Development 950210
>Environment:

Hardware:
	Pentium 90 (Insight Premiere PCI II)
	32 Mb memory (128 k swap)
        Buslogic 946C SCSI controller
	2 x Seagate 2GB SCSI drives
	3c509B ethernet card

>Description:
	
Before I talk about the bug, let me say that 2.0-950210 was
the easiest to install and is more robust than any of its
predecessors.  Great job.  The OS worked great until I decided to really
stress test it.  The best improvement for our site is that the ethernet
card (3c509) no longer hangs, which was the biggest problem with 1.1.5.1.



        I am running a news system with a full newsfeed.  After an
	extended period of intense disk use the machine locks up.  The
	disk drives seem to get "stuck", and sometimes I need to
	disconnect the SCSI cable or the power to the drive before the
	machine will boot again.

	When this happens, programs that are already running still work.
	Systat/vmstat reports an enormous number of "d" (in disk wait
	other than paging) processes.

	I have been observing the system closely for some time, and I
	think the following happens before the disks freeze up.  Notice
	process 1639 (relaynews) towards the bottom of the following
	list.  It sits in disk wait on "getblk" (STAT D) and never
	changes state.  This is when the trouble starts.  Disks are
	still accessible at this point, but if nothing is done the
	system slowly grinds to a halt.

	A shutdown at this point reports that the process could not be
	killed, and if the system is halted it fails to sync.

	Any advice would be appreciated.  I would be happy to provide
	more data if you could let me know what you need.

  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0     0     0   0 -18  0     0    0 sched  DLs  ??    0:00.00 (swapper)
    0     1     0   1  10  0   400  180 wait   Is   ??    0:00.03 /sbin/init --
    0     2     0   0 -18  0     0   12 psleep DL   ??    0:00.00 (pagedaemon)
    0     3     0   0  28  0     0   12 psleep DL   ??    0:00.00 (vmdaemon)
    0     4     0   1  -6  0     0   12 biowai DL   ??    0:26.50 (update)
    0    22     1  86  18  0   200   80 pause  Is   ??    0:00.01 adjkerntz -i 
    0    51     1   0   2  0   184  328 select Ss   ??    0:00.26 syslogd 
    0    63     1   0  18  0   280  340 pause  Is   ??    0:00.40 cron 
    1    66     1  32   2  0   176  272 select Is   ??    0:00.01 portmap 
    0    70     1   0   2  0   184  264 select Is   ??    0:00.16 routed -q 
    0    77     1   0   2  0   164  240 netio  Ss   ??    0:00.91 rwhod 
    0    79     1  97   2  0   200  308 select Is   ??    0:00.05 lpd 
    0    86     1  37   2  0   412  160 select Is   ??    0:00.01 mountd 
    0    88     1  75   2  0   224   88 netcon Is   ??    0:00.01 nfsd-master (
    0    90    88   0   2  0   216   52 nfsd   I    ??    0:05.52 nfsd-srv (nfs
    0    91    88  75   2  0   216   52 nfsd   I    ??    0:00.00 nfsd-srv (nfs
    0    93    88  75   2  0   216   52 nfsd   I    ??    0:00.00 nfsd-srv (nfs
    0    94    88   0   2  0   216   52 nfsd   I    ??    0:00.00 nfsd-srv (nfs
    0    98     1  98  10  0   208   28 nfsidl I    ??    0:00.00 nfsiod -n 4 
    0    99     1  98  10  0   208   28 nfsidl I    ??    0:00.00 nfsiod -n 4 
    0   100     1  98  10  0   208   28 nfsidl I    ??    0:00.00 nfsiod -n 4 
    0   101     1  99  10  0   208   28 nfsidl I    ??    0:00.00 nfsiod -n 4 
    0   103     1   0   2  0   412  348 netcon Is   ??    0:00.05 sendmail: acc
    0   106     1   0   2  0   224  292 select Is   ??    0:00.14 inetd 
    0   921   106   0 -14  0   920 1156 ufslk2 D    ??    0:29.37 -penny.cs.fre
    0  1728    63   0   2  0   280  200 netio  I    ??    0:00.01 CRON (cron)
    6  1730  1728   0  10  0   428  248 wait   Is   ??    0:00.02 /bin/sh -c /u
    6  1732  1730   5  -6  0  5764 4976 biowai D    ??    0:36.30 /usr/libexec/
    0   126     1   0   3  0   536  368 ttyin  Is+  v0    0:00.43 -csh (csh)
    6   127     1   1   3  0   560  476 ttyin  Is+  v1    0:02.94 -csh (csh)
    6   150   127  80  10  0   468  280 wait   I    v1    0:01.82 /bin/sh /news
    6  1639   150   2  -6  0  1264 1492 getblk D    v1    0:19.55 relaynews -c 
  100   128     1   0  18  0   432  304 pause  Ss   v2    0:00.91 -csh (csh)
    0  1747   128   1  28  0   436  220 -      R+   v2    0:00.02 ps -alx 
  100   129     1   0  18  0   432  300 pause  Is   v3    0:00.23 -csh (csh)
  100   452   129   0   3  0   504  908 ttyin  S+   v3    0:22.53 systat 
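
For anyone wanting to track the stuck processes over time, the listing
above can be filtered for entries whose STAT begins with "D".  The
following is only a sketch: the function name disk_wait_processes is
made up for illustration, and it assumes the 13-column "ps -alx" layout
shown above.  Kernel daemons such as (update) also show a D state, so
the result still needs to be read by eye.

```python
from collections import Counter


def disk_wait_processes(ps_text):
    """Return (pid, wchan, command) for each row whose STAT starts with 'D'.

    Assumes the first non-blank line is the ps header and that COMMAND
    is the last column (hypothetical helper, not part of any tool).
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    ncols = len(header)
    pid_i = header.index("PID")
    wchan_i = header.index("WCHAN")
    stat_i = header.index("STAT")
    found = []
    for ln in lines[1:]:
        # Split only ncols-1 times so a COMMAND with spaces stays whole.
        fields = ln.split(None, ncols - 1)
        if len(fields) < ncols:
            continue  # malformed or truncated row
        if fields[stat_i].startswith("D"):
            found.append((int(fields[pid_i]), fields[wchan_i], fields[-1].strip()))
    return found


# A few rows copied from the listing in this report.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0     4     0   1  -6  0     0   12 biowai DL   ??    0:26.50 (update)
    0   921   106   0 -14  0   920 1156 ufslk2 D    ??    0:29.37 -penny.cs.fre
    6  1639   150   2  -6  0  1264 1492 getblk D    v1    0:19.55 relaynews -c 
    0  1747   128   1  28  0   436  220 -      R+   v2    0:00.02 ps -alx 
"""

if __name__ == "__main__":
    procs = disk_wait_processes(SAMPLE)
    for pid, wchan, command in procs:
        print(pid, wchan, command)
    # Tally by wait channel to see where processes pile up.
    print(Counter(wchan for _, wchan, _ in procs))
```

Running this against successive "ps -alx" snapshots would show whether
the same PID stays on the same wait channel, which is the pattern
described for process 1639.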



>How-To-Repeat:

	

>Fix:
	
	

>Audit-Trail:
>Unformatted:





