From owner-freebsd-bugs Wed Mar 22 11:30:05 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id LAA12606 for bugs-outgoing; Wed, 22 Mar 1995 11:30:05 -0800
Received: (from gnats@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id LAA12599; Wed, 22 Mar 1995 11:30:03 -0800
Date: Wed, 22 Mar 1995 11:30:03 -0800
Message-Id: <199503221930.LAA12599@freefall.cdrom.com>
From: Jin Mazumdar
Reply-To: Jin Mazumdar
To: freebsd-bugs
Subject: kern/268: Machine locks up after an extended period of intense disk use.
In-Reply-To: Your message of Wed, 22 Mar 1995 14:20:22 -0500 <199503221920.OAA01587@evita.cs.fredonia.edu>
Sender: bugs-owner@FreeBSD.org
Precedence: bulk

>Number:         268
>Category:       kern
>Synopsis:       Machine locks up after an extended period of intense disk use.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs (FreeBSD bugs mailing list)
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 22 11:30:01 1995
>Originator:     Jin Mazumdar
>Organization:
Jin Mazumdar (internet:) mazumdar@cs.fredonia.edu
>>> Dept. Of Math and C. S. <<<
>>> State University of New York College at Fredonia <<<
>>> Fredonia, N.Y. 14063 (716) 673 3459 <<<
>Release:        FreeBSD 2.1.0-Development 950210
>Environment:
Hardware:
    Pentium 90 (Insight Premiere PCI II)
    32 MB memory (128 k swap)
    Buslogic 946C SCSI controller
    2 x Seagate 2GB SCSI drives
    3c509B ethernet card
>Description:
Before I talk about the bug, let me say that 2.0-950210 was the easiest release to install and is the most robust of all its predecessors. Great job. The OS worked well until I decided to really stress test it. The most important improvement for our site is that the 3c509 ethernet card no longer hangs; that was the biggest problem with 1.1.5.1.

I am running a news system with a full newsfeed. Eventually the machine locks up.
The disk drives seem to get "stuck", and sometimes I have to disconnect the SCSI cable or the power to the drive before the machine will boot again. When this happens, programs that are already running still work. systat/vmstat reports an enormous number of processes in the "d" state (waiting on disk, other than paging).

I have been observing the system closely for some time, and I think the following happens before the disks freeze up. Notice process 1639 near the bottom of the list below: it sits in disk wait (STAT D) on getblk and never leaves that state. This is when the trouble starts. The disks are still accessible at this point, but if nothing is done the system slowly grinds to a halt. A shutdown at this point reports that the process could not be killed, and if the system is halted it fails to sync.

Any advice would be appreciated. I would be happy to provide more data if you let me know what you need.

UID   PID  PPID CPU PRI NI  VSZ  RSS WCHAN  STAT TT    TIME COMMAND
  0     0     0   0 -18  0    0    0 sched  DLs  ?? 0:00.00 (swapper)
  0     1     0   1  10  0  400  180 wait   Is   ?? 0:00.03 /sbin/init --
  0     2     0   0 -18  0    0   12 psleep DL   ?? 0:00.00 (pagedaemon)
  0     3     0   0  28  0    0   12 psleep DL   ?? 0:00.00 (vmdaemon)
  0     4     0   1  -6  0    0   12 biowai DL   ?? 0:26.50 (update)
  0    22     1  86  18  0  200   80 pause  Is   ?? 0:00.01 adjkerntz -i
  0    51     1   0   2  0  184  328 select Ss   ?? 0:00.26 syslogd
  0    63     1   0  18  0  280  340 pause  Is   ?? 0:00.40 cron
  1    66     1  32   2  0  176  272 select Is   ?? 0:00.01 portmap
  0    70     1   0   2  0  184  264 select Is   ?? 0:00.16 routed -q
  0    77     1   0   2  0  164  240 netio  Ss   ?? 0:00.91 rwhod
  0    79     1  97   2  0  200  308 select Is   ?? 0:00.05 lpd
  0    86     1  37   2  0  412  160 select Is   ?? 0:00.01 mountd
  0    88     1  75   2  0  224   88 netcon Is   ?? 0:00.01 nfsd-master (
  0    90    88   0   2  0  216   52 nfsd   I    ?? 0:05.52 nfsd-srv (nfs
  0    91    88  75   2  0  216   52 nfsd   I    ?? 0:00.00 nfsd-srv (nfs
  0    93    88  75   2  0  216   52 nfsd   I    ?? 0:00.00 nfsd-srv (nfs
  0    94    88   0   2  0  216   52 nfsd   I    ?? 0:00.00 nfsd-srv (nfs
  0    98     1  98  10  0  208   28 nfsidl I    ?? 0:00.00 nfsiod -n 4
  0    99     1  98  10  0  208   28 nfsidl I    ?? 0:00.00 nfsiod -n 4
  0   100     1  98  10  0  208   28 nfsidl I    ?? 0:00.00 nfsiod -n 4
  0   101     1  99  10  0  208   28 nfsidl I    ?? 0:00.00 nfsiod -n 4
  0   103     1   0   2  0  412  348 netcon Is   ?? 0:00.05 sendmail: acc
  0   106     1   0   2  0  224  292 select Is   ?? 0:00.14 inetd
  0   921   106   0 -14  0  920 1156 ufslk2 D    ?? 0:29.37 -penny.cs.fre
  0  1728    63   0   2  0  280  200 netio  I    ?? 0:00.01 CRON (cron)
  6  1730  1728   0  10  0  428  248 wait   Is   ?? 0:00.02 /bin/sh -c /u
  6  1732  1730   5  -6  0 5764 4976 biowai D    ?? 0:36.30 /usr/libexec/
  0   126     1   0   3  0  536  368 ttyin  Is+  v0 0:00.43 -csh (csh)
  6   127     1   1   3  0  560  476 ttyin  Is+  v1 0:02.94 -csh (csh)
  6   150   127  80  10  0  468  280 wait   I    v1 0:01.82 /bin/sh /news
  6  1639   150   2  -6  0 1264 1492 getblk D    v1 0:19.55 relaynews -c
100   128     1   0  18  0  432  304 pause  Ss   v2 0:00.91 -csh (csh)
  0  1747   128   1  28  0  436  220 -      R+   v2 0:00.02 ps -alx
100   129     1   0  18  0  432  300 pause  Is   v3 0:00.23 -csh (csh)
100   452   129   0   3  0  504  908 ttyin  S+   v3 0:22.53 systat
>How-To-Repeat:
>Fix:
>Audit-Trail:
>Unformatted:
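The symptom the report singles out is a process that stays in disk wait (a STAT beginning with D) on the same wait channel, here getblk, and never moves on. That pattern can be checked mechanically. Below is a minimal, hypothetical Python sketch that is not part of the original report. It assumes the `ps -alx` column layout shown in the listing (UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND). The helper names `parse_ps`, `disk_waiters` and `stuck` are illustrative only. The sketch flags a PID that is in a D state with an unchanged WCHAN in two snapshots taken some time apart.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    wchan: str
    stat: str
    command: str


def parse_ps(text: str) -> List[Proc]:
    """Parse `ps -alx` output, assuming (hypothetically) the 13-column layout
    UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND."""
    procs = []
    for line in text.splitlines():
        # maxsplit=12 keeps a COMMAND that contains spaces in one field
        fields = line.split(None, 12)
        if len(fields) < 13 or not fields[1].isdigit():
            continue  # header line, or a line that does not match the layout
        procs.append(Proc(int(fields[1]), fields[8], fields[9], fields[12]))
    return procs


def disk_waiters(text: str) -> List[Proc]:
    """Processes whose STAT starts with 'D' (disk or other uninterruptible wait)."""
    return [p for p in parse_ps(text) if p.stat.startswith("D")]


def stuck(before: str, after: str) -> List[int]:
    """PIDs that are in a D state on the same WCHAN in both snapshots."""
    earlier = {p.pid: p.wchan for p in disk_waiters(before)}
    return sorted(p.pid for p in disk_waiters(after)
                  if earlier.get(p.pid) == p.wchan)


if __name__ == "__main__":
    # Two rows copied from the listing in this report; a real check would
    # compare two separate `ps -alx` runs taken some minutes apart.
    snapshot = (
        "  0     4     0   1  -6  0    0   12 biowai DL   ?? 0:26.50 (update)\n"
        "  6  1639   150   2  -6  0 1264 1492 getblk D    v1 0:19.55 relaynews -c\n"
    )
    for pid in stuck(snapshot, snapshot):
        print(pid)
```

The sketch compares two snapshots instead of reading just one because a single D entry is normal under heavy I/O. The report's point is that process 1639 never leaves that state. Kernel daemons such as (update), which routinely sleep on biowai, will also show up in the results, so each hit still has to be judged by hand.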