From owner-freebsd-hackers Sun Apr 7 11:20:42 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
    id LAA24080 for hackers-outgoing; Sun, 7 Apr 1996 11:20:42 -0700 (PDT)
Received: from zot.io.org (root@zot.io.org [198.133.36.82])
    by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id LAA24072 for ;
    Sun, 7 Apr 1996 11:20:39 -0700 (PDT)
Received: from localhost (taob@localhost) by zot.io.org (8.6.12/8.6.12)
    with SMTP id OAA24611 for ; Sun, 7 Apr 1996 14:19:02 -0400
X-Authentication-Warning: zot.io.org: taob owned process doing -bs
Date: Sun, 7 Apr 1996 14:19:01 -0400 (EDT)
From: Brian Tao
To: FREEBSD-HACKERS-L
Subject: 'ps' or procfs stuck in disk wait???
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I came across a weird one today. I noticed the load on one of our
shell servers was consistently above 1.0 (rare for this machine with
only 50 users on it). I tried 'ps aux | head' to get a quick listing
of the process chewing up the CPU. No response, can't ^C or ^Z, can't
kill -9 it from another tty. 'ps x' and 'ps u' worked fine for listing
my own processes, but I couldn't get a full list with 'ps a'. I
resorted to "top -nu 9999" to see what was going on.

There was a runaway vi which I killed, but the problem persisted. I
noticed about three dozen instances of cron, sh, ps and egrep, all
paged out. They were spawned from a cron job I have running every five
minutes to check on zombie and detached processes. I was able to kill
off everything except the ps processes. Doing a "ps auxp" on one of
the pids revealed it was sitting in disk wait.

I then called "ps auxp" on each of the pids from the output of 'top'.
It hung on a pwd_mkdb process (password files here are regenerated
from a master copy every 30 minutes on the shell servers). According
to 'top', the process wasn't using any CPU and it was sleeping.
'ps' would hang whenever I pointed it at that pid. I looked inside
/proc/1522 (the procfs directory associated with the pwd_mkdb process)
and I was able to cat the status file. Unfortunately, I didn't save it
before it was wiped off my xterm by a screen clear. :( The curious
thing is that any read operation on the "mem" file would hang. I think
this is why 'ps' hangs when trying to retrieve process information.

Any ideas why this would happen? A bug in procfs or the VM system?
I've never seen anything like this before. The system will be
rebooting itself in about ten minutes, and I doubt I will be able to
recreate this problem. Stock 2.1.0R, 128MB physical, 384MB swap, about
8% allocated when I discovered this condition... I'm stumped on this
one.
--
Brian Tao (BT300, taob@io.org)
System and Network Administrator, Internex Online Inc.
"Though this be madness, yet there is method in't"
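
The manual check described in the message (running "ps auxp" on each pid and looking for disk wait) could be sketched roughly as below. This is only an illustration under stated assumptions, not the cron job or any tool from the original report: the function name pids_in_disk_wait is hypothetical, the sample listing is made up (only pid 1522 and the pwd_mkdb command name come from the message), and it assumes a 'ps aux' style header containing PID, STAT and COMMAND columns, with a STAT value starting with 'D' meaning disk wait.

```python
def pids_in_disk_wait(ps_output):
    """Return (pid, command) pairs whose STAT column starts with 'D'.

    Assumes 'ps aux' style text: one header line naming the columns,
    with COMMAND as the last column (it may contain spaces).
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []

    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = header.index("COMMAND")

    stuck = []
    for line in lines[1:]:
        # Split only up to the COMMAND column so arguments stay together.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # short or malformed line, skip it
        if fields[stat_col].startswith("D"):
            stuck.append((int(fields[pid_col]), fields[cmd_col]))
    return stuck


# Made-up sample listing for illustration; not output from the real system.
SAMPLE = """\
USER   PID %CPU %MEM  VSZ  RSS TT STAT STARTED    TIME COMMAND
root  1522  0.0  0.1  520  300 ?? D     1:30PM 0:00.10 pwd_mkdb
taob  1600  0.0  0.2  640  410 p1 Ss    1:45PM 0:00.25 -tcsh (tcsh)
taob  1701  0.0  0.1  400  220 p1 D+    2:05PM 0:00.01 ps auxp 1522
"""

if __name__ == "__main__":
    for pid, command in pids_in_disk_wait(SAMPLE):
        print(pid, command)
```

The sketch deliberately parses text that has already been captured rather than invoking 'ps' itself, since in the situation described the 'ps' process was the thing that hung.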