Date: Sun, 7 Apr 1996 14:19:01 -0400 (EDT)
From: Brian Tao <taob@io.org>
To: FREEBSD-HACKERS-L <freebsd-hackers@freebsd.org>
Subject: 'ps' or procfs stuck in disk wait???
Message-ID: <Pine.NEB.3.92.960407140423.1573a-100000@zot.io.org>

I came across a weird one today. I noticed the load on one of our
shell servers was consistently above 1.0 (rare for this machine with
only 50 users on it). I tried 'ps aux | head' to get a quick listing
of the process chewing up the CPU. No response, can't ^C or ^Z, can't
kill -9 it from another tty.
'ps x' and 'ps u' worked fine for listing my own processes, but I
couldn't get a full list with 'ps a'. I resorted to "top -nu 9999" to
see what was going on. There was a runaway vi which I killed, but the
problem persisted. I noticed about three dozen instances of cron, sh,
ps and egrep, all paged out. They were spawned from a cron job I have
running every five minutes to check on zombie and detached processes.
I was able to kill off everything except the ps's. Doing a "ps
auxp" on one of the pids revealed it was sitting in disk wait. I then
called "ps auxp" on each of the pid's from the output of 'top'. It
hung on a pwd_mkdb process (password files here are regenerated from a
master copy every 30 minutes on the shell servers). According to
'top', the process wasn't using any CPU and it was sleeping. 'ps'
would hang whenever I pointed it at that pid.
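
The pid-by-pid search described above could be scripted roughly as follows. This is only a sketch, not what was actually run on the machine: probe_cmd is a hypothetical helper name, the 5-second limit is an arbitrary assumption, and 'ps -p' stands in for the "ps auxp" invocation. Because a process stuck in disk wait ignores kill -9, the helper never waits on or tries to kill a probe that overruns its deadline; it simply reports it and moves on.

```shell
#!/bin/sh
# probe_cmd (hypothetical helper): run a command in the background and
# report whether it finished within LIMIT seconds.
# Usage: probe_cmd LIMIT command [args...]
# Returns 0 if the command finished in time, 1 if it is still running.
probe_cmd() {
    limit=$1; shift
    "$@" >/dev/null 2>&1 &
    child=$!
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        # kill -0 sends no signal; it only tests whether the pid exists.
        if ! kill -0 "$child" 2>/dev/null; then
            return 0    # finished within the deadline
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check in case it finished during the final sleep.
    kill -0 "$child" 2>/dev/null || return 0
    # Still running: treat it as hung. Deliberately no wait and no kill,
    # since a process in disk wait would block both.
    return 1
}

# Probe every pid given on the command line (e.g. taken from 'top').
for pid in "$@"; do
    probe_cmd 5 ps -p "$pid" || echo "ps hung on pid $pid"
done
```

Each hung probe leaves one stuck 'ps' behind, so this is best run over a short list of suspect pids rather than the whole process table.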
I looked inside /proc/1522 (the procfs directory associated with
the pwd_mkdb process) and I was able to cat the status file.
Unfortunately, I didn't save it before it was wiped off my xterm by a
screen clear. :( The curious thing is that any read operation on the
"mem" file would hang. I think this is why 'ps' hangs when trying to
retrieve process information.
Any ideas why this would happen? A bug in procfs or the VM
system? I've never seen anything like this before. The system will
be rebooting itself in about ten minutes, and I doubt I will be able
to recreate this problem.
Stock 2.1.0R, 128MB physical, 384MB swap, about 8% allocated when
I discovered this condition... I'm stumped on this one.
--
Brian Tao (BT300, taob@io.org)
System and Network Administrator, Internex Online Inc.
"Though this be madness, yet there is method in't"
