From owner-freebsd-hackers  Sun Apr  7 11:20:42 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id LAA24080
          for hackers-outgoing; Sun, 7 Apr 1996 11:20:42 -0700 (PDT)
Received: from zot.io.org (root@zot.io.org [198.133.36.82])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id LAA24072
          for <freebsd-hackers@freebsd.org>; Sun, 7 Apr 1996 11:20:39 -0700 (PDT)
Received: from localhost (taob@localhost) by zot.io.org (8.6.12/8.6.12) with SMTP id OAA24611 for <freebsd-hackers@freebsd.org>; Sun, 7 Apr 1996 14:19:02 -0400
X-Authentication-Warning: zot.io.org: taob owned process doing -bs
Date: Sun, 7 Apr 1996 14:19:01 -0400 (EDT)
From: Brian Tao <taob@io.org>
To: FREEBSD-HACKERS-L <freebsd-hackers@freebsd.org>
Subject: 'ps' or procfs stuck in disk wait???
Message-ID: <Pine.NEB.3.92.960407140423.1573a-100000@zot.io.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

    I came across a weird one today.  I noticed the load on one of our
shell servers was consistently above 1.0 (rare for this machine with
only 50 users on it).  I tried 'ps aux | head' to get a quick listing
of the process chewing up the CPU.  No response, can't ^C or ^Z, can't
kill -9 it from another tty.

    'ps x' and 'ps u' worked fine for listing my own processes, but I
couldn't get a full list with 'ps a'.  I resorted to "top -nu 9999" to
see what was going on.  There was a runaway vi which I killed, but the
problem persisted.  I noticed about three dozen instances of cron, sh,
ps and egrep, all paged out.  They were spawned from a cron job I have
running every five minutes to check on zombie and detached processes.
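
    One way to keep a periodic job like this from piling up when a run
gets stuck is to take a non-blocking lock before doing any work.  The
following is only a minimal sketch, not what the cron job here actually
does: the function name try_lock and the lock path in the usage note
are hypothetical, and it assumes a system with Python and flock(2).

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns an open file descriptor if the lock was acquired, or None
    if another run already holds it.  (Hypothetical helper, for
    illustration only.)
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd
```

    A job would call something like try_lock("/var/run/zombiecheck.lock")
first and exit at once if it returns None.  A stuck run then blocks later
runs instead of accumulating three dozen copies.  The lock is released
when the descriptor is closed or the process exits.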

    I was able to kill off everything except the ps's.  Doing a "ps
auxp" on one of the pid's revealed it was sitting in disk wait.  I then
called "ps auxp" on each of the pid's from the output of 'top'.  It
hung on a pwd_mkdb process (password files here are regenerated from a
master copy every 30 minutes on the shell servers).  According to
'top', the process wasn't using any CPU and it was sleeping.  'ps'
would hang whenever I pointed it at that pid.

    I looked inside /proc/1522 (the procfs directory associated with
the pwd_mkdb process) and I was able to cat the status file.
Unfortunately, I didn't save it before it was wiped off my xterm by a
screen clear.  :(  The curious thing is that any read operation on the
"mem" file would hang.  I think this is why 'ps' hangs when trying to
retrieve process information.
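
    To find out which procfs file blocks without wedging the shell
doing the looking, the read can be pushed into a throwaway child process
and given a deadline.  This is a rough sketch under stated assumptions:
the name probe_read and its parameters are made up for illustration,
and it assumes Python is available.  If the child ends up in
uninterruptible disk wait, the kill will not take effect, so the parent
deliberately does not wait for it.

```python
import subprocess
import sys
import time

# Code run by the child: open the file and read a few bytes from it.
_CHILD = "import sys; open(sys.argv[1], 'rb').read(int(sys.argv[2]))"


def probe_read(path, timeout=5.0, nbytes=1):
    """Read from `path` in a child process.

    Returns "ok" if the read finished, "error" if it failed, and
    "hung" if the child had not exited within `timeout` seconds.
    (Hypothetical helper, for illustration only.)
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path, str(nbytes)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = child.poll()
        if rc is not None:
            return "ok" if rc == 0 else "error"
        time.sleep(0.05)
    # Best effort only: a process stuck in disk wait ignores SIGKILL,
    # so do not call wait() here or the parent would block as well.
    child.kill()
    return "hung"
```

    Calling probe_read on each file under the process's procfs
directory (for example "/proc/1522/status" and "/proc/1522/mem") would
show which reads complete and which never return.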

    Any ideas why this would happen?  A bug in procfs or the VM
system?  I've never seen anything like this before.  The system will
be rebooting itself in about ten minutes, and I doubt I will be able
to recreate this problem.

    Stock 2.1.0R, 128MB physical, 384MB swap, about 8% allocated when
I discovered this condition... I'm stumped on this one.
--
Brian Tao (BT300, taob@io.org)
System and Network Administrator, Internex Online Inc.
"Though this be madness, yet there is method in't"