Date: Thu, 13 Aug 1998 11:34:00 -0700 (PDT)
From: peter@sirius.com
To: mrcpu@internetcds.com (Jaye Mathisen)
Cc: hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject: Re: vmopar state in 2.2.7?
Message-ID: <199808131834.LAA14961@staff.sirius.com>
In-Reply-To: <Pine.NEB.3.95.980813010207.8849B-100000@schizo.cdsnet.net> from Jaye Mathisen at "Aug 13, 98 01:06:02 am"
>
>
> I'm having a problem with my INN 2.1 newsreader machines NFS mounting
> the spool.
>
> The nnrpd's are occasionally getting stuck in what top shows as
> the vmopar state. ps shows the process in Ds state.
>
>
> No kill (obviously) will get it unstuck, and nothing else I do seems to
> make it come back to life.
>
> The NFS server is a Network Appliance, running latest released code,
> UDP mounts, v2 NFS.
>
> Any tip appreciated.
>
We worked around a similar problem (immortal processes, in our case
several httpd processes writing to the same NFS-mounted HTTP log file)
by changing the timeout value from 0 (never) to 2 * hz (2 seconds).
Details are posted as a follow-up to kern/4588 in FreeBSD.org's GNATS
problem report database.

It looks like other parts of the kernel (here the VM subsystem) suffer
from similar problems. It appears to me that an overly optimistic use
of tsleep(), with both interrupts disabled and the timeout set to
infinity, leaves immortal yet paralyzed processes around.
From /usr/src/sys/vm/vm_object.c (a second, similar occurrence is around
line 1261):
1218            /*
1219             * The busy flags are only cleared at
1220             * interrupt -- minimize the spl transitions
1221             */
1222            if ((p->flags & PG_BUSY) || p->busy) {
1223                    s = splvm();
1224                    if ((p->flags & PG_BUSY) || p->busy) {
1225                            p->flags |= PG_WANTED;
1226                            tsleep(p, PVM, "vmopar", 0);
1227                            splx(s);
1228                            goto again;
1229                    }
1230                    splx(s);
1231            }
The check in line 1224 tests whether somebody else is already operating
on page p; if so, line 1225 sets a flag to ensure that a wakeup() is
delivered for the tsleep() that follows.

But what ensures that the world did not change between lines 1224 and
1225? Could the wakeup() happen after line 1224 has decided to issue
the tsleep() but before the flag set in line 1225 has registered? Then
it would be missed. Is this a race condition biting heavily loaded
machines?
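
To make the ordering question concrete, here is a small single-threaded
model of the check / flag / sleep sequence. It is only a sketch: struct
page, struct proc, page_done() and waiter_stuck() are names made up for
this illustration, the p->busy counter is left out, and whether the
kernel can really interleave this way depends on whether splvm() blocks
every path that clears the busy flag. The model just shows why a waker
running between the check and the sleep would be missed.

```c
#include <assert.h>
#include <stdbool.h>

#define PG_BUSY   0x1   /* flag names taken from the quoted kernel code */
#define PG_WANTED 0x2

struct page { int flags; };     /* hypothetical, much simplified */
struct proc { bool asleep; };   /* hypothetical stand-in for the waiter */

/* The "waker": clear PG_BUSY and wake the waiter only if PG_WANTED is set. */
static void page_done(struct page *p, struct proc *waiter)
{
    p->flags &= ~PG_BUSY;
    if (p->flags & PG_WANTED) {
        p->flags &= ~PG_WANTED;
        waiter->asleep = false;         /* stands in for wakeup(p) */
    }
}

/*
 * Run the waiter's three steps (0: check busy, 1: set PG_WANTED,
 * 2: go to sleep) and let the waker run just before step `waker_at`
 * (3 means: after the waiter is already asleep).
 * Returns true if the waiter is left asleep with nobody to wake it.
 */
bool waiter_stuck(int waker_at)
{
    struct page p = { PG_BUSY };
    struct proc self = { false };
    bool saw_busy = false;

    assert(waker_at >= 0 && waker_at <= 3);

    for (int step = 0; step <= 3; step++) {
        if (step == waker_at)
            page_done(&p, &self);
        switch (step) {
        case 0: saw_busy = (p.flags & PG_BUSY) != 0; break;
        case 1: if (saw_busy) p.flags |= PG_WANTED;  break;
        case 2: if (saw_busy) self.asleep = true;    break;
        default: break;
        }
    }
    return self.asleep;
}
```

In this model waiter_stuck(1) and waiter_stuck(2) return true: a waker
that runs after the check but before the sleep leaves the waiter asleep
forever. waiter_stuck(0) and waiter_stuck(3) return false. The whole
check / flag / sleep sequence has to be atomic with respect to the
waker, which is presumably what the splvm() call is meant to provide.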
Try changing lines 1226 and 1261 to something like:

        tsleep(p, PVM, "vmopar", 5 * hz);

From the tsleep man page:

    Tsleep is the general sleep call.  Suspends the current process
    until a wakeup is performed on the specified identifier.  The
    process will then be made runnable with the specified priority.
    Sleeps at most timo / hz seconds (0 means no timeout).  If pri
    includes the PCATCH flag, signals are checked before and after
    sleeping, else signals are not checked.  Returns 0 if awakened,
    EWOULDBLOCK if the timeout expires.  If PCATCH is set and a signal
    needs to be delivered, ERESTART is returned if the current system
    call should be restarted if possible, and EINTR is returned if the
    system call should be interrupted by the signal (return EINTR).

With a timeout set, tsleep() would then return EWOULDBLOCK once the
timeout expires. I have no clue what that will do to your system or
apps ;) but I would expect the blocked process to go away within
5 seconds...
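
For what it's worth, the quoted code discards the return value of
tsleep() and jumps back to "again", so a timeout there would presumably
just repeat the busy check. Below is a minimal user-space sketch of
that "sleep with a timeout, then re-check" loop. It uses POSIX threads
instead of the kernel's tsleep()/wakeup(), and ETIMEDOUT plays the role
of EWOULDBLOCK. struct page, page_init(), page_wait_idle() and the
max_timeouts give-up limit are all invented for the example (the kernel
loop has no such limit); it is not the kernel's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for a page with a busy flag. */
struct page {
    pthread_mutex_t lock;
    pthread_cond_t  wanted;
    bool            busy;
};

void page_init(struct page *p, bool busy)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wanted, NULL);
    p->busy = busy;
}

/*
 * Wait until the page is no longer busy, sleeping at most timeout_ms
 * per attempt.  A timeout is not an error by itself: like the
 * "goto again" in the quoted code, we simply re-check the condition.
 * Returns the number of timeouts that occurred, or -1 if the page was
 * still busy after max_timeouts timeouts (a limit added for this sketch).
 */
int page_wait_idle(struct page *p, long timeout_ms, int max_timeouts)
{
    int timeouts = 0;

    assert(timeout_ms > 0 && max_timeouts > 0);

    pthread_mutex_lock(&p->lock);
    while (p->busy) {
        struct timespec deadline;

        /* pthread_cond_timedwait() takes an absolute deadline. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }

        int rc = pthread_cond_timedwait(&p->wanted, &p->lock, &deadline);
        if (rc == ETIMEDOUT) {
            timeouts++;
            if (p->busy && timeouts >= max_timeouts) {
                pthread_mutex_unlock(&p->lock);
                return -1;
            }
        }
    }
    pthread_mutex_unlock(&p->lock);
    return timeouts;
}
```

If another thread clears busy without signalling the condition variable
(a missed wakeup), the waiter notices at its next timeout instead of
sleeping forever. The price is a delay of up to one timeout period,
which is the trade-off behind the 5 * hz suggestion.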
Peter Preuss
Sirius Connections, San Francisco
