Date: Thu, 13 Aug 1998 11:34:00 -0700 (PDT)
From: peter@sirius.com
To: mrcpu@internetcds.com (Jaye Mathisen)
Cc: hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject: Re: vmopar state in 2.2.7?
Message-ID: <199808131834.LAA14961@staff.sirius.com>
In-Reply-To: <Pine.NEB.3.95.980813010207.8849B-100000@schizo.cdsnet.net> from Jaye Mathisen at "Aug 13, 98 01:06:02 am"

> I'm having a problem with my INN 2.1 newsreader machines NFS mounting
> the spool.
>
> The nnrpd's are occasionally getting stuck in what top shows as
> the vmopar state. ps shows the process in Ds state.
>
> No kill (obviously) will get it unstuck, and nothing else I do seems to
> make it come back to life.
>
> The NFS server is a Network Appliance, running latest released code,
> UDP mounts, v2 NFS.
>
> Any tip appreciated.
>

We worked around a similar problem (immortal processes, in our case
several httpd processes writing to the same NFS-mounted http log file)
by changing the tsleep() timeout value from 0 (never) to 2 * hz
(2 seconds). Details are posted as a follow-up to kern/4588 in
FreeBSD.org's GNATS problem report database.

It looks like other parts of the kernel (here the vm subsystem) suffer
from similar problems. It appears to me that an overly optimistic use
of tsleep(), with both interrupts disabled and the timeout set to
infinity, leaves immortal yet paralyzed processes around.

From /usr/src/sys/vm/vm_object.c (a second, similar occurrence is
around line 1261):

   1218         /*
   1219          * The busy flags are only cleared at
   1220          * interrupt -- minimize the spl transitions
   1221          */
   1222         if ((p->flags & PG_BUSY) || p->busy) {
   1223                 s = splvm();
   1224                 if ((p->flags & PG_BUSY) || p->busy) {
   1225                         p->flags |= PG_WANTED;
   1226                         tsleep(p, PVM, "vmopar", 0);
   1227                         splx(s);
   1228                         goto again;
   1229                 }
   1230                 splx(s);
   1231         }

The code in line 1224 checks whether somebody else is already
performing an operation on page p. If so, it sets a flag in line 1225
to make sure that a wakeup() is delivered for the tsleep() that
follows in line 1226.

But what ensures that the world did not change between lines 1224 and
1225? Could the wakeup() happen after line 1224 has decided to issue
the tsleep(), but before the flag set in line 1225 has been
registered? Then the wakeup would be missed and the tsleep() would
never return. Is this a race condition biting heavily hit machines?
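
For what it's worth, here is a small userland sketch of why a bounded
sleep plus a re-check of the condition recovers from a missed wakeup.
It is only an analogy, not kernel code: POSIX threads stand in for
tsleep()/wakeup(), and struct page_like, wait_not_busy() and
clear_busy_silently() are names I made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a page with a busy flag (not a kernel type). */
struct page_like {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            busy;
};

/*
 * Wait until p->busy is clear, sleeping at most timeout_ms per round and
 * re-checking the flag after every sleep. Returns how many sleeps ended
 * by timeout rather than by a wakeup.
 */
int wait_not_busy(struct page_like *p, long timeout_ms)
{
    int timeouts = 0;

    assert(timeout_ms > 0);
    pthread_mutex_lock(&p->lock);
    while (p->busy) {
        struct timespec deadline;

        /* Absolute deadline = now + timeout_ms. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        deadline.tv_sec  += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
        deadline.tv_nsec %= 1000000000L;

        if (pthread_cond_timedwait(&p->cond, &p->lock, &deadline) == ETIMEDOUT)
            timeouts++;
    }
    pthread_mutex_unlock(&p->lock);
    return timeouts;
}

/*
 * Thread body that clears the busy flag after 200 ms but deliberately
 * never signals the condition variable, imitating a wakeup that got lost.
 */
void *clear_busy_silently(void *arg)
{
    struct page_like *p = arg;
    struct timespec delay = { 0, 200 * 1000000L };

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&p->lock);
    p->busy = false;            /* no pthread_cond_signal() on purpose */
    pthread_mutex_unlock(&p->lock);
    return NULL;
}
```

With an untimed wait, a waiter paired with clear_busy_silently() would
sleep forever. With the timed wait it notices the cleared flag on the
next timeout and returns.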

Try changing lines 1226 and 1261 to something like:

        tsleep(p, PVM, "vmopar", 5 * hz);

From the tsleep man page:

     Tsleep is the general sleep call.  Suspends the current process until
     a wakeup is performed on the specified identifier.  The process will
     then be made runnable with the specified priority.  Sleeps at most
     timo / hz seconds (0 means no timeout).  If pri includes the PCATCH
     flag, signals are checked before and after sleeping, else signals are
     not checked.  Returns 0 if awakened, EWOULDBLOCK if the timeout
     expires.  If PCATCH is set and a signal needs to be delivered,
     ERESTART is returned if the current system call should be restarted
     if possible, and EINTR is returned if the system call should be
     interrupted by the signal (return EINTR).

With this change tsleep() would return EWOULDBLOCK once the timeout
expires. I have no clue what that will do to your system or apps ;)
I would expect the blocked process to go away within 5 seconds...

Peter Preuss
Sirius Connections, San Francisco

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message