Date: Thu, 13 Aug 1998 17:15:44 -0700
From: David Greenman <dg@root.com>
To: peter@sirius.com
Cc: mrcpu@internetcds.com (Jaye Mathisen), hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject: Re: vmopar state in 2.2.7?
Message-ID: <199808140015.RAA17635@implode.root.com>
In-Reply-To: Your message of "Thu, 13 Aug 1998 11:34:00 PDT." <199808131834.LAA14961@staff.sirius.com>
>We worked around a similar problem (processes left immortal, here in
>the context of several processes [httpd] writing to the same NFS-mounted
>file [http log file]) by adjusting the timeout value from 0 (never) to
>2 * hz (2 seconds). Details are posted as a follow-up to kern/4588 in
>FreeBSD.org's gnats problem report database.
>
>It looks like other parts of the kernel (here the vm subsystem) suffer
>similar problems. It appears to me that an overly optimistic use of
>tsleep(), with both interrupts disabled and the timeout set to infinity,
>leaves immortal yet paralyzed processes around.

No, there's just a missing or unprotected wakeup() somewhere.

>From /usr/src/sys/vm/vm_object.c (a second, similar occurrence is around
>line 1261):
>
> 1218                /*
> 1219                 * The busy flags are only cleared at
> 1220                 * interrupt -- minimize the spl transitions
> 1221                 */
> 1222                if ((p->flags & PG_BUSY) || p->busy) {
> 1223                        s = splvm();
> 1224                        if ((p->flags & PG_BUSY) || p->busy) {
> 1225                                p->flags |= PG_WANTED;
> 1226                                tsleep(p, PVM, "vmopar", 0);
> 1227                                splx(s);
> 1228                                goto again;
> 1229                        }
> 1230                        splx(s);
> 1231                }
>
>The code in line 1224 checks a condition to see whether somebody else
>is already performing an operation on object p; in this case it wants
>to ensure that a wakeup() for the following tsleep() is delivered by
>setting a flag in line 1225.
>
>But what ensures that the world did not change between lines 1224 and
>1225? Could the wakeup() happen after 1224 has decided to issue the
>tsleep() but before the flag set in 1225 was registered? Then it would
>be missed. Is this a race condition biting heavily loaded machines?

No. The wakeup occurs as a function of I/O rundown, which happens in
interrupt context. The purpose of splvm() is to block those interrupts
from the re-check in line 1224 until the process is actually asleep in
tsleep() (line 1226), so the wakeup cannot slip in between. That is what
prevents the race condition.

>Try changing lines 1226 and 1261 to something like:
>        tsleep(p, PVM, "vmopar", 5 * hz);
>
>From the tsleep man page:
>
>     Tsleep is the general sleep call.
>     Suspends the current process until a wakeup is performed on the
>     specified identifier. The process will then be made runnable with the
>     specified priority. Sleeps at most timo / hz seconds (0 means no
>     timeout). If pri includes the PCATCH flag, signals are checked before
>     and after sleeping, else signals are not checked. Returns 0 if
>     awakened, EWOULDBLOCK if the timeout expires. If PCATCH is set and a
>     signal needs to be delivered, ERESTART is returned if the current
>     system call should be restarted if possible, and EINTR is returned if
>     the system call should be interrupted by the signal (return EINTR).
>
>This function would then return EWOULDBLOCK after the timeout expires;
>no clue what that will do to your system or apps ;) -- I would expect the
>blocked process to go away within 5 seconds...

It would do bad things. There's a bug, but not there, and that isn't the
fix. I think this is another manifestation of the lack of NFSnode locking
in the kernel, but that's just a guess.

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
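Why the quoted vm_object.c loop does not lose wakeups can be sketched with a user-space analogue. This is not kernel code. It assumes POSIX threads as stand-ins: a mutex plays the role of splvm()/splx(), a condition variable plays the role of tsleep()/wakeup() on p, and a second thread plays the I/O-completion interrupt. The two busy tests (PG_BUSY and p->busy) are collapsed into one flag. The names `fake_page`, `io_complete()`, `wait_until_not_busy()` and `run_demo()` are hypothetical. The property being illustrated is that the re-check and the decision to sleep happen while the completion side is locked out, and pthread_cond_wait() releases the lock only once the waiter is already waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a vm_page; not a FreeBSD structure. */
struct fake_page {
    pthread_mutex_t lock;   /* role of splvm()/splx() */
    pthread_cond_t  cv;     /* role of tsleep()/wakeup() on p */
    bool            busy;   /* role of PG_BUSY / p->busy */
    bool            wanted; /* role of PG_WANTED */
};

/* "Interrupt" side: the I/O completes, busy is cleared, sleepers are woken. */
static void *io_complete(void *arg)
{
    struct fake_page *p = arg;
    struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms of "disk I/O" */

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&p->lock);       /* waiter cannot be mid-check here */
    p->busy = false;
    if (p->wanted) {                    /* someone asked to be woken */
        p->wanted = false;
        pthread_cond_broadcast(&p->cv); /* wakeup(p) */
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Waiter side: mirrors the structure of the quoted loop. */
static void wait_until_not_busy(struct fake_page *p)
{
    pthread_mutex_lock(&p->lock);       /* s = splvm(); */
    while (p->busy) {                   /* re-check with completion locked out */
        p->wanted = true;               /* p->flags |= PG_WANTED; */
        /* Atomically unlocks and sleeps, so completion cannot run between
         * setting "wanted" and going to sleep. The while loop re-checks
         * after every wakeup, like "goto again". */
        pthread_cond_wait(&p->cv, &p->lock);
    }
    pthread_mutex_unlock(&p->lock);     /* splx(s); */
}

/* Returns 1 if the waiter saw the page become not-busy, -1 on setup error. */
int run_demo(void)
{
    struct fake_page p;
    pthread_t t;
    int ok;

    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cv, NULL);
    p.busy = true;                      /* an I/O is in flight */
    p.wanted = false;

    if (pthread_create(&t, NULL, io_complete, &p) == 0) {
        wait_until_not_busy(&p);
        pthread_join(t, NULL);
        ok = !p.busy;                   /* completion thread has finished */
    } else {
        ok = -1;
    }

    pthread_cond_destroy(&p.cv);
    pthread_mutex_destroy(&p.lock);
    return ok;
}
```

Calling `run_demo()` from `main()` returns 1 whether the completion thread runs before or after the waiter takes the lock. If the flag were set and tested without the lock held, a completion landing between the test and the sleep would leave the waiter asleep with nobody to wake it, which is the race the original poster describes.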
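The suggested `5 * hz` timeout can be examined with a small user-space analogue. It assumes, as the quoted excerpt does, that the caller ignores tsleep()'s return value and simply re-checks the busy flags via `goto again`. In this hypothetical sketch, `pthread_cond_timedwait()` stands in for a tsleep() with a timeout and its ETIMEDOUT return stands in for EWOULDBLOCK. The names `timed_wait_busy()` and `run_stuck_demo()` are illustrative, and `max_tries` exists only so the demo terminates. It simulates the stuck case: the busy flag is never cleared and no wakeup ever arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Waits for *busy to clear, sleeping at most timeout_ms per attempt,
 * roughly like tsleep(p, PVM, "vmopar", 5 * hz) followed by "goto again".
 * max_tries bounds the demo; the kernel loop has no such bound.
 * Returns how many timed sleeps expired. */
int timed_wait_busy(pthread_mutex_t *m, pthread_cond_t *cv,
                    const bool *busy, long timeout_ms, int max_tries)
{
    int timeouts = 0;

    pthread_mutex_lock(m);
    while (*busy && timeouts < max_tries) {
        struct timespec deadline;

        clock_gettime(CLOCK_REALTIME, &deadline);  /* condvar default clock */
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {     /* keep tv_nsec in range */
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        if (pthread_cond_timedwait(cv, m, &deadline) == ETIMEDOUT)
            timeouts++;                            /* the EWOULDBLOCK case */
        /* Timeout or not, loop back and re-check, as "goto again" does. */
    }
    pthread_mutex_unlock(m);
    return timeouts;
}

/* Simulates the stuck case: busy stays set and nobody ever signals. */
int run_stuck_demo(void)
{
    pthread_mutex_t m;
    pthread_cond_t cv;
    bool busy = true;
    int timeouts;

    pthread_mutex_init(&m, NULL);
    pthread_cond_init(&cv, NULL);
    timeouts = timed_wait_busy(&m, &cv, &busy, 20, 3);  /* 3 tries of 20 ms */
    pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&m);
    return timeouts;
}
```

In this sketch `run_stuck_demo()` returns 3: each timeout only sends the waiter around the loop to find the page still busy. Under the stated assumption, a timeout turns an indefinite sleep into periodic polling rather than letting the process make progress. That is consistent with the reply's point that the timeout is not the fix for a wakeup that never comes.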