Date: Thu, 13 Aug 1998 12:59:45 -0700
From: Mike Smith <mike@smith.net.au>
To: peter@sirius.com
Cc: mrcpu@internetcds.com (Jaye Mathisen), hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject: Re: vmopar state in 2.2.7?
Message-ID: <199808131959.MAA00604@dingo.cdrom.com>
In-Reply-To: Your message of "Thu, 13 Aug 1998 11:34:00 PDT." <199808131834.LAA14961@staff.sirius.com>
>
> We worked around a similar problem (processes left immortal, here in
> the context of several processes [httpd] writing to the same NFS mounted
> file [http log file]) by adjusting the timeout value from 0 (never) to
> 2 * hz (2 seconds). Details are posted as follow-up to kern/4588 in
> FreeBSD.org's gnats problem report database.
As this is an NFS-related issue, you should follow this up with Poul
Henning (phk@freebsd.org). I understand he's working on NFS amongst
other things at the moment (I know he's working on FreeBSD for us, as
he keeps sending us invoices... 8).
> It looks like other parts of the kernel (here the vm subsystem) suffer
> similar problems. It appears to me that an overly optimistic use of
> tsleep() with both interrupts disabled and the time-out set to infinity
> leaves immortal yet paralyzed processes around.
I don't think you mean interrupts disabled; splvm() masks only VM-related
activity.
> From /usr/src/sys/vm/vm_object.c (a second, similar occurrence is around
> line 1261):
>
> 1218 /*
> 1219 * The busy flags are only cleared at
> 1220 * interrupt -- minimize the spl transitions
> 1221 */
> 1222 if ((p->flags & PG_BUSY) || p->busy) {
> 1223 s = splvm();
> 1224 if ((p->flags & PG_BUSY) || p->busy) {
> 1225 p->flags |= PG_WANTED;
> 1226 tsleep(p, PVM, "vmopar", 0);
> 1227 splx(s);
> 1228 goto again;
> 1229 }
> 1230 splx(s);
> 1231 }
>
> The code in line 1224 checks a condition to see whether somebody else
> is already performing an operation on page p; in this case it wants
> to ensure that a wakeup() for the following tsleep() is delivered by
> setting a flag in line 1225.
>
> But what ensures that the world did not change between lines 1224 and
> 1225? Could the wakeup() happen after 1224 has determined to issue
> the tsleep() but before the flagging in 1225 was registered? Then it
> would be missed. Is this a race condition biting heavily hit machines?
It shouldn't. The splvm() call should mask VM-related activity from
its return through to the call to tsleep() (where the mask is saved and
the mask for the new context is restored). There is a risk that the
assumption in the comment is invalid; you would want to look for any
code that clears PG_BUSY (or p->busy) somewhere other than at interrupt
time.
To track this one further, you would want to look at the code that's
responsible for dealing with pages with PG_WANTED set, and work out
why it's never satisfying this request (or if it is, why it's not
waking the above caller up).
> Try changing lines 1226 and 1261 to something like:
> tsleep(p, PVM, "vmopar", 5 * hz);
...
> tsleep() would then return EWOULDBLOCK once the time-out expires;
> no clue what that will do to your system or apps ;) -- I would expect the
> blocked process to go away within 5 seconds...
I don't have 2.2 sources to hand, and the above is now just a call to
vm_page_sleep(), but if the timeout expires, the entire operation is
retried, so it should be harmless (although it is masking a legitimate
bug).
This might be a candidate for a band-aid patch for 2.2 systems, as 2.2
goes into life-support mode.
BTW, thanks for looking at this at all, and thanks for making your
findings generally known. If you can roll a patch and put it out for
general testing, we'd be very interested in hearing about the feedback
you get.
--
\\ Sometimes you're ahead, \\ Mike Smith
\\ sometimes you're behind. \\ mike@smith.net.au
\\ The race is long, and in the \\ msmith@freebsd.org
\\ end it's only with yourself. \\ msmith@cdrom.com
