Date: Tue, 13 May 2003 13:45:31 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
To: "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
Cc: current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage
Message-ID: <Pine.NEB.3.96L.1030513133547.72145O-100000@fledge.watson.org>
In-Reply-To: <Pine.LNX.4.44.0305130104010.31214-100000@mail.allcaps.org>

On Tue, 13 May 2003, Andrew P. Lentvorski, Jr. wrote:
> On Mon, 12 May 2003, Robert Watson wrote:
>
> > (3) Sometimes rpc.lockd on 5.x acting as a server gets really confused
> > when you mix local and remote locks. I haven't quite figured out the
> > circumstances, but occasionally I run into a situation where a client
> > contends against an existing lock on the server, and the client never
> > receives a notification from the server that the lock has been
> > released. It looks like the server stores state that the lock is
> > contended, but perhaps never properly re-polls the kernel to see if
> > the lock has been locally re-released:
>
> I just looked at the code again. rpc.lockd does not spawn off extra
> processes to continuously poll the kernel. It assumes that it has control
> of the underlying file and only rechecks the blockedlocklist when it
> receives and grants an NFS file unlock.
>
> Consequently, contention on the hardware needs to actually cause a *fail*
> and not queue up a lock for later. Currently, it returns a fail but
> still executes add_blockingfilelock. The offending code in lockd_lock.c
> is:
<...>
> A possible fix should be:
<...>
> This should cause the server to return nlm4_denied and the client should
> eventually retry the lock rather than waiting on the server.
>
> CAUTION! I haven't checked or compiled this code. If folks need me to,
> I can, but it will be a couple of days as I don't have two machines
> handy that I can install -CURRENT on and set up NFS.

The code actually compiles fine, and even runs :-). I now reliably get
EACCES for blocking and non-blocking lock requests on the client when
contending against a server lock. Here are the cases:

(1) Client attempts blocking and non-blocking O_EXLOCK on open,
uncontended:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
sleep 1
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
sleep 1

Log entries on client:
May 13 13:38:24 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: process 596: No such process
May 13 13:38:38 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: process 597: No such process

Note the odd ESRCH ("No such process") at the end of each run, although
things appear to operate fine in the test program.
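
The locktest program itself isn't included in this message. As a rough,
hypothetical sketch of what its "nocreate openlock block|nonblock" modes
presumably do, the C fragment below opens an existing file with O_EXLOCK,
adding O_NONBLOCK for the non-blocking case. The name open_exlock is
invented for illustration, and the flock(2) fallback exists only so the
fragment builds on systems without O_EXLOCK; it is not part of the test
being described.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an existing file ("nocreate", so no O_CREAT)
 * and take an exclusive lock as part of the open. Returns a descriptor,
 * or -1 with errno set. On a local file system a non-blocking attempt
 * against an already-locked file fails with EWOULDBLOCK.
 */
int
open_exlock(const char *path, bool block)
{
#ifdef O_EXLOCK
    /* BSD: open(2) itself acquires the lock. */
    int flags = O_RDWR | O_EXLOCK;

    if (!block)
        flags |= O_NONBLOCK;
    return open(path, flags);
#else
    /* Fallback for illustration only: open first, then flock(2). */
    int fd = open(path, O_RDWR);

    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | (block ? 0 : LOCK_NB)) == -1) {
        int saved = errno;      /* close() may clobber errno */

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
#endif
}
```

Against a local file, a second non-blocking open_exlock() on a file that
is already locked fails with EWOULDBLOCK; in case (2) the NFS client
reports EACCES ("Permission denied") instead.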

(2) Client attempts blocking and non-blocking O_EXLOCK on open, contended
against a server exclusive lock:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
open: Permission denied
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
open: Permission denied

Log entries on client:
May 13 13:40:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:40:57 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
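
The server-side rule discussed in the quoted message, namely deny a request
that conflicts on the "hardware" (presumably a lock held directly on the
underlying file by a local process) instead of queuing it, can be modelled
in a few lines. This is not the elided lockd_lock.c patch: the enum names,
handle_lock_request, and the blocked_count stand-in for the blocked-lock
list are all hypothetical. It only illustrates why queuing is unsafe for
local conflicts: per the quoted analysis, rpc.lockd rechecks its blocked
list only when it receives and grants an NFS unlock, and a local unlock
never passes through it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical results, named after NLM v4 nlm4_granted/denied/blocked. */
enum lock_result { RESULT_GRANTED, RESULT_DENIED, RESULT_BLOCKED };

/* Hypothetical: who, if anyone, already holds a conflicting lock. */
enum conflict { CONFLICT_NONE, CONFLICT_NFS_CLIENT, CONFLICT_LOCAL };

/* Stand-in for the server's list of queued (blocked) NFS lock requests. */
size_t blocked_count = 0;

enum lock_result
handle_lock_request(bool blocking, enum conflict conflict)
{
    switch (conflict) {
    case CONFLICT_NONE:
        return RESULT_GRANTED;
    case CONFLICT_LOCAL:
        /*
         * The local holder's unlock never reaches the lock daemon, so a
         * queued request would never be re-examined: deny outright and
         * leave retrying to the client.
         */
        return RESULT_DENIED;
    case CONFLICT_NFS_CLIENT:
    default:
        /*
         * The holder's eventual NFS unlock passes through the daemon,
         * which can then rescan the blocked list, so queuing is safe.
         */
        if (!blocking)
            return RESULT_DENIED;
        blocked_count++;
        return RESULT_BLOCKED;
    }
}
```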

So after this patch the client isn't retrying or mapping errors correctly,
but the failure modes are more consistent, and I no longer seem to be
getting interminable hangs on the client.
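
A hypothetical sketch of the client-side behaviour the quoted message
expects ("the client should eventually retry the lock rather than waiting
on the server"): a denial of a blocking request means back off and re-send,
while a denial of a non-blocking request becomes EWOULDBLOCK, which is what
FreeBSD's open(2) documents for O_EXLOCK | O_NONBLOCK on a locked file,
rather than the EACCES seen above. The name map_nlm_reply, the enum, and
the choice of ENOLCK for other failures are assumptions for illustration,
not rpc.lockd code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NLM v4 replies the client receives. */
enum nlm_reply { NLM_REPLY_GRANTED, NLM_REPLY_DENIED, NLM_REPLY_FAILED };

/*
 * Decide what the client does with one NLM reply. Sets *retry when a
 * blocking request was denied and should be re-sent after a delay;
 * otherwise returns 0 on success or an errno value for the caller.
 */
int
map_nlm_reply(enum nlm_reply reply, bool blocking, bool *retry)
{
    *retry = false;
    switch (reply) {
    case NLM_REPLY_GRANTED:
        return 0;
    case NLM_REPLY_DENIED:
        if (blocking) {
            *retry = true;      /* back off and try again; do not fail */
            return 0;
        }
        return EWOULDBLOCK;     /* not EACCES */
    default:
        return ENOLCK;          /* assumption: generic lock failure */
    }
}
```

A caller would loop: send the request and, while *retry comes back true,
sleep for a growing interval before sending it again.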

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
