Date: Tue, 13 May 2003 13:45:31 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
To: "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
Cc: current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage
Message-ID: <Pine.NEB.3.96L.1030513133547.72145O-100000@fledge.watson.org>
In-Reply-To: <Pine.LNX.4.44.0305130104010.31214-100000@mail.allcaps.org>

On Tue, 13 May 2003, Andrew P. Lentvorski, Jr. wrote:

> On Mon, 12 May 2003, Robert Watson wrote:
>
> > (3) Sometimes rpc.lockd on 5.x acting as a server gets really confused
> >     when you mix local and remote locks.  I haven't quite figured out the
> >     circumstances, but occasionally I run into a situation where a client
> >     contends against an existing lock on the server, and the client never
> >     receives a notification from the server that the lock has been
> >     released.  It looks like the server stores state that the lock is
> >     contended, but perhaps never properly re-polls the kernel to see if
> >     the lock has been locally re-released:
>
> I just looked at the code again.  rpc.lockd does not spawn off extra
> processes to continuously poll the kernel.  It assumes that it has control
> of the underlying file and only rechecks the blockedlocklist when it
> receives and grants an NFS file unlock.
>
> Consequently, contention on the hardware needs to actually cause a *fail*
> and not queue up a lock for later.  Currently, it returns a fail but
> still executes add_blockingfilelock.  The offending code in lockd_lock.c
> is:

<...>

> A possible fix should be:

<...>

> This should cause the server to return nlm4_denied and the client should
> eventually retry the lock rather than waiting on the server.
>
> CAUTION!  I haven't checked or compiled this code.  If folks need me to,
> I can, but it will be a couple of days as I don't have two machines
> handy that I can install -CURRENT on and set up NFS.

The code actually compiles fine, and even runs :-).  I now reliably get
EACCES for blocking and non-blocking lock requests on the client when
contending against a server lock.
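
For anyone wanting to reproduce the client-side behaviour, a lock-on-open
test might be sketched roughly as follows.  This is not the source of the
locktest program used in this thread; open_exlock is a hypothetical helper
name, and the flock() fallback is only an assumption for systems whose
headers lack O_EXLOCK.  On an NFS mount, a contended request that the
server denies would be expected to surface here as a -1 return with errno
set (EACCES in the runs reported in this message).

```c
/* Hypothetical sketch; not the locktest program from this thread. */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open an existing file and take an exclusive lock on it.
 * nonblock != 0 requests a non-blocking lock attempt.
 * Returns the locked descriptor, or -1 with errno set.
 */
static int
open_exlock(const char *path, int nonblock)
{
	int fd;

#ifdef O_EXLOCK
	/* BSD-style: acquire the lock atomically as part of open(). */
	int flags = O_RDWR | O_EXLOCK;

	if (nonblock)
		flags |= O_NONBLOCK;
	fd = open(path, flags);
#else
	/* Assumed fallback where O_EXLOCK is unavailable: open, then flock(). */
	fd = open(path, O_RDWR);
	if (fd == -1)
		return (-1);
	if (flock(fd, LOCK_EX | (nonblock ? LOCK_NB : 0)) == -1) {
		int saved = errno;	/* keep flock()'s error across close() */

		close(fd);
		errno = saved;
		return (-1);
	}
#endif
	return (fd);
}
```

A caller would hold the descriptor for the duration of the test (for
example across a sleep) and release the lock with close().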

Here are the cases:

(1) Client attempts blocking and non-blocking O_EXLOCK on open,
    uncontended:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
sleep 1
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
sleep 1

Log entries on client:

May 13 13:38:24 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: process 596: No such process
May 13 13:38:38 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: process 597: No such process

Note odd ESRCH at the end, although things appear to operate fine in the
test program.

(2) Client attempts blocking and non-blocking O_EXLOCK on open, contended
    against a server exclusive lock:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
open: Permission denied
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
open: Permission denied

May 13 13:40:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:40:57 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1

So the client isn't retrying, or mapping errors correctly, after this
patch, but the failure modes are more consistent and I seem not to be
getting any interminable hangs anymore on the client.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories