FreeBSD Mail Archives

Date:      Tue, 13 May 2003 13:45:31 -0400 (EDT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
Cc:        current@FreeBSD.org
Subject:   Re: rpc.lockd spinning; much breakage
Message-ID:  <Pine.NEB.3.96L.1030513133547.72145O-100000@fledge.watson.org>
In-Reply-To: <Pine.LNX.4.44.0305130104010.31214-100000@mail.allcaps.org>



On Tue, 13 May 2003, Andrew P. Lentvorski, Jr. wrote:

> On Mon, 12 May 2003, Robert Watson wrote:
> 
> > (3) Sometimes rpc.lockd on 5.x acting as a server gets really confused
> >     when you mix local and remote locks.  I haven't quite figured out the
> >     circumstances, but occasionally I run into a situation where a client
> >     contends against an existing lock on the server, and the client never
> >     receives a notification from the server that the lock has been
> >     released.  It looks like the server stores state that the lock is
> >     contended, but perhaps never properly re-polls the kernel to see if
> >     the lock has been locally re-released:
> 
> I just looked at the code again.  rpc.lockd does not spawn off extra
> processes to continuously poll the kernel.  It assumes that it has control
> of the underlying file and only rechecks the blockedlocklist when it
> receives and grants an NFS file unlock.
> 
> Consequently, contention on the hardware needs to actually cause a *fail* 
> and not queue up a lock for later.  Currently, it returns a fail but 
> still executes add_blockingfilelock.  The offending code in lockd_lock.c 
> is:
<...>
> A possible fix should be:
<...>
> This should cause the server to return nlm4_denied and the client should
> eventually retry the lock rather than waiting on the server. 
> 
> CAUTION!  I haven't checked or compiled this code.  If folks need me to,
> I can, but it will be a couple of days as I don't have two machines
> handy that I can install -CURRENT on and set up NFS. 
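The server-side behavior described in the quoted message can be modeled roughly as follows. This is a hypothetical sketch, not the lockd_lock.c code (which is elided above). The names handle_lock, blocked_pending, on_local_unlock and on_nfs_unlock are illustrative, and the fixed-size array stands in for the daemon's blocked-lock list. The model shows why a request queued after a local conflict lingers: the list is only re-examined when an NFS unlock is received and granted, so releasing the local lock never wakes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these names are illustrative, not rpc.lockd's. */
enum lock_status { LOCK_GRANTED, LOCK_DENIED };

#define MAX_BLOCKED 16

/* Stand-in for the server's blocked-lock list. */
static int blocked_list[MAX_BLOCKED];
static size_t blocked_count;

/* local_conflict: the kernel refused the lock because a local process
 * already holds it.  patched: apply the proposed fix. */
enum lock_status handle_lock(int request_id, bool local_conflict, bool patched)
{
    if (!local_conflict)
        return LOCK_GRANTED;
    if (!patched) {
        /* Unpatched behavior as described: report failure, but queue
         * the request anyway. */
        assert(blocked_count < MAX_BLOCKED);
        blocked_list[blocked_count++] = request_id;
    }
    /* Both paths report a denial (nlm4_denied in NLM v4 terms); only the
     * unpatched path leaves a queued entry behind. */
    return LOCK_DENIED;
}

size_t blocked_pending(void)
{
    return blocked_count;
}

/* A local process releases its lock: the daemon is never told, so the
 * blocked list is left untouched. */
void on_local_unlock(void)
{
    /* intentionally empty */
}

/* Receiving and granting an NFS unlock is the only point at which the
 * list is rechecked; this model simply treats every entry as retried. */
void on_nfs_unlock(void)
{
    blocked_count = 0;
}
```

With the unpatched path, a request that hit a local conflict stays pending across any number of local unlocks; with the patch, nothing is queued and the client is left to retry on its own.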

The code actually compiles fine, and even runs :-).  I now reliably get
EACCES for blocking and non-blocking lock requests on the client when
contending against a server lock.  Here are the cases:

(1) Client attempts blocking and non-blocking O_EXLOCK on open,
    uncontended:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
sleep 1
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
sleep 1

    Log entries on client:

May 13 13:38:24 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: process 596: No such process

May 13 13:38:38 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: process 597: No such process

    Note the odd ESRCH at the end, although things appear to operate fine in
    the test program.

(2) Client attempts blocking and non-blocking O_EXLOCK on open, contended
    against a server exclusive lock:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
open: Permission denied
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
open: Permission denied

May 13 13:40:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1

May 13 13:40:57 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
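For comparison with the cases above, here is a minimal C sketch of taking an exclusive lock at open time on an existing file. The source of locktest is not shown, so open_exlock is a hypothetical helper, and its mapping onto locktest's `block`/`nonblock` options is an assumption. Where O_EXLOCK is unavailable (for example on Linux), the sketch falls back to open(2) followed by flock(2). On a local file system, a non-blocking attempt against a held lock fails with EWOULDBLOCK, which FreeBSD's open(2) documents for O_NONBLOCK combined with O_EXLOCK. That differs from the EACCES ("Permission denied") seen over NFS above.

```c
#define _DEFAULT_SOURCE /* expose flock() and mkstemp() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open an existing file (no O_CREAT) and take an
 * exclusive lock.  Returns the descriptor, or -1 with errno set. */
int open_exlock(const char *path, bool nonblock)
{
#ifdef O_EXLOCK
    /* BSD: open(2) acquires the lock itself; with O_NONBLOCK a held
     * lock makes open fail with EWOULDBLOCK instead of waiting. */
    return open(path, O_RDWR | O_EXLOCK | (nonblock ? O_NONBLOCK : 0));
#else
    /* Fallback without O_EXLOCK: open, then take a flock-style lock. */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | (nonblock ? LOCK_NB : 0)) == -1) {
        int saved = errno;   /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
#endif
}
```

Because these locks belong to the open file description, a second open_exlock() of the same file fails even within one process while the first descriptor is still open.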

So after this patch the client isn't retrying or mapping errors correctly,
but the failure modes are more consistent and I no longer seem to be
getting interminable hangs on the client.
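The client behavior implied by the quoted expectation that the client "should eventually retry the lock" can be sketched as a small model. Everything here is hypothetical: client_lock, the srv_reply enum and the fake_server test double are illustrative and are not interfaces of the FreeBSD NFS client or rpc.lockd. A non-blocking request maps a denial to EWOULDBLOCK, and a blocking request polls the server, sleeping between attempts. The max_attempts cap, which returns ETIMEDOUT, exists only so that the model can be tested.

```c
#define _POSIX_C_SOURCE 200809L /* nanosleep() */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical reply codes; REPLY_DENIED plays the role of nlm4_denied. */
enum srv_reply { REPLY_GRANTED, REPLY_DENIED };

/* Hypothetical callback that sends one lock request to the server. */
typedef enum srv_reply (*lock_attempt_fn)(void *ctx);

static void sleep_ms(unsigned ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Returns 0 once the lock is granted, otherwise an errno value.
 * max_attempts == 0 means a blocking request retries indefinitely. */
int client_lock(lock_attempt_fn attempt, void *ctx, bool blocking,
                unsigned max_attempts, unsigned retry_delay_ms)
{
    for (unsigned n = 1;; n++) {
        if (attempt(ctx) == REPLY_GRANTED)
            return 0;
        if (!blocking)
            return EWOULDBLOCK;      /* non-blocking: fail immediately */
        if (max_attempts != 0 && n >= max_attempts)
            return ETIMEDOUT;        /* cap exists only for testing */
        sleep_ms(retry_delay_ms);    /* blocking: wait, then poll again */
    }
}

/* Test double: denies the first `denials_left` requests, then grants. */
struct fake_server {
    unsigned denials_left;
    unsigned calls;
};

enum srv_reply fake_attempt(void *ctx)
{
    struct fake_server *s = ctx;
    s->calls++;
    if (s->denials_left > 0) {
        s->denials_left--;
        return REPLY_DENIED;
    }
    return REPLY_GRANTED;
}
```

Polling trades some latency and extra RPC traffic for not depending on a server-side callback that, as described above, may never arrive after a local unlock.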

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories

