From: Robert Watson
Date: Tue, 13 May 2003 13:45:31 -0400 (EDT)
To: "Andrew P. Lentvorski, Jr."
cc: Don Lewis
cc: alfred@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage

On Tue, 13 May 2003, Andrew P. Lentvorski, Jr. wrote:

> On Mon, 12 May 2003, Robert Watson wrote:
>
> > (3) Sometimes rpc.lockd on 5.x acting as a server gets really confused
> > when you mix local and remote locks. I haven't quite figured out the
> > circumstances, but occasionally I run into a situation where a client
> > contends against an existing lock on the server, and the client never
> > receives a notification from the server that the lock has been
> > released.
> > It looks like the server stores state that the lock is
> > contended, but perhaps never properly re-polls the kernel to see if
> > the lock has been locally re-released:
>
> I just looked at the code again. rpc.lockd does not spawn off extra
> processes to continuously poll the kernel. It assumes that it has control
> of the underlying file and only rechecks the blockedlocklist when it
> receives and grants an NFS file unlock.
>
> Consequently, contention on the hardware needs to actually cause a *fail*
> and not queue up a lock for later. Currently, it returns a fail but
> still executes add_blockingfilelock. The offending code in lockd_lock.c
> is:
>
> <...>
>
> A possible fix should be:
>
> <...>
>
> This should cause the server to return nlm4_denied and the client should
> eventually retry the lock rather than waiting on the server.
>
> CAUTION! I haven't checked or compiled this code. If folks need me to,
> I can, but it will be a couple of days as I don't have two machines
> handy that I can install -CURRENT on and set up NFS.

The code actually compiles fine, and even runs :-). I now reliably get
EACCES for blocking and non-blocking lock requests on the client when
contending against a server lock. Here are the cases:

(1) Client attempts blocking and non-blocking O_EXLOCK on open,
    uncontended:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
sleep 1
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
sleep 1

Log entries on client:

May 13 13:38:24 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:25 crash1 rpc.lockd: process 596: No such process
May 13 13:38:38 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 13:38:39 crash1 rpc.lockd: process 597: No such process

Note the odd ESRCH at the end, although things appear to operate fine in
the test program.
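For readers without the test program, the open-with-lock step these cases exercise can be sketched roughly as below. This is only an illustration under assumptions: open_exlock() is a hypothetical helper, not code from locktest, and the flock(2) fallback is there only so the sketch builds on systems that lack O_EXLOCK. As far as I understand open(2), a non-blocking request against a lock held locally would normally fail with EWOULDBLOCK, which is a useful point of comparison for the errors the NFS client reports.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from locktest): open `path` and take an
 * exclusive lock, either blocking or non-blocking.  Returns the file
 * descriptor on success, or -1 with errno set.
 */
static int
open_exlock(const char *path, int nonblock)
{
	int fd;

#ifdef O_EXLOCK
	/* BSD: the exclusive lock is requested as part of open(2). */
	fd = open(path, O_RDWR | O_EXLOCK | (nonblock ? O_NONBLOCK : 0));
#else
	/* Fallback where O_EXLOCK is unavailable: open, then flock(2). */
	fd = open(path, O_RDWR);
	if (fd != -1 &&
	    flock(fd, LOCK_EX | (nonblock ? LOCK_NB : 0)) == -1) {
		int saved = errno;	/* close() may overwrite errno */

		close(fd);
		errno = saved;
		fd = -1;
	}
#endif
	return (fd);
}
```

A caller would hold the returned descriptor for the sleep interval and then close it, which drops the lock; on failure it can print strerror(errno) to see which error came back.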
(2) Client attempts blocking and non-blocking O_EXLOCK on open,
    contended against a server exclusive lock:

crash1:/tmp> ./locktest nocreate openlock block noflock test 1
open: Permission denied
crash1:/tmp> ./locktest nocreate openlock nonblock noflock test 1
open: Permission denied

May 13 13:40:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 13:40:57 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1

So the client isn't retrying, or mapping errors right after this patch,
but the failure modes are more consistent and I seem not to be getting any
interminable hangs anymore on the client.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories