Date: Tue, 13 May 2003 14:51:17 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
To: "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
Cc: current@FreeBSD.org
Subject: Re: rpc.lockd spinning; much breakage
Message-ID: <Pine.NEB.3.96L.1030513135121.72145Q-100000@fledge.watson.org>
In-Reply-To: <Pine.NEB.3.96L.1030513133547.72145O-100000@fledge.watson.org>
On Tue, 13 May 2003, Robert Watson wrote:

> So the client isn't retrying, or mapping errors right after this patch,
> but the failure modes are more consistent and I seem not to be getting
> any interminable hangs anymore on the client.

I should clarify this statement: I no longer get the odd hangs when it
comes to client and server interactions when contending a lock established
on the server and now tested by the client. I still bump into the "client
isn't woken up in a timely manner after a lock is released by the same or
another client".

Here's the demonstration case with a bit more detail from what I presented
earlier. The server runs on host cboss, the client runs twice on host
crash1 on different pty's. In this scenario, each client attempts to grab
an exclusive lock, potentially blocking, and then sleep for 10 seconds
(this is with one of the earlier posted patches):

crash1:/tmp> ./locktest nocreate openlock block noflock test 10
933     open(test, 32, 0666)    Tue May 13 14:31:31 2003
933     open() returns          Tue May 13 14:31:31 2003
933     sleep(10)               Tue May 13 14:31:31 2003
933     sleep() returns         Tue May 13 14:31:41 2003

crash1:/tmp> ./locktest nocreate openlock block noflock test 0
934     open(test, 32, 0666)    Tue May 13 14:31:33 2003
934     open() returns          Tue May 13 14:31:53 2003

rpc.lockd results on crash1:

May 13 14:31:31 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 14:31:33 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1
May 13 14:31:42 crash1 rpc.lockd: nlm_granted_msg from 192.168.50.1
May 13 14:31:42 crash1 rpc.lockd: nlm_unlock_res from 192.168.50.1
May 13 14:31:42 crash1 rpc.lockd: process 933: No such process
May 13 14:31:53 crash1 rpc.lockd: nlm_lock_res from 192.168.50.1

In this example, pid 934 requests the lock on the object at 14:31:33 --
pid 933 released that lock at 14:31:41, but the pid 934 isn't notified
until 14:31:53. It looks like it should have been notified at 14:31:42
when a granted message is received, but instead it is notified when the
client rpc.lockd polls again 10 seconds from lock inception. I almost
wonder if that ESRCH shouldn't have been the notification for 934 and it
was using the wrong pid.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
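
For readers who want the delays spelled out, here is a minimal sketch that
does the arithmetic on the timestamps quoted above. It is not part of
locktest or rpc.lockd. The variable names and the `seconds_between` helper
are made up for illustration, and the sketch only subtracts times of day
copied from the transcripts and the rpc.lockd log.

```python
from datetime import datetime

def at(clock):
    """Parse a time of day such as '14:31:41' (all events are on the same day)."""
    return datetime.strptime(clock, "%H:%M:%S")

def seconds_between(start, end):
    """Whole seconds elapsed from start to end."""
    return int((end - start).total_seconds())

# Timestamps copied from the locktest transcripts and the rpc.lockd log.
requested = at("14:31:33")  # pid 934 calls open()
released = at("14:31:41")   # pid 933 sleep() returns and the lock is dropped
granted = at("14:31:42")    # nlm_granted_msg logged on crash1
woken = at("14:31:53")      # pid 934 open() returns

delays = {
    "request_to_wakeup": seconds_between(requested, woken),
    "release_to_granted": seconds_between(released, granted),
    "release_to_wakeup": seconds_between(released, woken),
    "granted_to_wakeup": seconds_between(granted, woken),
}

for name, value in delays.items():
    print(f"{name}: {value}s")
```

On these numbers, the granted message arrives 1 second after the release,
but pid 934 only returns from open() 11 seconds after that message, which is
20 seconds after its own request.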
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.NEB.3.96L.1030513135121.72145Q-100000>