Date:      Wed, 5 Jul 2006 14:04:59 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org, Michel Talon <talon@lpthe.jussieu.fr>
Subject:   Re: NFS Locking Issue
Message-ID:  <20060705140225.X18236@fledge.watson.org>
In-Reply-To: <20060705122040.GN37822@deviant.kiev.zoral.com.ua>
References:  <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <20060705113822.GM37822@deviant.kiev.zoral.com.ua> <20060705122040.GN37822@deviant.kiev.zoral.com.ua>


On Wed, 5 Jul 2006, Kostik Belousov wrote:

>> Also, both lockd processes now put identifying information in their
>> proctitle (srv and kern). SIGUSR1 should be sent to the srv process.
>
> Hmm, after looking at the dump there and some code reading, I have noted the 
> following:
>
> 1. The NLM lock request contains the field caller_name. It is filled in by
> (let's call it) the kernel rpc.lockd with the result of hostname(3).
>
> 2. The server rpc.lockd uses this caller_name to send a host-monitoring
> request to rpc.statd (see send_granted). The request is made with
> clnt_call, which is a blocking RPC call.
>
> 3. rpc.statd calls getaddrinfo on caller_name to determine the address
> of the host to monitor.
>
> If the getaddrinfo call in step 3 waits on the resolver, the locking
> process on your client machine will be stuck in the "lockd" state.
>
> Could people experiencing the rpc.lockd mystery at least report whether
> the _server_ machine successfully resolves the clients' hostnames, as
> reported by hostname? And if so, to which IP protocol family?
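A minimal diagnostic sketch for the question above, assuming a POSIX host: it resolves a name with getaddrinfo(3), as step 3 describes rpc.statd doing with caller_name, reports which address families came back, and times the lookup so a stalled resolver becomes visible. The AF_UNSPEC hint is an assumption (statd's actual hints may differ), and the helper names local_caller_name and resolved_families are hypothetical; this is not code from rpc.lockd or rpc.statd.

```c
#define _XOPEN_SOURCE 700   /* expose getaddrinfo, gethostname, clock_gettime */

#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define RESOLVED_INET  0x1  /* at least one AF_INET address was returned */
#define RESOLVED_INET6 0x2  /* at least one AF_INET6 address was returned */

/*
 * Hypothetical helper: fetch the local host name, i.e. the kind of value
 * the client side places in caller_name (step 1). Returns 0 on success.
 */
int local_caller_name(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (gethostname(buf, len) == -1)
        return -1;
    buf[len - 1] = '\0';    /* gethostname may not terminate on truncation */
    return 0;
}

/*
 * Hypothetical helper: resolve `name` with getaddrinfo(3). Returns a mask
 * of RESOLVED_* bits, or -1 if resolution failed. If elapsed_ms is not
 * NULL it receives how long the lookup took, exposing a blocking resolver.
 */
int resolved_families(const char *name, double *elapsed_ms)
{
    struct addrinfo hints, *res, *ai;
    struct timespec t0, t1;
    int rc, mask = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 (assumed) */
    hints.ai_socktype = SOCK_DGRAM;   /* one result per address, not per socktype */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = getaddrinfo(name, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (elapsed_ms != NULL)
        *elapsed_ms = (double)(t1.tv_sec - t0.tv_sec) * 1000.0 +
                      (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    if (rc != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET)
            mask |= RESOLVED_INET;
        else if (ai->ai_family == AF_INET6)
            mask |= RESOLVED_INET6;
    }
    freeaddrinfo(res);
    return mask;
}
```

Run on the server against each client's hostname output: a -1 result, or an elapsed time measured in seconds, would be consistent with the resolver stall described in step 3.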

It's not impossible.  It would be interesting to see if ps axl reports that 
rpc.lockd is in the kqread state, which would suggest it was blocked in the 
resolver.  We probably ought to review rpc.statd and make sure it's generally 
sensible.  I've noticed that its start-up notification of the state change
to other hosts is a bit poorly structured: if one host is down, it may take a
very long time to notify the others.
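To make that cost concrete, a minimal model in C, assuming (hypothetically) that hosts are notified strictly one after another with a single blocking call each. The reply and timeout figures are placeholders chosen for illustration, not rpc.statd's real settings, and sequential_notify_delay_ms is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical cost model, not rpc.statd's code: hosts are notified in
 * order with a blocking call. A reachable host answers after reply_ms;
 * a down host holds up the loop for timeout_ms. Returns how long host
 * `target` waits before its own notification even starts.
 */
long sequential_notify_delay_ms(const int *host_down, size_t nhosts,
                                size_t target, long reply_ms, long timeout_ms)
{
    long delay = 0;
    size_t i;

    /* Every host earlier in the list adds its full cost to the wait. */
    for (i = 0; i < target && i < nhosts; i++)
        delay += host_down[i] ? timeout_ms : reply_ms;
    return delay;
}
```

With hosts {up, down, up}, a 10 ms reply and a 25000 ms placeholder timeout, the third host waits 25010 ms. Notifying hosts concurrently, or with a short per-host timeout and a later retry, would bound the delay one dead host imposes on the rest.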

There are a number of other dubious things about the NLM protocol design (at
least, from my reading last night). I've also noticed that our rpc.lockd is
particularly sensitive, on the client side, to locks being released by a
different process than the one that acquired them, a pattern that our new
libpidfile in RELENG_6 triggers excessively.
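For readers unfamiliar with the pattern, a local-file sketch, assuming the pidfile lock has flock(2)-style semantics: pidfile(3) shows pidfile_open() being called before daemon(3), and a flock lock belongs to the open file description, so after fork() the child keeps the lock the parent acquired and is the process that eventually releases it. This only demonstrates local flock behaviour; it does not exercise the NFS client or rpc.lockd, and lock_released_by_child is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The caller takes a flock(2) lock, forks (as a daemon does after
 * pidfile_open), and closes its own descriptor; the child, sharing the
 * same open file description, keeps the lock and later releases it.
 * Returns 1 if the lock outlived the acquirer's close and was then freed
 * by the child, 0 if not, -1 on error.
 */
int lock_released_by_child(const char *path)
{
    int fd, probe, held, freed;
    int go[2];
    char c = 'x';
    pid_t pid;

    if ((fd = open(path, O_RDWR | O_CREAT, 0644)) == -1)
        return -1;
    if (flock(fd, LOCK_EX) == -1 || pipe(go) == -1) {
        close(fd);
        return -1;
    }
    if ((pid = fork()) == -1) {
        close(fd);
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: hold the inherited lock until the parent says "go". */
        close(go[1]);
        (void)read(go[0], &c, 1);
        close(fd);              /* last reference: released by the child's PID */
        _exit(0);
    }

    close(go[0]);
    close(fd);                  /* acquirer drops its descriptor; lock still held */

    /* A separate open file description must not get the lock yet. */
    probe = open(path, O_RDWR);
    held = probe != -1 &&
           flock(probe, LOCK_EX | LOCK_NB) == -1 && errno == EWOULDBLOCK;

    (void)write(go[1], &c, 1);  /* tell the child to release the lock */
    close(go[1]);
    if (waitpid(pid, NULL, 0) == -1 || probe == -1) {
        if (probe != -1)
            close(probe);
        return -1;
    }

    freed = flock(probe, LOCK_EX | LOCK_NB) == 0;  /* child is gone: lock free */
    close(probe);
    return held && freed;
}
```

A lock manager that tracks ownership by PID sees this as the lock being released by a process that never acquired it.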

Robert N M Watson
Computer Laboratory
University of Cambridge


