Date: Wed, 5 Jul 2006 14:04:59 +0100 (BST)
From: Robert Watson
To: Kostik Belousov
Cc: freebsd-stable@freebsd.org, Michel Talon
Subject: Re: NFS Locking Issue
Message-ID: <20060705140225.X18236@fledge.watson.org>
In-Reply-To: <20060705122040.GN37822@deviant.kiev.zoral.com.ua>
References: <20060705100403.Y80381@fledge.watson.org> <20060705113822.GM37822@deviant.kiev.zoral.com.ua> <20060705122040.GN37822@deviant.kiev.zoral.com.ua>

On Wed, 5 Jul 2006, Kostik Belousov wrote:

>> Also, the both lockd processes now put identification information in the
>> proctitle (srv and kern). SIGUSR1 shall be sent to srv process.
>
> Hmm, after looking at the dump there and some code reading, I have noted the
> following:
>
> 1. NLM lock request contains the field caller_name. It is filled by (let's
> call it) kernel rpc.lockd by the results of hostname(3).
>
> 2. This caller_name is used by server rpc.lockd to send request for host
> monitoring to rpc.statd (see send_granted). Request is made by clnt_call,
> that is blocking rpc call.
>
> 3. rpc.statd does getaddrinfo on caller_name to determine address of the
> host to monitor.
>
> If the getaddrinfo in step 3 waits for resolver, then your client machine
> will get locking process in "lockd" state.
>
> Could people experiencing rpc.lockd mystery at least report whether _server_
> machine successfully resolve hostname of clients as reported by hostname?
> And, if yes, to what family of IP protocols?

It's not impossible. It would be interesting to see if ps axl reports that
rpc.lockd is in the kqread state, which would suggest it was blocked in the
resolver.

We probably ought to review rpc.statd and make sure it's generally sensible.
I've noticed that its notification process on start is a bit poorly
structured in terms of how it notifies hosts of its state change -- if one
host is down, it may take a very long time to notify other hosts. There are a
number of other dubious things about the NLM protocol design (at least, from
my reading last night).

I've also noticed that our rpc.lockd is particularly sensitive, on the client
side, to locks being released by a different process than the process that
acquired the lock, which is triggered excessively by our new libpidfile in
RELENG_6.

Robert N M Watson
Computer Laboratory
University of Cambridge
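
For anyone who wants to run the check asked for in the quoted text (does the
server resolve the client's hostname, to which address families, and how long
does the lookup take), here is a minimal sketch in Python. It is only an
illustration and is not taken from rpc.statd or rpc.lockd; the function name
resolve_families is made up for this example. Run it on the server and pass
the name the client reports as its hostname.

```python
import socket
import sys
import time

# Human-readable labels for the two families of interest; anything else
# is reported by its numeric value.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve_families(hostname):
    """Resolve hostname and return (sorted family labels, elapsed seconds).

    An empty list means the lookup failed. A large elapsed time hints at a
    resolver that is blocking or timing out.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        infos = []
    elapsed = time.monotonic() - start
    families = sorted(
        {FAMILY_NAMES.get(info[0], str(int(info[0]))) for info in infos}
    )
    return families, elapsed


if __name__ == "__main__":
    # Default to the local hostname when no argument is given.
    name = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    families, elapsed = resolve_families(name)
    if families:
        print(f"{name}: resolved to {', '.join(families)} in {elapsed:.2f}s")
    else:
        print(f"{name}: lookup failed after {elapsed:.2f}s")
```

A lookup that fails quickly and one that hangs for many seconds are different
symptoms, so the elapsed time is worth reporting along with the result.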