From owner-freebsd-current@FreeBSD.ORG  Mon May  7 15:35:40 2007
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 7E77D16A400;
	Mon,  7 May 2007 15:35:40 +0000 (UTC) (envelope-from mb@imp.ch)
Received: from pop.imp.ch (mx2.imp.ch [157.161.9.17])
	by mx1.freebsd.org (Postfix) with ESMTP id EBBC713C483;
	Mon,  7 May 2007 15:35:39 +0000 (UTC) (envelope-from mb@imp.ch)
Received: from godot (godot.imp.ch [157.161.4.8])
	by pop.imp.ch (8.13.8/8.13.8/Submit_imp) with ESMTP id l47F0Ni7063381; 
	Mon, 7 May 2007 17:00:24 +0200 (CEST) (envelope-from mb@imp.ch)
Date: Mon, 7 May 2007 17:00:23 +0200 (CEST)
From: Martin Blapp <mb@imp.ch>
X-X-Sender: mb@godot
To: freebsd-current@freebsd.org
Message-ID: <20070507162253.F2786@godot>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: alfred@freebsd.org, rwatson@freebsd.org, mohans@freebsd.org
Subject: NFS deadlock and status of nfs locking (rpc.lockd)


Hi all,

We see an NFS deadlock once or twice a day on a busy 6.2-STABLE server
(about one week old), and we suspect rpc.lockd is the cause. Unfortunately we
depend on a working rpc.lockd :-( . The problems did not occur on a FreeBSD 5.4
server; they appeared only after upgrading.

This is an excerpt from 'ps -auxwww' taken while the deadlock was in progress.
But as I said, we only suspect that rpc.lockd is the real problem.

root     693  0.0  0.1  3248  2040  ??  Ss   11:08AM   0:00.05 rpc.lockd: serve     0     1   0  96  0 select
daemon   700  0.0  0.1  3200  1948  ??  I    11:08AM   0:00.00 rpc.lockd: clien     1   693  38   4  0 nfsloc
root     677  0.0  0.1  2968  1696  ??  Is   11:08AM   0:00.04 nfsd: master (nf     0     1   0  96  0 select
root     678  0.0  0.0  1324   716  ??  D    11:08AM   0:01.02 nfsd: server (nf     0   677   0  -4  0 ufs
root     679  0.0  0.0  1324   716  ??  D    11:08AM   0:00.12 nfsd: server (nf     0   677   0  -8  0 biord
root     680  0.0  0.0  1324   716  ??  D    11:08AM   0:00.15 nfsd: server (nf     0   677   0  -4  0 ufs
root     681  0.0  0.0  1324   716  ??  D    11:08AM   0:00.42 nfsd: server (nf     0   677   0  -4  0 ufs

The nfsd instances waiting on 'ufs' cannot be killed. Stopping and restarting
rpc.lockd sometimes helps. The master nfsd process cannot be killed either.
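
For anyone who wants to pick the stuck processes out of a longer listing, here
is a minimal sketch in Python. It is not a FreeBSD tool; the helper name
blocked_by_wchan is made up for illustration, and it assumes the column layout
of the excerpt above: STAT is the 8th whitespace-separated field and the wait
channel is the last field on each line.

```python
# Hypothetical helper (not part of FreeBSD): group blocked processes from a
# ps excerpt by wait channel. Assumes STAT is the 8th field and the wait
# channel is the last field, as in the excerpt quoted in this message.
from collections import defaultdict

PS_EXCERPT = """\
root     693  0.0  0.1  3248  2040  ??  Ss   11:08AM   0:00.05 rpc.lockd: serve     0     1   0  96  0 select
daemon   700  0.0  0.1  3200  1948  ??  I    11:08AM   0:00.00 rpc.lockd: clien     1   693  38   4  0 nfsloc
root     677  0.0  0.1  2968  1696  ??  Is   11:08AM   0:00.04 nfsd: master (nf     0     1   0  96  0 select
root     678  0.0  0.0  1324   716  ??  D    11:08AM   0:01.02 nfsd: server (nf     0   677   0  -4  0 ufs
root     679  0.0  0.0  1324   716  ??  D    11:08AM   0:00.12 nfsd: server (nf     0   677   0  -8  0 biord
root     680  0.0  0.0  1324   716  ??  D    11:08AM   0:00.15 nfsd: server (nf     0   677   0  -4  0 ufs
root     681  0.0  0.0  1324   716  ??  D    11:08AM   0:00.42 nfsd: server (nf     0   677   0  -4  0 ufs
"""


def blocked_by_wchan(ps_text):
    """Return {wait_channel: [pid, ...]} for processes whose STAT starts with 'D'."""
    result = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        pid, stat, wchan = fields[1], fields[7], fields[-1]
        if stat.startswith("D"):  # 'D' marks uninterruptible (disk) wait
            result[wchan].append(int(pid))
    return dict(result)


if __name__ == "__main__":
    for wchan, pids in sorted(blocked_by_wchan(PS_EXCERPT).items()):
        print(wchan, pids)
```

On the excerpt above this groups PIDs 678, 680 and 681 under 'ufs' and PID 679
under 'biord'. A listing with a different column order would need the field
indexes adjusted.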

The server is an SMP machine with HTT enabled.

Now I have some questions:

- Can rpc.lockd be the underlying cause of such an nfsd hang?

- Does anyone know of a fix, not yet MFCed, whose absence could cause this?

- Is there anything I could do to get more debugging information? Is it safe
   to turn on rpc.lockd debug output (running rpc.lockd with -d)?

- Who is currently working on rpc.lockd? What is its current status, in case
   I am interested in working on it?

- One of the exported file systems is mounted via iSCSI. What happens if such
   an export goes away for a few seconds, gets reconnected, and then appears
   again? How are NFS timeouts handled in that case? Could that be the
   problem? Unfortunately we have seen such hangs both with and without this
   particular file system mounted, but they definitely happen a lot more
   often with the iSCSI file system mounted.

--
Martin

Martin Blapp, <mb@imp.ch> <mbr@FreeBSD.org>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: <finger -l mbr@freebsd.org>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------