Date: Mon, 7 May 2007 17:00:23 +0200 (CEST)
From: Martin Blapp
To: freebsd-current@freebsd.org
Cc: alfred@freebsd.org, rwatson@freebsd.org, mohans@freebsd.org
Subject: NFS deadlock and status of nfs locking (rpc.lockd)
Message-ID: <20070507162253.F2786@godot>

Hi all,

We see an NFS deadlock 1-2 times per day on a busy 6.2 STABLE server
(sources 1 week old), and we suspect rpc.lockd to be the problem.
Unfortunately we depend on a working rpc.lockd :-( . The problems did not
occur on a FreeBSD 5.4 server; they appeared only after upgrading.

This is an excerpt from 'ps -auxwww' taken while the deadlock was in
progress. But as I said, we only suspect that rpc.lockd is the real
problem.

root    693  0.0  0.1  3248  2040  ??  Ss  11:08AM  0:00.05 rpc.lockd: serve  0    1   0  96  0 select
daemon  700  0.0  0.1  3200  1948  ??  I   11:08AM  0:00.00 rpc.lockd: clien  1  693  38   4  0 nfsloc
root    677  0.0  0.1  2968  1696  ??  Is  11:08AM  0:00.04 nfsd: master (nf  0    1   0  96  0 select
root    678  0.0  0.0  1324   716  ??  D   11:08AM  0:01.02 nfsd: server (nf  0  677   0  -4  0 ufs
root    679  0.0  0.0  1324   716  ??  D   11:08AM  0:00.12 nfsd: server (nf  0  677   0  -8  0 biord
root    680  0.0  0.0  1324   716  ??  D   11:08AM  0:00.15 nfsd: server (nf  0  677   0  -4  0 ufs
root    681  0.0  0.0  1324   716  ??  D   11:08AM  0:00.42 nfsd: server (nf  0  677   0  -4  0 ufs

The nfsd instances waiting on 'ufs' are unkillable. Sometimes it helps to
stop rpc.lockd and restart it. The master nfsd process is unkillable too.

The server is an SMP machine, HTT enabled.

Now I have some questions:

- Can rpc.lockd be the underlying cause of such an nfsd hang?

- Does anybody know of a fix that has not been MFCd yet and whose absence
  could cause this?

- Is there anything I could do to get more debugging information? Is
  turning on rpc.lockd debug output safe (running rpc.lockd with -d)?

- Who is currently working on rpc.lockd? What is its current status, in
  case I'd be interested in working on it?

- One of the exported file systems is mounted via iSCSI. What happens if
  such an export goes away for a few seconds, gets reconnected, and then
  appears again? How are NFS timeouts handled in such a case? Could that
  be the problem? Unfortunately we have seen such hangs with and without
  this particular file system mounted, but they definitely happen a lot
  more often with the iSCSI file system mounted.

--
Martin

Martin Blapp,
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP:
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------