Date: Mon, 7 May 2007 17:00:23 +0200 (CEST)
From: Martin Blapp <mb@imp.ch>
To: freebsd-current@freebsd.org
Cc: alfred@freebsd.org, rwatson@freebsd.org, mohans@freebsd.org
Subject: NFS deadlock and status of nfs locking (rpc.lockd)
Message-ID: <20070507162253.F2786@godot>
Hi all,

We get an NFS deadlock once or twice per day on a busy 6.2-STABLE (one week old) server, and we suspect rpc.lockd to be the problem. Unfortunately we depend on a working rpc.lockd :-( . The problems did not occur on a FreeBSD 5.4 server; they only appeared after upgrading.

This is an excerpt from 'ps -auxwww' taken when the deadlock happened. But as I said, we only suspect that rpc.lockd is the real problem.

  root    693  0.0  0.1  3248  2040  ??  Ss  11:08AM  0:00.05  rpc.lockd: serve  0    1   0  96  0  select
  daemon  700  0.0  0.1  3200  1948  ??  I   11:08AM  0:00.00  rpc.lockd: clien  1  693  38   4  0  nfsloc
  root    677  0.0  0.1  2968  1696  ??  Is  11:08AM  0:00.04  nfsd: master (nf  0    1   0  96  0  select
  root    678  0.0  0.0  1324   716  ??  D   11:08AM  0:01.02  nfsd: server (nf  0  677   0  -4  0  ufs
  root    679  0.0  0.0  1324   716  ??  D   11:08AM  0:00.12  nfsd: server (nf  0  677   0  -8  0  biord
  root    680  0.0  0.0  1324   716  ??  D   11:08AM  0:00.15  nfsd: server (nf  0  677   0  -4  0  ufs
  root    681  0.0  0.0  1324   716  ??  D   11:08AM  0:00.42  nfsd: server (nf  0  677   0  -4  0  ufs

The nfsd instances waiting on 'ufs' are unkillable. Sometimes stopping and restarting rpc.lockd helps. The master nfsd process is unkillable too. The server is an SMP machine with HTT enabled.

Now I have some questions:

- Can rpc.lockd be the underlying cause of such an nfsd hang?
- Does anyone know of a fix that hasn't been MFCed yet, or of a change that could cause this?
- Is there anything I could do to get more debugging information? Is it safe to turn on rpc.lockd debug output (running rpc.lockd with -d)?
- Who is currently working on rpc.lockd? What is its current status, in case I'd be interested in working on it?
- One of the exported file systems is mounted via iSCSI. What happens if such an export goes away for a few seconds, gets reconnected and then reappears? How are NFS timeouts handled in such a case? Could that be the problem? Unfortunately we have seen these hangs both with and without this particular file system mounted, but they definitely happen a lot more often with the iSCSI file system mounted.
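For anyone who wants to watch for this pattern, here is a rough Python sketch that flags processes in uninterruptible wait (a 'D' in the STAT column) from ps output laid out like the excerpt above. The function name find_stuck is hypothetical and not part of any FreeBSD tool. It assumes STAT is the eighth column and that exactly six extra columns, ending in the wait channel, follow the (truncated) command, as they do in the excerpt; other ps flags will shift the columns.

```python
def find_stuck(ps_lines):
    """Return (pid, command, wchan) for processes whose STAT starts with 'D'.

    Assumed layout (taken from the excerpt, not guaranteed for other flags):
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND... plus six
    trailing columns whose last field is the wait channel (WCHAN).
    """
    stuck = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 17:
            continue  # too short to be a process row in the assumed layout
        pid, stat, wchan = fields[1], fields[7], fields[-1]
        # 'D' = blocked in the kernel in an uninterruptible (disk) wait
        if stat.startswith("D"):
            # COMMAND spans from field 10 up to the six trailing columns
            command = " ".join(fields[10:-6])
            stuck.append((int(pid), command, wchan))
    return stuck


# Rows copied from the excerpt above
excerpt = [
    "root 693 0.0 0.1 3248 2040 ?? Ss 11:08AM 0:00.05 rpc.lockd: serve 0 1 0 96 0 select",
    "daemon 700 0.0 0.1 3200 1948 ?? I 11:08AM 0:00.00 rpc.lockd: clien 1 693 38 4 0 nfsloc",
    "root 677 0.0 0.1 2968 1696 ?? Is 11:08AM 0:00.04 nfsd: master (nf 0 1 0 96 0 select",
    "root 678 0.0 0.0 1324 716 ?? D 11:08AM 0:01.02 nfsd: server (nf 0 677 0 -4 0 ufs",
    "root 679 0.0 0.0 1324 716 ?? D 11:08AM 0:00.12 nfsd: server (nf 0 677 0 -8 0 biord",
    "root 680 0.0 0.0 1324 716 ?? D 11:08AM 0:00.15 nfsd: server (nf 0 677 0 -4 0 ufs",
    "root 681 0.0 0.0 1324 716 ?? D 11:08AM 0:00.42 nfsd: server (nf 0 677 0 -4 0 ufs",
]

for pid, command, wchan in find_stuck(excerpt):
    print(pid, command, wchan)
```

Run periodically, this only reports nfsd workers that are already stuck; it says nothing about whether rpc.lockd is the cause.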
-- 
Martin

Martin Blapp, <mb@imp.ch> <mbr@FreeBSD.org>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: <finger -l mbr@freebsd.org>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------
