From: John Hein
To: net@freebsd.org
Subject: Re: network lock manager (lockd) deadlocked in 'rpcrecv'
Date: Wed, 8 Jul 2009 15:51:51 -0600
Message-ID: <19029.5367.534192.928426@gromit.timing.com>
In-Reply-To: <19029.4145.296260.915327@gromit.timing.com>
References: <19029.4145.296260.915327@gromit.timing.com>

John Hein wrote at 15:31 -0600 on Jul 8, 2009:
 > I have a home directory on FreeBSD 7.2-stable (20090705), amd64.
 > It is serving up the directory over nfs (v3, tcp), and now
 > I'm seeing lots of 'lockd not responding' on Fedora 10 & 11 systems.
 >
 > USER   PID PPID  SID NI %CPU %MEM  VSZ  RSS TT WCHAN  STAT STARTED    TIME COMMAND
 > root   791    1  791  0  0.0  0.0 6748 1500 ?? rpcrec Ds    2:45PM 0:05.80 /usr/sbin/rpc.lockd
 >
 > Once lockd gets in this state, doing a test lock on a file
 > from a FreeBSD box locks with 'lockd not responding', too
 > (and ctrl-c and kill -9 does nothing).
 >
 > USER   PID PPID  SID NI %CPU %MEM  VSZ  RSS TT WCHAN  STAT STARTED    TIME COMMAND
 > jhein 6297 3491 3491  0  0.0  0.0 1412  604 p5 nlmrcv T+    3:18PM 0:00.00 /h/jhein/nfslocktest /nfs/locktest
 >
 >
 > I see this on an i386 6.4-stable, too.

Also in dmesg:

NLM: failed to contact remote rpcbind, stat = 5, port = 28416

And from ddb...

Tracing command rpc.lockd pid 791 tid 100176 td 0xffffff00069dd720
sched_switch() at 0xffffffff8037df95 = sched_switch+0x1d5
mi_switch() at 0xffffffff803656fb = mi_switch+0x18b
sleepq_timedwait() at 0xffffffff80390aeb = sleepq_timedwait+0x3b
_sleep() at 0xffffffff80365cd4 = _sleep+0x324
clnt_dg_call() at 0xffffffff80504a0b = clnt_dg_call+0x4fb
nlm_get_rpc() at 0xffffffff804f3ef7 = nlm_get_rpc+0x147
nlm_host_get_rpc() at 0xffffffff804f430e = nlm_host_get_rpc+0x10e
nlm_do_lock() at 0xffffffff804f58be = nlm_do_lock+0x1ce
nlm4_lock_4_svc() at 0xffffffff804f6c91 = nlm4_lock_4_svc+0x11
nlm_prog_4() at 0xffffffff804f8098 = nlm_prog_4+0x308
svc_run() at 0xffffffff8050c1f3 = svc_run+0x293
nlm_syscall() at 0xffffffff804f675c = nlm_syscall+0x79c
syscall() at 0xffffffff805818f4 = syscall+0x1b4
Xfast_syscall() at 0xffffffff8056d35b = Xfast_syscall+0xab
--- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x8008a91ec, rsp = 0x7fffffffed08, rbp = 0x7fffffffee20 ---
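
For anyone who wants to try reproducing this, here is a minimal sketch of
the kind of test lock mentioned above. It is not the nfslocktest program
itself (its source isn't shown here); the function name try_lock and the
choice of Python are assumptions made purely for illustration. It requests
an exclusive POSIX (fcntl-style) lock on a file, which on an NFS mount
would normally be forwarded to the lock manager.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive POSIX lock on path.

    Hypothetical helper for illustration only. Returns True if the
    lock was acquired (and then released), False if the request was
    refused.
    """
    # Create the file if needed; lockf() needs a writable descriptor
    # for an exclusive lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking request: fail right away if another
            # process already holds a conflicting lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        # Release the lock again before closing.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Calling try_lock("/nfs/locktest") on a client should return True when
locking works. Note that LOCK_NB only avoids waiting on a conflicting
lock holder; if lockd on the server is wedged as described above, I would
still expect the call to sit in the kernel rather than return.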