Date: Mon, 15 Jun 2009 20:58:11 -0400
From: Adam McDougall
To: Kris Kennaway
Cc: current@FreeBSD.org
Subject: Re: pointyhat panic
Message-ID: <20090616005810.GE1111@egr.msu.edu>
In-Reply-To: <4A36B6D8.8000701@FreeBSD.org>
References: <1242075474.72992.118.camel@hood.oook.cz> <4A36B6D8.8000701@FreeBSD.org>

On Mon, Jun 15, 2009 at 10:02:16PM +0100, Kris Kennaway wrote:

Pav Lucistnik wrote:
> panic: mtx_lock() of destroyed mutex @
/usr/src/sys/rpc/clnt_vc.c:953
> cpuid = 2
> KDB: enter: panic
> [thread pid 0 tid 100029 ]
> Stopped at kdb_enter+0x3d: movq $0,0x3f5fb8(%rip)
> db> bt
> Tracing pid 0 tid 100029 td 0xffffff00018e1000
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x17b
> _mtx_lock_flags() at _mtx_lock_flags+0xc5
> clnt_vc_soupcall() at clnt_vc_soupcall+0x273
> sowakeup() at sowakeup+0xf8
> tcp_do_segment() at tcp_do_segment+0x23c9
> tcp_input() at tcp_input+0x9ec
> ip_input() at ip_input+0xbc
> ether_demux() at ether_demux+0x1ed
> ether_input() at ether_input+0x171
> em_rxeof() at em_rxeof+0x201
> em_handle_rxtx() at em_handle_rxtx+0x4b
> taskqueue_run() at taskqueue_run+0x96
> taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffff240a6d40, rbp = 0 ---
>
> The box is in kdb on serial console for now. May 9 -CURRENT, I think.
>

This happened again. The trigger was this (^C of a find on a busy
netapp volume with a lot of other concurrent nfs traffic to the same
mountpoint):

pointyhat# find . -name \*.bz2 -mmin -10
^Cnfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
nfs server dumpster:/vol/vol4/pointyhat: not responding
load: 4.54 cmd: find 93357 [rpccon] 11.19u 111.62s 0% 4848k

About 5-10 minutes later the machine panicked. I'll try updating to a
newer -CURRENT.

Kris

This sounds like almost exactly the same symptoms I noticed on a
-current machine a few months ago. I was doing a du on an NFS mount,
decided to ctrl-c it, got the "not responding" messages for a while,
and a few minutes later the system panicked. I hadn't had a chance to
report it yet, but I did find a workaround: it is stable if I remove
"intr" from the NFS mount options. Hope this helps a little.
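
As an illustration only, the workaround amounts to dropping "intr" from
the option list in /etc/fstab. The server path below is the one from the
console output quoted above; the mount point /mnt/pointyhat and the
remaining options are assumptions, not taken from either machine:

```
# Before (hypothetical entry, with interruptible NFS operations enabled):
# dumpster:/vol/vol4/pointyhat  /mnt/pointyhat  nfs  rw,intr  0  0

# After (same hypothetical entry with "intr" removed):
dumpster:/vol/vol4/pointyhat    /mnt/pointyhat  nfs  rw       0  0
```

The filesystem has to be unmounted and mounted again for the changed
options to take effect. Without "intr", a ^C no longer interrupts a
process blocked on an unresponsive server.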