Date: Mon, 25 Jul 2011 12:21:07 +0200
From: Herve Boulouis
To: freebsd-stable@freebsd.org
Message-ID: <20110725102107.GB17204@ra.aabs>
Subject: Sleeping thread owns a nonsleepable lock panic (& lor)

Hi list,

We have two FreeBSD 8.2-STABLE servers (cvsup'ed June 22) that keep
crashing badly. They are doing heavy Apache / PHP4 web serving from an
NFS mount and panic at least once a day with the following message (no
crash dump produced, hand-copied from the console):

Sleeping on "vmopar" with the following non-sleepable locks held:
exclusive sleep mutex NFSnode lock (NFSnode lock) r = 0 (0xffffff0201798000) locked @ nfsclient/nfs_subs.c:538
lock order reversal:
 1st 0xffffffff018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
 2nd 0xffffffffff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
lock order reversal:
 1st 0xffffffff018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
 2nd 0xffffffffff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
lock order reversal:
 1st 0xffffffffff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
 2nd 0xffffffffff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
Sleeping thread (tid 100998, pid 20700) owns a non-sleepable lock
panic: sleeping thread
cpuid = 1
panic: bufwrite: buffer is not busy???
cpuid = 1

The two servers share the same load and both panic consistently. I
enabled WITNESS on both in the hope that it would let the boxes reboot
automatically after a panic and give extra debug info. I got the debug
info, but the servers still hang after the double panic :(

I also noticed that immediately after rebooting following this panic, I
got the following LORs (at approximately the time rc.d is launching
ports like Apache & co):

lock order reversal:
 1st 0xffffff81ee00e388 bufwait (bufwait) @ kern/vfs_bio.c:2636
 2nd 0xffffff0006e56c00 dirhash (dirhash) @ ufs/ufs/ufs_dirhash.c:285
lock order reversal:
 1st 0xffffff0009c709e0 so_snd_sx (so_snd_sx) @ kern/uipc_sockbuf.c:145
 2nd 0xffffff0124282620 ufs (ufs) @ kern/uipc_syscalls.c:2086
lock order reversal:
 1st 0xffffff0009c709e0 so_snd_sx (so_snd_sx) @ kern/uipc_sockbuf.c:145
 2nd 0xffffff01243569d0 nfs (nfs) @ kern/uipc_syscalls.c:2086

The servers continued to work despite these LORs, so I don't know
whether they are related to the panics or not.

What can I do from here to debug this further?

Regards,

--
Herve Boulouis