From: Tim Stewart
Organization: Stoo Research
Date: Thu, 23 Jun 2011 13:30:05 -0400
To: freebsd-bugs@freebsd.org
Subject: ``Fatal double fault'' when running nightly jobs, perhaps ZFS-related

Hello,

I have a FreeBSD 8.2-RELEASE-p2 system that uses ZFS throughout, including for booting. The kernel is locally compiled. It is the GENERIC configuration with DTrace enabled, plus an MFI driver patch from the mailing list post at [1].
The system panicked with a ``Fatal double fault'' at 3:09 AM one night. The console output below was typed in manually from a screenshot, but proofread:

    Fatal double fault
    rip = 0xffffffff805d3eeb
    rsp = 0xffffff848585b000
    rbp = 0xffffff848585b020
    cpuid = 0; apic id = 20
    panic: double fault
    cpuid = 0
    KDB: stack backtrace
    #0 0xffffffff80618d3e at kdb_backtrace+0x5e
    #1 0xffffffff805e4d47 at panic+0x187
    #2 0xffffffff808dc834 at dblfault_handler+0xa4
    #3 0xffffffff808c53ad at Xdblfault+0xad
    Uptime: 3d11h11m37s
    Cannot dump. Device not defined or unavailable.
    Automatic reboot in 15 seconds - press a key on the console to abort
    Sleeping thread (tid 100141, pid 5) owns a non-sleepable lock

It seems likely that whatever triggered the fault is related to the nightly periodic jobs. Others have suggested that it may be related to /etc/periodic/security/100.chksetuid (see [2] and [3]), since that script runs a find over every filesystem not marked `nosuid'. Indeed, I have a ZFS dataset containing more than 71 million files that (at the time) was not marked nosuid.

The panic has happened only once, and I have not been able to reproduce it since. I have set `setuid=off' on the large ZFS dataset so that 100.chksetuid will no longer traverse it every night.

Any clue as to what is happening here? I don't have a kernel crash dump because I'm using ZFS for swap (hence the ``Cannot dump'' message above), though I can change this if it would help troubleshoot the issue in the event of another crash.

Thanks for any help you can provide,

-- 
-TimS

Tim Stewart

References:
[1] http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html
[2] http://lists.freebsd.org/pipermail/freebsd-bugs/2011-March/043781.html
[3] http://forums.freebsd.org/showthread.php?t=23919