From: Tim Stewart
Date: Fri, 24 Jun 2011 09:44:12 -0400
To: freebsd-bugs@freebsd.org
Subject: Re: ``Fatal double fault'' when running nightly jobs, perhaps ZFS-related
Message-ID: <4E0494AC.1060504@stoo.org>
In-Reply-To: <4E03781D.1090504@stoo.org>

On 06/23/11 01:30 PM, Tim Stewart wrote:
> Hello,
>
> I have a FreeBSD 8.2-RELEASE-p2 system that uses ZFS, including booting.
> The kernel is locally compiled and is just the GENERIC configuration
> with DTrace enabled and a MFI driver patch from the mailing list post at
> [1].
>
> The system panicked with a ``Fatal double fault'' at 3:09 AM one night
> (typed in manually from a screenshot, but proofread):
>
>
> Fatal double fault
> rip = 0xffffffff805d3eeb
> rsp = 0xffffff848585b000
> rbp = 0xffffff848585b020
> cpuid = 0; apic id = 20
> panic: double fault
> cpuid = 0
> KDB: stack backtrace
> #0 0xffffffff80618d3e at kdb_backtrace+0x5e
> #1 0xffffffff805e4d47 at panic+0x187
> #2 0xffffffff808dc834 at dblfault_handler+0xa4
> #3 0xffffffff808c53ad at Xdblfault+0xad
> Uptime: 3d11h11m37s
> Cannot dump. Device not defined or unavailable.
> Automatic reboot in 15 seconds - press a key on the console to abort
> Sleeping thread (tid 100141, pid 5) owns a non-sleepable lock
>
>
> It seems likely that the activity that prompted the fault is related to
> the nightly periodic jobs. Others have suggested that it may be related
> to /etc/periodic/security/100.chksetuid (see [2] and [3]), as it does a
> find on every filesystem not marked as `nosuid.' Indeed, I have a ZFS
> dataset containing 71+ million files that (at the time) was not marked
> as nosuid.
>
> I have not been able to replicate the issue since and it has only
> happened once. I have set `setuid=off' on the large ZFS dataset so that
> 100.chksetuid will no longer traverse it every night.
>
> Any clue as to what is happening here? I don't have a kernel core dump
> since I'm using ZFS for swap, though I can change this if it helps
> troubleshoot the issue in the event of another crash.
>
> Thanks for any help you can provide,

Would a different list be more appropriate for this inquiry?

Thanks,

-- 
-TimS

Tim Stewart