From: "Steven Hartland"
Subject: Re: ZFS Kernel Panic on 10.0-RELEASE
Date: Mon, 2 Jun 2014 10:12:20 +0100
Message-ID: <782C34792E95484DBA631A96FE3BEF20@multiplay.co.uk>
References: <5388D64D.4030400@bayphoto.com> <5388E5B4.3030002@bayphoto.com> <538BBEB7.4070008@bayphoto.com>
List-Id: Filesystems

----- Original Message -----
From: "Mike Carlson"

> On 5/30/2014 1:10 PM, Mike Carlson wrote:
> > On 5/30/2014 12:48 PM, Jordan Hubbard wrote:
> >> On May 30, 2014, at 12:04 PM, Mike Carlson wrote:
> >>
> >>> Over the weekend, we had upgraded one of our servers from 9.1-RELEASE
> >>> to 10.0-RELEASE, and then the zpool was upgraded (from 28 to 5000)
> >>>
> >>> Tuesday afternoon, the server suddenly rebooted (kernel panic), and as
> >>> soon as it tried to remount all of its ZFS volumes, it panic'd again.
> >> What’s the panic text? That’s pretty crucial in figuring out whether
> >> this is recoverable (e.g. if it’s spacemap corruption related,
> >> probably not).
> >>
> >> - Jordan
> >>
> > I had linked the pictures I took of the console, but here is my manual
> > reproduction:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 7; apic id = 07
> > fault virtual address = 0x4a0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff81a7f39f
> > stack pointer = 0x28:0xfffffe1834789570
> > frame pointer = 0x28:0xfffffe18347895b0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 1849 (txg_thread_enter)
> > trap number = 12
> > panic: page fault
> > cpuid = 7
> > KDB: stack backtrace:
> > #0 0xffffffff808e7dd0 at kdb_backtrace+0x60
> > #1 0xffffffff808af8b5 at panic+0x155
> > #2 0xffffffff80c8e629 at trap_fatal+0x3a2
> > #3 0xffffffff80c8e969 at trap_pfault+0x2c9
> > #4 0xffffffff80c8e0f6 at trap+0x5e6
> > #5 0xffffffff80c75392 at calltrap+0x8
> > #6 0xffffffff81a53b5a at dsl_dataset_block_kill+0x3a
> > #7 0xffffffff81a50967 at dnode_sync+0x237
> > #8 0xffffffff81a48fcb at dmu_objset_sync_dnodes+0x2b
> > #9 0xffffffff81a48e4d at dmo_objset_sync+0x1ed
> > #10 0xffffffff81a5d29a at dsl_pool_sync+0xca
> > #11 0xffffffff81a78a4e at spa_sync+0x52e
> > #12 0xffffffff81a81925 at txg_sync_thread+0x375
> > #13 0xffffffff8088198a at fork_exit+0x9a
> > #14 0xffffffff80c758ce at fork_trampoline+0xe
> > uptime: 46s
> > Automatic reboot in 15 seconds - press a key on the console to abort
> >
>
> This just happened again to another server. We upgraded two servers on
> the same morning, and now both of them exhibit this corrupted zfs volume
> and panic behavior.
>
> Out of all the volumes, one of them is causing the panic, and the panic
> message is nearly identical.
>
> I have 4 snapshots over the last 24 hours, so hopefully a snapshot from
> noon today can be sent to a new volume ( zfs send | zfs recv )
>
> I guess I can now rule out it being a hardware issue, this is clearly a
> problem related to the upgrade (freebsd-update was used).
> I first thought the first system had a bad upgrade, perhaps a mix and
> match of 9.2 binaries running on a 10 kernel, but I used the
> 'freebsd-update IDS' command to verify the integrity of the install, and
> it looked good, the only differences were config files in /etc/ that we
> manage.
>

Do you have a kernel crash dump from this?

Also, can you confirm whether you're on amd64 or just i386?

    Regards
    Steve
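
For reference, a minimal sketch of how those two details could be gathered on a stock install. Nothing here comes from the thread itself: the dump directory and the vmcore.0 file name are assumptions based on the usual defaults, and the swap device is assumed to be large enough to hold the dump.

```shell
# Report the machine architecture; the usual x86 answers are amd64 or i386.
uname -m

# Settings for /etc/rc.conf (shown here as plain assignments):
# dumpdev=AUTO asks the kernel to write a dump to the swap device on panic,
# and savecore(8) copies it into dumpdir on the next boot.
dumpdev="AUTO"
dumpdir="/var/crash"

# After the next panic and reboot, look for a saved dump and open it.
# vmcore.0 is only a guess at the name; use whichever file was written.
ls -l /var/crash
kgdb /boot/kernel/kernel /var/crash/vmcore.0
```

Inside kgdb, `bt` prints the backtrace of the panicking thread, which is usually the first thing worth posting back to the list.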