From: Peter Maloney <peter.maloney@brockmann-consult.de>
Date: Tue, 14 Feb 2012 15:18:44 +0100
To: freebsd-fs@freebsd.org
Subject: Re: ZFS Snapshot problems

Was your pool created at the current version, or upgraded? Some pools have
issues after being upgraded. Mine had a separate log device that could not be
removed after the pool was upgraded to v28, so I destroyed the pool, recreated
it, and things are fine now. I don't know whether the upgrade process itself
is broken, or whether the old ZFS code in FreeBSD was simply buggy, leaving
pools slightly corrupt.

Also, which zpool and zfs versions are you running? And is this your FreeBSD
version?

  FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10 20:35:13 GMT 2012

Your FreeBSD sounds very old; I tried an 8.2-STABLE from April and it was
unusably unstable with ZFS. If you are on a recent STABLE pull but created
the pool with an old version of FreeBSD, have you considered destroying the
pool and recreating it from your backups, using zfs send & recv?
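
Roughly like this, assuming a pool named "tank" and a scratch pool named
"backup" to hold the copy while you recreate it; the pool names, disks, and
exact receive flags here are examples only, so check them against your zfs
version first:

  # snapshot everything in the pool, recursively
  zfs snapshot -r tank@migrate

  # copy all datasets, snapshots, and properties to the scratch pool
  # (-d grafts the sent dataset names under "backup", -u skips mounting)
  zfs send -R tank@migrate | zfs recv -Fdu backup

  # recreate the pool fresh at the current pool version, then copy back
  zpool destroy tank
  zpool create tank mirror da0 da1
  zfs send -R backup@migrate | zfs recv -Fdu tank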
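
On the gptid-versus-label confusion you describe below: zdb can show both
sides directly, which may help pin down where the wrong names come from. For
example (the device path is a placeholder):

  # the pool config as cached in /boot/zfs/zpool.cache
  zdb -C tank

  # the vdev label actually written on the disk itself
  zdb -l /dev/gpt/disk0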
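
And regarding your workaround below of not removing the nightly snapshots:
if you give them dated names, they at least stay easy to prune in one pass
once destroying snapshots is safe again. A minimal sketch, with a made-up
pool name:

  # nightly: create a dated recursive snapshot, and simply skip the
  # usual "zfs destroy" of older ones until the crash is fixed
  zfs snapshot -r tank@nightly-`date +%Y%m%d`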

On 02/14/2012 11:00 AM, Matthew Seaman wrote:
> On 12/02/2012 13:56, Matthew Seaman wrote:
>> On 12/02/2012 13:10, Peter Maloney wrote:
>>> I don't know what side effects that change has though. You can usually
>>> assume that ZFS will just figure out the pool regardless of labels
>>> (because it uses its own label metadata; see zdb output to see the other
>>> id), but apparently your case is something special, getting actual
>>> errors instead of only wrong names.
>> Yes. This is most perplexing -- it's such a specific effect. The gpt
>> thing may well be a red herring. It is odd though that zdb somehow
>> discovers the gpart labels through reading zpool.cache, but zpool(1)
>> uses the gptids instead.
>
> Some more data about the underlying problem.
>
> -- There is another symptom: once the snapshots get wedged, the
>    system will crash on shutdown. I don't have a crashdump or
>    anything particularly useful, but this is what appeared in the
>    kernel log:
>
> +Fatal trap 12: page fault while in kernel mode
> +cpuid = 0; apic id = 00
> +fault virtual address  = 0xa8
> +fault code             = supervisor write data, page not present
> +instruction pointer    = 0x20:0xffffffff805f9e65
> +stack pointer          = 0x28:0xffffff800003a920
> +frame pointer          = 0x28:0xffffff800003a930
> +code segment           = base 0x0, limit 0xfffff, type 0x1b
> +                       = DPL 0, pres 1, long 1, def32 0, gran 1
> +processor eflags       = interrupt enabled, resume, IOPL = 0
> +current process        = 1 (init)
> +trap number            = 12
> +panic: page fault
> +cpuid = 0
> +KDB: stack backtrace:
> +#0 0xffffffff80624c0e at kdb_backtrace+0x5e
> +#1 0xffffffff805f1d53 at panic+0x183
> +#2 0xffffffff808df490 at trap_fatal+0x290
> +#3 0xffffffff808df7e1 at trap_pfault+0x201
> +#4 0xffffffff808dfc9f at trap+0x3df
> +#5 0xffffffff808c7284 at calltrap+0x8
> +#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5
> +#7 0xffffffff80f9b74f at zfs_umount+0x6f
> +#8 0xffffffff8067dc1c at dounmount+0x26c
> +#9 0xffffffff80681332 at vfs_unmountall+0x42
> +#10 0xffffffff805f1b70 at boot+0x790
> +#11 0xffffffff805f1e4c at reboot+0x6c
> +#12 0xffffffff808deb44 at amd64_syscall+0x1f4
> +#13 0xffffffff808c757c at Xfast_syscall+0xfc
> +Uptime: 10d23h49m19s
> +FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10 20:35:13 GMT 2012
> +CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (3166.33-MHz K8-class CPU)
> +avail memory = 8196075520 (7816 MB)
> +dcons_crom0: bus_addr 0x3d94000
> +pid 89559 (emacs) is using legacy pty devices - not logging anymore
> +instruction pointer    = 0x20:0xffffffff8060d275
> +#0 0xffffffff8063801e at kdb_backtrace+0x5e
> +#1 0xffffffff80605163 at panic+0x183
> +#2 0xffffffff808f2da0 at trap_fatal+0x290
> +#3 0xffffffff808f30f1 at trap_pfault+0x201
> +#4 0xffffffff808f35af at trap+0x3df
> +#5 0xffffffff808dab94 at calltrap+0x8
> +#6 0xffffffff80fa42e5 at zfsctl_umount_snapshots+0xa5
> +#7 0xffffffff80fb574f at zfs_umount+0x6f
> +#8 0xffffffff8069103c at dounmount+0x26c
> +#9 0xffffffff80695482 at vfs_unmountall+0x42
> +#10 0xffffffff80604f80 at boot+0x790
> +#11 0xffffffff8060525c at reboot+0x6c
> +#12 0xffffffff808f2454 at amd64_syscall+0x1f4
> +#13 0xffffffff808dae8c at Xfast_syscall+0xfc
> +Uptime: 2d10h51m47s
> +FreeBSD 8.2-STABLE #3 r231563: Mon Feb 13 01:37:39 GMT 2012
> +avail memory = 8196034560 (7816 MB)
>
> -- I can't confirm this yet, but I've a feeling that removing the
>    *last* snapshot is significant. Whether it's the last snapshot
>    of a particular zfs or the last snapshot in the zpool, I don't know
>    yet. Testing this is a long-winded affair, as I can't afford to
>    keep rebooting this server, and I need it to back up successfully
>    most of the time.
>
> -- The problem only seems to occur when snapshots are removed, so my
>    workaround for the time being is not to remove the snapshots I
>    create for each nightly backup.
>
> Cheers,
>
> Matthew

--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------