From owner-freebsd-fs@FreeBSD.ORG Tue Feb 14 10:00:22 2012
Message-ID: <4F3A30A2.9050603@FreeBSD.org>
Date: Tue, 14 Feb 2012 10:00:02 +0000
From: Matthew Seaman
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
References: <4F377457.4080807@FreeBSD.org>
 <20120212084052.GA43095@icarus.home.lan>
 <4F3789C1.9000903@FreeBSD.org>
 <4F37A8E7.7060102@brockmann-consult.de>
 <4F37B25A.10002@FreeBSD.org>
 <4F37BA49.50700@brockmann-consult.de>
 <4F37C52A.2030803@infracaninophile.co.uk>
In-Reply-To: <4F37C52A.2030803@infracaninophile.co.uk>
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enigABA72FEEFCADB51739FE12FD"
Subject: Re: ZFS Snapshot problems
List-Id: Filesystems

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigABA72FEEFCADB51739FE12FD
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 12/02/2012 13:56, Matthew Seaman wrote:
> On 12/02/2012 13:10, Peter Maloney wrote:
>> > I don't know what side effects that change has though. You can usually
>> > assume that ZFS will just figure out the pool regardless of labels
>> > (because it uses its own label metadata; see zdb output to see the other
>> > id), but apparently your case is something special, getting actual
>> > errors instead of only wrong names.
> Yes. This is most perplexing -- it's such a specific effect. The gpt
> thing may well be a red herring. It is odd though that zdb somehow
> discovers the gpart labels through reading zpool.cache, but zpool(1)
> uses the gptids instead.

Some more data about the underlying problem.

-- There is another symptom: once the snapshots get wedged, the
system will crash on shutdown.
I don't have a crashdump or anything particularly useful, but this is
what appeared in the kernel log:

+
+Fatal trap 12: page fault while in kernel mode
+cpuid = 0; apic id = 00
+fault virtual address = 0xa8
+fault code = supervisor write data, page not present
+instruction pointer = 0x20:0xffffffff805f9e65
+stack pointer = 0x28:0xffffff800003a920
+frame pointer = 0x28:0xffffff800003a930
+code segment = base 0x0, limit 0xfffff, type 0x1b
+             = DPL 0, pres 1, long 1, def32 0, gran 1
+processor eflags = interrupt enabled, resume, IOPL = 0
+current process = 1 (init)
+trap number = 12
+panic: page fault
+cpuid = 0
+KDB: stack backtrace:
+#0 0xffffffff80624c0e at kdb_backtrace+0x5e
+#1 0xffffffff805f1d53 at panic+0x183
+#2 0xffffffff808df490 at trap_fatal+0x290
+#3 0xffffffff808df7e1 at trap_pfault+0x201
+#4 0xffffffff808dfc9f at trap+0x3df
+#5 0xffffffff808c7284 at calltrap+0x8
+#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80f9b74f at zfs_umount+0x6f
+#8 0xffffffff8067dc1c at dounmount+0x26c
+#9 0xffffffff80681332 at vfs_unmountall+0x42
+#10 0xffffffff805f1b70 at boot+0x790
+#11 0xffffffff805f1e4c at reboot+0x6c
+#12 0xffffffff808deb44 at amd64_syscall+0x1f4
+#13 0xffffffff808c757c at Xfast_syscall+0xfc
+Uptime: 10d23h49m19s
+FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10 20:35:13 GMT 2012
+CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (3166.33-MHz K8-class CPU)
+avail memory = 8196075520 (7816 MB)
+dcons_crom0: bus_addr 0x3d94000
+pid 89559 (emacs) is using legacy pty devices - not logging anymore
+instruction pointer = 0x20:0xffffffff8060d275
+#0 0xffffffff8063801e at kdb_backtrace+0x5e
+#1 0xffffffff80605163 at panic+0x183
+#2 0xffffffff808f2da0 at trap_fatal+0x290
+#3 0xffffffff808f30f1 at trap_pfault+0x201
+#4 0xffffffff808f35af at trap+0x3df
+#5 0xffffffff808dab94 at calltrap+0x8
+#6 0xffffffff80fa42e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80fb574f at zfs_umount+0x6f
+#8 0xffffffff8069103c at dounmount+0x26c
+#9 0xffffffff80695482 at vfs_unmountall+0x42
+#10 0xffffffff80604f80 at boot+0x790
+#11 0xffffffff8060525c at reboot+0x6c
+#12 0xffffffff808f2454 at amd64_syscall+0x1f4
+#13 0xffffffff808dae8c at Xfast_syscall+0xfc
+Uptime: 2d10h51m47s
+FreeBSD 8.2-STABLE #3 r231563: Mon Feb 13 01:37:39 GMT 2012
+avail memory = 8196034560 (7816 MB)

-- I can't confirm this yet, but I've a feeling that removing the
*last* snapshot is significant. Whether it's the last snapshot of a
particular zfs or the last snapshot in the zpool I don't know yet.
Testing this is a long-winded affair as I can't afford to keep
rebooting this server, and I need it to back up successfully most of
the time.

-- The problem only seems to occur when snapshots are removed, so my
workaround for the time being is not to remove the snapshots I create
for each nightly backup.

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey

--------------enigABA72FEEFCADB51739FE12FD
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.16 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk86MKkACgkQ8Mjk52CukIxgegCfZQKceGfOlDNbBzwq9CZx4P17
zAUAn3Qh/8HJ9Qq0qHbj971zHDiV87dq
=Y+9S
-----END PGP SIGNATURE-----

--------------enigABA72FEEFCADB51739FE12FD--