Date: Tue, 14 Feb 2012 10:00:02 +0000
From: Matthew Seaman <matthew@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Subject: Re: ZFS Snapshot problems
Message-ID: <4F3A30A2.9050603@FreeBSD.org>
In-Reply-To: <4F37C52A.2030803@infracaninophile.co.uk>
References: <4F377457.4080807@FreeBSD.org> <20120212084052.GA43095@icarus.home.lan> <4F3789C1.9000903@FreeBSD.org> <4F37A8E7.7060102@brockmann-consult.de> <4F37B25A.10002@FreeBSD.org> <4F37BA49.50700@brockmann-consult.de> <4F37C52A.2030803@infracaninophile.co.uk>

On 12/02/2012 13:56, Matthew Seaman wrote:
> On 12/02/2012 13:10, Peter Maloney wrote:
>> > I don't know what side effects that change has though. You can usually
>> > assume that ZFS will just figure out the pool regardless of labels
>> > (because it uses its own label metadata; see zdb output to see the other
>> > id), but apparently your case is something special, getting actual
>> > errors instead of only wrong names.
> Yes. This is most perplexing -- it's such a specific effect. The gpt
> thing may well be a red herring. It is odd though that zdb somehow
> discovers the gpart labels through reading zpool.cache, but zpool(1)
> uses the gptids instead.
Some more data about the underlying problem.
-- There is another symptom: once the snapshots get wedged, the
system will crash on shutdown. I don't have a crashdump or
anything particularly useful, but this is what appeared in the
kernel log:
+
+Fatal trap 12: page fault while in kernel mode
+cpuid = 0; apic id = 00
+fault virtual address = 0xa8
+fault code = supervisor write data, page not present
+instruction pointer = 0x20:0xffffffff805f9e65
+stack pointer = 0x28:0xffffff800003a920
+frame pointer = 0x28:0xffffff800003a930
+code segment = base 0x0, limit 0xfffff, type 0x1b
+ = DPL 0, pres 1, long 1, def32 0, gran 1
+processor eflags = interrupt enabled, resume, IOPL = 0
+current process = 1 (init)
+trap number = 12
+panic: page fault
+cpuid = 0
+KDB: stack backtrace:
+#0 0xffffffff80624c0e at kdb_backtrace+0x5e
+#1 0xffffffff805f1d53 at panic+0x183
+#2 0xffffffff808df490 at trap_fatal+0x290
+#3 0xffffffff808df7e1 at trap_pfault+0x201
+#4 0xffffffff808dfc9f at trap+0x3df
+#5 0xffffffff808c7284 at calltrap+0x8
+#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80f9b74f at zfs_umount+0x6f
+#8 0xffffffff8067dc1c at dounmount+0x26c
+#9 0xffffffff80681332 at vfs_unmountall+0x42
+#10 0xffffffff805f1b70 at boot+0x790
+#11 0xffffffff805f1e4c at reboot+0x6c
+#12 0xffffffff808deb44 at amd64_syscall+0x1f4
+#13 0xffffffff808c757c at Xfast_syscall+0xfc
+Uptime: 10d23h49m19s
+FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10 20:35:13 GMT 2012
+CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz (3166.33-MHz K8-class CPU)
+avail memory = 8196075520 (7816 MB)
+dcons_crom0: bus_addr 0x3d94000
+pid 89559 (emacs) is using legacy pty devices - not logging anymore
+instruction pointer =3D 0x20:0xffffffff8060d275
+#0 0xffffffff8063801e at kdb_backtrace+0x5e
+#1 0xffffffff80605163 at panic+0x183
+#2 0xffffffff808f2da0 at trap_fatal+0x290
+#3 0xffffffff808f30f1 at trap_pfault+0x201
+#4 0xffffffff808f35af at trap+0x3df
+#5 0xffffffff808dab94 at calltrap+0x8
+#6 0xffffffff80fa42e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80fb574f at zfs_umount+0x6f
+#8 0xffffffff8069103c at dounmount+0x26c
+#9 0xffffffff80695482 at vfs_unmountall+0x42
+#10 0xffffffff80604f80 at boot+0x790
+#11 0xffffffff8060525c at reboot+0x6c
+#12 0xffffffff808f2454 at amd64_syscall+0x1f4
+#13 0xffffffff808dae8c at Xfast_syscall+0xfc
+Uptime: 2d10h51m47s
+FreeBSD 8.2-STABLE #3 r231563: Mon Feb 13 01:37:39 GMT 2012
+avail memory = 8196034560 (7816 MB)
-- I can't confirm this yet, but I have a feeling that removing the
*last* snapshot is significant. Whether it's the last snapshot
of a particular zfs or the last snapshot in the zpool I don't know
yet. Testing this is a long-winded affair as I can't afford to
keep rebooting this server, and I need it to back up successfully
most of the time.
-- The problem only seems to occur when snapshots are removed, so my
workaround for the time being is not to remove the snapshots I
create for each nightly backup.
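
For anyone comparing the two traces above: the addresses differ between
them, presumably because the kernels were built from different revisions
(r231394 and r231563), but the symbol+offset part of each frame can be
compared directly. A minimal Python sketch, assuming the log lines are
pasted in as strings with the leading '+' as shown; the helper name
`frames` and the regular expression are made up for illustration and only
three frames from each trace are included:

```python
import re

# Matches backtrace lines such as
# "+#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5"
# and captures the "symbol+offset" part (hypothetical pattern).
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+\s+at\s+(\S+)")


def frames(log_text):
    """Return the symbol+offset of each backtrace frame, in order."""
    return [m.group(2) for m in FRAME_RE.finditer(log_text)]


# Frames #5-#7 from the first panic (r231394).
first = """\
+#5 0xffffffff808c7284 at calltrap+0x8
+#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80f9b74f at zfs_umount+0x6f
"""

# Frames #5-#7 from the second panic (r231563).
second = """\
+#5 0xffffffff808dab94 at calltrap+0x8
+#6 0xffffffff80fa42e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80fb574f at zfs_umount+0x6f
"""

# The absolute addresses differ, but the call path does not.
same_path = frames(first) == frames(second)
print(same_path)
```

Comparing on symbol+offset rather than on the raw address is what makes
the two panics line up: both fault at zfsctl_umount_snapshots+0xa5, called
from zfs_umount+0x6f.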
Cheers,
Matthew
-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey
