Date:      Tue, 14 Feb 2012 10:00:02 +0000
From:      Matthew Seaman <matthew@FreeBSD.org>
To:        freebsd-fs@FreeBSD.org
Subject:   Re: ZFS Snapshot problems
Message-ID:  <4F3A30A2.9050603@FreeBSD.org>
In-Reply-To: <4F37C52A.2030803@infracaninophile.co.uk>
References:  <4F377457.4080807@FreeBSD.org> <20120212084052.GA43095@icarus.home.lan> <4F3789C1.9000903@FreeBSD.org> <4F37A8E7.7060102@brockmann-consult.de> <4F37B25A.10002@FreeBSD.org> <4F37BA49.50700@brockmann-consult.de> <4F37C52A.2030803@infracaninophile.co.uk>


On 12/02/2012 13:56, Matthew Seaman wrote:
> On 12/02/2012 13:10, Peter Maloney wrote:
>> > I don't know what side effects that change has though. You can usually
>> > assume that ZFS will just figure out the pool regardless of labels
>> > (because it uses its own label metadata; see zdb output to see the other
>> > id), but apparently your case is something special, getting actual
>> > errors instead of only wrong names.

> Yes.  This is most perplexing -- it's such a specific effect.  The gpt
> thing may well be a red herring.  It is odd though that zdb somehow
> discovers the gpart labels through reading zpool.cache, but zpool(8)
> uses the gptids instead.
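
A quick way to compare the two views is to read the on-disk labels and
the cached configuration directly.  The device and pool names below
(gpt/disk0, tank) are placeholders -- substitute your own.  ZDB/ZPOOL
default to echo here so the sketch is side-effect free; unset them to
run the commands for real:

```shell
# Placeholder wrappers: default to echo so nothing touches a live pool.
ZDB=${ZDB:-"echo zdb"}
ZPOOL=${ZPOOL:-"echo zpool"}

# On-disk ZFS label as written to the vdev (the 'path' and 'guid'
# fields show what ZFS itself recorded for the device):
$ZDB -l /dev/gpt/disk0

# Cached pool configuration (from /boot/zfs/zpool.cache on FreeBSD):
$ZDB -C tank

# What zpool(8) itself currently reports:
$ZPOOL status tank
```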

Some more data about the underlying problem.

  -- There is another symptom: once the snapshots get wedged, the
     system will crash on shutdown.  I don't have a crashdump or
     anything particularly useful, but this is what appeared in the
     kernel log:

+
+Fatal trap 12: page fault while in kernel mode
+cpuid = 0; apic id = 00
+fault virtual address	= 0xa8
+fault code		= supervisor write data, page not present
+instruction pointer	= 0x20:0xffffffff805f9e65
+stack pointer	        = 0x28:0xffffff800003a920
+frame pointer	        = 0x28:0xffffff800003a930
+code segment		= base 0x0, limit 0xfffff, type 0x1b
+			= DPL 0, pres 1, long 1, def32 0, gran 1
+processor eflags	= interrupt enabled, resume, IOPL = 0
+current process		= 1 (init)
+trap number		= 12
+panic: page fault
+cpuid = 0
+KDB: stack backtrace:
+#0 0xffffffff80624c0e at kdb_backtrace+0x5e
+#1 0xffffffff805f1d53 at panic+0x183
+#2 0xffffffff808df490 at trap_fatal+0x290
+#3 0xffffffff808df7e1 at trap_pfault+0x201
+#4 0xffffffff808dfc9f at trap+0x3df
+#5 0xffffffff808c7284 at calltrap+0x8
+#6 0xffffffff80f8a2e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80f9b74f at zfs_umount+0x6f
+#8 0xffffffff8067dc1c at dounmount+0x26c
+#9 0xffffffff80681332 at vfs_unmountall+0x42
+#10 0xffffffff805f1b70 at boot+0x790
+#11 0xffffffff805f1e4c at reboot+0x6c
+#12 0xffffffff808deb44 at amd64_syscall+0x1f4
+#13 0xffffffff808c757c at Xfast_syscall+0xfc
+Uptime: 10d23h49m19s
+FreeBSD 8.2-STABLE #2 r231394: Fri Feb 10 20:35:13 GMT 2012
+CPU: Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (3166.33-MHz K8-class CPU)
+avail memory = 8196075520 (7816 MB)
+dcons_crom0: bus_addr 0x3d94000
+pid 89559 (emacs) is using legacy pty devices - not logging anymore
+instruction pointer	= 0x20:0xffffffff8060d275
+#0 0xffffffff8063801e at kdb_backtrace+0x5e
+#1 0xffffffff80605163 at panic+0x183
+#2 0xffffffff808f2da0 at trap_fatal+0x290
+#3 0xffffffff808f30f1 at trap_pfault+0x201
+#4 0xffffffff808f35af at trap+0x3df
+#5 0xffffffff808dab94 at calltrap+0x8
+#6 0xffffffff80fa42e5 at zfsctl_umount_snapshots+0xa5
+#7 0xffffffff80fb574f at zfs_umount+0x6f
+#8 0xffffffff8069103c at dounmount+0x26c
+#9 0xffffffff80695482 at vfs_unmountall+0x42
+#10 0xffffffff80604f80 at boot+0x790
+#11 0xffffffff8060525c at reboot+0x6c
+#12 0xffffffff808f2454 at amd64_syscall+0x1f4
+#13 0xffffffff808dae8c at Xfast_syscall+0xfc
+Uptime: 2d10h51m47s
+FreeBSD 8.2-STABLE #3 r231563: Mon Feb 13 01:37:39 GMT 2012
+avail memory = 8196034560 (7816 MB)

   -- I can't confirm this yet, but I've a feeling that removing the
      *last* snapshot is significant.  Whether it's the last snapshot
      of a particular zfs or the last snapshot in the zpool I don't know
      yet.  Testing this is a long-winded affair as I can't afford to
      keep rebooting this server, and I need it to back up successfully
      most of the time.
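
      The experiment would look something like the following sketch,
      with two hypothetical datasets tank/a and tank/b standing in for
      real ones.  ZFS defaults to echo so this reads as a plan rather
      than running against a live pool; set ZFS=zfs to execute:

```shell
#!/bin/sh
# Dry-run by default: ZFS echoes the commands instead of running them.
ZFS=${ZFS:-"echo zfs"}

# One snapshot on each of two datasets:
$ZFS snapshot tank/a@test
$ZFS snapshot tank/b@test

# Destroy the only snapshot of tank/a.  tank/b@test still exists, so
# if .zfs/snapshot wedges now, it's the last snapshot *of a dataset*
# that matters; if not, destroying tank/b@test afterwards tests the
# last-snapshot-in-the-pool case.
$ZFS destroy tank/a@test
```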

   -- The problem only seems to occur when snapshots are removed, so my
      workaround for the time being is not to remove the snapshots I
      create for each nightly backup.
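
      In script form the workaround amounts to something like this,
      assuming a dataset named tank/home (hypothetical -- adjust to
      taste).  ZFS defaults to echo so the sketch is harmless as
      written; set ZFS=zfs to run it for real:

```shell
#!/bin/sh
# Nightly-backup helper sketch: create a dated snapshot, back up from
# it, and deliberately do NOT destroy it afterwards.
ZFS=${ZFS:-"echo zfs"}          # dry-run by default
DATASET=${DATASET:-tank/home}   # hypothetical dataset name
SNAP="${DATASET}@backup-$(date +%Y%m%d)"

$ZFS snapshot "$SNAP"
# ...run the backup from ${DATASET}/.zfs/snapshot/... here...
# $ZFS destroy "$SNAP"   # skipped: destroying snapshots is what
#                        # wedges .zfs/snapshot and panics on shutdown
```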

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey


