From: Stefan Bethke
To: freebsd-fs@freebsd.org
Cc: Pawel Jakub Dawidek
Date: Sun, 15 Feb 2009 12:08:52 +0100
Subject: Re: zfs: using, then destroying a snapshot sometimes panics zfs

On 15.02.2009 at 11:39, Stefan Bethke wrote:

> On 08.02.2009 at 14:37, Stefan Bethke wrote:
>
>> Sorry I can't be more precise at the moment, but while creating a
>> script that mirrors some zfs filesystems to another machine, I've
>> now twice gotten weird behaviour and then a panic.
>>
>> The script iterates over a couple of zfs file systems:
>> - creates a snapshot with zfs snapshot tank/foo@mirror
>> - uses rsync to copy the contents of the snapshot with
>>   rsync /tank/foo/.zfs/snapshot/mirror/ dest:...
>> - destroys the snapshot with zfs destroy tank/foo@mirror
>>
>> During testing the script, I twice got to a point where, after the
>> snapshot was created without an error message, rsync dropped out
>> with an error message similar to "invalid file handle" on
>> /tank/foo/.zfs/snapshot.
>>
>> At that point, I could cd to /tank/foo/.zfs, but ls produced the
>> same error message.
>>
>> I then tried to unmount the snapshot with zfs umount, and got a
>> panic (which I also didn't manage to capture).
>>
>> Is this a generally known issue, or should I try to capture more
>> information when this happens again?
>
>
> # cd /tank/foo/.zfs
> # ls -l
> ls: snapshot: Bad file descriptor
> total 0
> # cd snapshot
> -su: cd: snapshot: Not a directory
>
> I currently have no snapshots:
> # zfs list -t snapshot
> no datasets available
>
> However, on a different file system, I can list and cd into snapshot:
> # /tank/bar/.zfs
> # ls -l
> total 0
> dr-xr-xr-x  2 root  wheel  2 Feb  8 00:43 snapshot/
> # cd snapshot
>
> Trying to umount produces a panic:
> # zfs umount /jail/foo
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0xa8
> fault code              = supervisor write data, page not present
> instruction pointer     = 0x8:0xffffffff802ee565
> stack pointer           = 0x10:0xfffffffea29c39e0
> frame pointer           = 0x10:0xfffffffea29c39f0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 51383 (zfs)
> [thread pid 51383 tid 100298 ]
> Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
> db> bt
> Tracing pid 51383 tid 100298 td 0xffffff00a598e720
> _sx_xlock() at _sx_xlock+0x15
> zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
> zfs_umount() at zfs_umount+0xdd
> dounmount() at dounmount+0x2b4
> unmount() at unmount+0x24b
> syscall() at syscall+0x1a5
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f412fc, rsp = 0x7fffffffd1a8, rbp = 0x801202300 ---
> db> call doadump
> Physical memory: 3314 MB
> Dumping 1272 MB: 1257 1241 1225 1209 1193 1177 1161 1145 1129 1113
> 1097 1081 1065 1049 1033 1017 1001 985 969 953 937 921 905 889 873
> 857 841 825 809 793 777 761 745 729 713 697 681 665 649 633 617 601
> 585 569 553 537 521 505 489 473 457 441 425 409 393 377 361 345 329
> 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41
> 25 9
> Dump complete
> = 0
>
> I've got the crashdump saved, if there's any information in there
> that can be helpful.
>
> This is -current from a week ago on amd64.
>
> At the current rate, this happens every couple of days, so gathering
> more information on the live system probably won't be a problem.
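
For anyone trying to reproduce this, the mirroring steps quoted above (snapshot, rsync, destroy) can be sketched as a small shell script. This is only a minimal sketch, not the actual script: the run and mirror_fs helpers, the DRY_RUN switch, the rsync -a flag, and the destination argument are assumptions made for illustration, and it assumes each dataset is mounted at /<dataset>.

```shell
#!/bin/sh
# Sketch of the snapshot / rsync / destroy cycle from the quoted message.
# DRY_RUN defaults to 1, so commands are only printed, never executed.
: "${DRY_RUN:=1}"
SNAPNAME="mirror"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mirror_fs <dataset> <rsync-destination>
# The destination is hypothetical; the original message elides it.
mirror_fs() {
    fs="$1"
    dest="$2"
    snap="${fs}@${SNAPNAME}"

    run zfs snapshot "$snap" || return 1

    # Assumption: the dataset is mounted at /<dataset>, so the snapshot
    # is visible under the .zfs control directory.
    run rsync -a "/${fs}/.zfs/snapshot/${SNAPNAME}/" "$dest"
    rc=$?

    # Destroy the snapshot even if rsync failed, then report rsync's status.
    run zfs destroy "$snap" || return 1
    return "$rc"
}
```

Calling mirror_fs tank/foo dest:/backup/foo in dry-run mode prints the three commands in order; setting DRY_RUN=0 would execute them.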

On a different machine with an identical configuration, I just got
this panic on reboot:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xa8
fault code              = supervisor write data, page not present
instruction pointer     = 0x8:0xffffffff802ee3b5
stack pointer           = 0x10:0xfffffffe40016980
frame pointer           = 0x10:0xfffffffe40016990
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1 (init)
[thread pid 1 tid 100002 ]
Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
db> bt
Tracing pid 1 tid 100002 td 0xffffff000141fab0
_sx_xlock() at _sx_xlock+0x15
zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
zfs_umount() at zfs_umount+0xdd
dounmount() at dounmount+0x2b4
vfs_unmountall() at vfs_unmountall+0x42
boot() at boot+0x655
reboot() at reboot+0x42
syscall() at syscall+0x1a5
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (55, FreeBSD ELF64, reboot), rip = 0x40897c, rsp = 0x7fffffffe7b8, rbp = 0x402420 ---

-- 
Stefan Bethke   Fon +49 151 14070811