From owner-freebsd-questions@FreeBSD.ORG Fri Aug 16 13:49:42 2013
Date: Fri, 16 Aug 2013 08:49:40 -0500
From: dweimer <dweimer@dweimer.net>
To: freebsd-questions@freebsd.org
Subject: Re: ZFS Snapshots Not able to be accessed under .zfs/snapshot/name
Reply-To: dweimer@dweimer.net
In-Reply-To: <776e30b627bf30ece7545e28b2a2e064@dweimer.net>
References: <22a7343f4573d6faac5aec1d7c9a1135@dweimer.net> <520C405A.6000408@ShaneWare.Biz> <776e30b627bf30ece7545e28b2a2e064@dweimer.net>
Message-ID: <23413f3a4b95328c0bc838e6ffad364d@dweimer.net>

On 08/15/2013 10:00 am, dweimer wrote:
> On 08/14/2013 9:43 pm, Shane Ambler wrote:
>> On 14/08/2013 22:57, dweimer wrote:
>>> I have a few systems running on ZFS with a backup script that creates
>>> snapshots, then backs up the .zfs/snapshot/name directory to make sure
>>> open files are not missed. This has been working great, but all of a
>>> sudden one of my systems has stopped working. It takes the snapshots
>>> fine, and zfs list -t snapshot shows them, but an ls on the
>>> .zfs/snapshot/ directory returns "not a directory".
>>>
>>> Part of the zfs list output:
>>>
>>> NAME                        USED  AVAIL  REFER  MOUNTPOINT
>>> zroot                      4.48G  29.7G    31K  none
>>> zroot/ROOT                 2.92G  29.7G    31K  none
>>> zroot/ROOT/91p5-20130812   2.92G  29.7G  2.92G  legacy
>>> zroot/home                  144K  29.7G   122K  /home
>>>
>>> Part of the zfs list -t snapshot output:
>>>
>>> NAME                                            USED  AVAIL  REFER  MOUNTPOINT
>>> zroot/ROOT/91p5-20130812@91p5-20130812--bsnap   340K      -  2.92G  -
>>> zroot/home@home--bsnap                           22K      -   122K  -
>>>
>>> ls /.zfs/snapshot/91p5-20130812--bsnap/
>>> does work right now, since the last reboot, but it wasn't always
>>> working; this is my boot environment.
>>>
>>> If I do ls /home/.zfs/snapshot/, the result is:
>>> ls: /home/.zfs/snapshot/: Not a directory
>>>
>>> If I do ls /home/.zfs, the result is:
>>> ls: snapshot: Bad file descriptor
>>> shares
>>>
>>> I have tried zpool scrub zroot and no errors were found. If I reboot the
>>> system I can get one good backup, then I start having problems. Has anyone
>>> else ever run into this? Any suggestions as to a fix?
>>>
>>> System is running FreeBSD 9.1-RELEASE-p5 #1 r253764: Mon Jul 29
>>> 15:07:35 CDT 2013; zpool is running version 28, zfs is running version 5.
>>>
>>
>>
>> I can say I've had this problem. Not certain what fixed it. I do
>> remember I decided to stop snapshotting if I couldn't access them and
>> deleted the existing snapshots. I later restarted the machine before I
>> went back for another look, and they were working.
>>
>> So my guess is a restart without existing snapshots may be the key.
>>
>> Now if only we could find out what started the issue so we can stop it
>> happening again.
>
> I had actually rebooted it last night, prior to seeing this message, and I
> do know it didn't have any snapshots this time. As I am booting from
> ZFS using boot environments, I may have had an older boot environment
> still on the system the last time it was rebooted. Backups ran great
> last night after the reboot, and I was able to kick off my pre-backup
> job and access all the snapshots today. Hopefully it doesn't come
> back, but if it does I will see if I can find anything else wrong.
>
> FYI, it didn't shut down cleanly, so in case this helps anyone find the
> issue, this is from my system logs:
>
> Aug 14 22:08:04 cblproxy1 kernel:
> Aug 14 22:08:04 cblproxy1 kernel: Fatal trap 12: page fault while in kernel mode
> Aug 14 22:08:04 cblproxy1 kernel: cpuid = 0; apic id = 00
> Aug 14 22:08:04 cblproxy1 kernel: fault virtual address  = 0xa8
> Aug 14 22:08:04 cblproxy1 kernel: fault code             = supervisor write data, page not present
> Aug 14 22:08:04 cblproxy1 kernel: instruction pointer    = 0x20:0xffffffff808b0562
> Aug 14 22:08:04 cblproxy1 kernel: stack pointer          = 0x28:0xffffff80002238f0
> Aug 14 22:08:04 cblproxy1 kernel: frame pointer          = 0x28:0xffffff8000223910
> Aug 14 22:08:04 cblproxy1 kernel: code segment           = base 0x0, limit 0xfffff, type 0x1b
> Aug 14 22:08:04 cblproxy1 kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
> Aug 14 22:08:04 cblproxy1 kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
> Aug 14 22:08:04 cblproxy1 kernel: current process        = 1 (init)
> Aug 14 22:08:04 cblproxy1 kernel: trap number            = 12
> Aug 14 22:08:04 cblproxy1 kernel: panic: page fault
> Aug 14 22:08:04 cblproxy1 kernel: cpuid = 0
> Aug 14 22:08:04 cblproxy1 kernel: KDB: stack backtrace:
> Aug 14 22:08:04 cblproxy1 kernel: #0 0xffffffff808ddaf0 at kdb_backtrace+0x60
> Aug 14 22:08:04 cblproxy1 kernel: #1 0xffffffff808a951d at panic+0x1fd
> Aug 14 22:08:04 cblproxy1 kernel: #2 0xffffffff80b81578 at trap_fatal+0x388
> Aug 14 22:08:04 cblproxy1 kernel: #3 0xffffffff80b81836 at trap_pfault+0x2a6
> Aug 14 22:08:04 cblproxy1 kernel: #4 0xffffffff80b80ea1 at trap+0x2a1
> Aug 14 22:08:04 cblproxy1 kernel: #5 0xffffffff80b6c7b3 at calltrap+0x8
> Aug 14 22:08:04 cblproxy1 kernel: #6 0xffffffff815276da at zfsctl_umount_snapshots+0x8a
> Aug 14 22:08:04 cblproxy1 kernel: #7 0xffffffff81536766 at zfs_umount+0x76
> Aug 14 22:08:04 cblproxy1 kernel: #8 0xffffffff809340bc at dounmount+0x3cc
> Aug 14 22:08:04 cblproxy1 kernel: #9 0xffffffff8093c101 at vfs_unmountall+0x71
> Aug 14 22:08:04 cblproxy1 kernel: #10 0xffffffff808a8eae at kern_reboot+0x4ee
> Aug 14 22:08:04 cblproxy1 kernel: #11 0xffffffff808a89c0 at kern_reboot+0
> Aug 14 22:08:04 cblproxy1 kernel: #12 0xffffffff80b81dab at amd64_syscall+0x29b
> Aug 14 22:08:04 cblproxy1 kernel: #13 0xffffffff80b6ca9b at Xfast_syscall+0xfb

Well, it's back: 3 of the 8 file systems I am taking snapshots of failed in last night's backups.
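In case anyone wants to reproduce this, the backup flow on these systems is essentially the following. The dataset name, snapshot suffix, and archive path are placeholders for illustration, not my exact script:

#!/bin/sh
# Rough sketch of the pre-backup / post-backup snapshot handling
# (dataset, snapshot suffix, and backup target are illustrative placeholders).
DATASET="zroot/home"            # filesystem being backed up
SNAP="${DATASET}@home--bsnap"   # snapshot name used by the backup job
MNT="/home"                     # mountpoint of the dataset

# Drop any stale snapshot left over from a previous run, then take a fresh one.
zfs destroy "$SNAP" 2>/dev/null
zfs snapshot "$SNAP"

# Back up from the snapshot directory so open files are not missed.
# This is the path that starts returning "Not a directory" when the
# problem shows up.
tar -czf /backup/home--bsnap.tar.gz -C "${MNT}/.zfs/snapshot/home--bsnap" .

# Clean up afterwards.
zfs destroy "$SNAP"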
The only thing different on this system from the 4 others I have running is that it has a second disk volume with a UFS file system. The setup is 2 disks, both set up with gpart (the commands used are roughly sketched in the P.S. below):

=>       34  83886013  da0  GPT     (40G)
         34       256    1  boot0   (128k)
        290  10485760    2  swap0   (5.0G)
   10486050  73399997    3  zroot0  (35G)

=>       34  41942973  da1  GPT     (20G)
         34  41942973    1  squid1  (20G)

I didn't want the Squid cache directory on ZFS. The system is running on an ESX 4.1 server backed by an iSCSI SAN. I have 4 other servers running on the same group of ESX servers and SAN, booting from ZFS, without this problem. Two of the other 4 are also running Squid, but they forward to this one, so they run without a local disk cache.

-- 
Thanks,
   Dean E. Weimer
   http://www.dweimer.net/
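P.S. For reference, the second (UFS) disk was laid out with nothing fancier than the following. These commands are reconstructed from memory, so the exact flags and the mountpoint are illustrative rather than exactly what I ran:

# Create a GPT scheme on the second disk and give it a single
# UFS partition labeled squid1, matching the gpart output above.
gpart create -s gpt da1
gpart add -t freebsd-ufs -l squid1 da1

# Build the filesystem with soft updates and mount it for the Squid cache
# (the mountpoint here is just an example).
newfs -U /dev/gpt/squid1
mount /dev/gpt/squid1 /cache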