From owner-freebsd-questions@FreeBSD.ORG Fri Aug 16 13:49:42 2013
Date: Fri, 16 Aug 2013 08:49:40 -0500
From: dweimer <dweimer@dweimer.net>
To: freebsd-questions@freebsd.org
Subject: Re: ZFS Snapshots Not able to be accessed under .zfs/snapshot/name
Reply-To: dweimer@dweimer.net
In-Reply-To: <776e30b627bf30ece7545e28b2a2e064@dweimer.net>
References: <22a7343f4573d6faac5aec1d7c9a1135@dweimer.net> <520C405A.6000408@ShaneWare.Biz> <776e30b627bf30ece7545e28b2a2e064@dweimer.net>
Message-ID: <23413f3a4b95328c0bc838e6ffad364d@dweimer.net>

On 08/15/2013 10:00 am, dweimer wrote:
> On 08/14/2013 9:43 pm, Shane Ambler wrote:
>> On 14/08/2013 22:57, dweimer wrote:
>>> I have a few systems running on ZFS with a backup script that creates
>>> snapshots, then backs up the .zfs/snapshot/name directory to make sure
>>> open files are not missed. This has been working great, but all of a
>>> sudden one of my systems has stopped working. It takes the snapshots
>>> fine, and zfs list -t snapshot shows them, but an ls on the
>>> .zfs/snapshot/ directory returns "not a directory".
>>>
>>> Part of the zfs list output:
>>>
>>> NAME                        USED  AVAIL  REFER  MOUNTPOINT
>>> zroot                      4.48G  29.7G    31K  none
>>> zroot/ROOT                 2.92G  29.7G    31K  none
>>> zroot/ROOT/91p5-20130812   2.92G  29.7G  2.92G  legacy
>>> zroot/home                  144K  29.7G   122K  /home
>>>
>>> Part of the zfs list -t snapshot output:
>>>
>>> NAME                                            USED  AVAIL  REFER  MOUNTPOINT
>>> zroot/ROOT/91p5-20130812@91p5-20130812--bsnap   340K      -  2.92G  -
>>> zroot/home@home--bsnap                           22K      -   122K  -
>>>
>>> ls /.zfs/snapshot/91p5-20130812--bsnap/
>>> does work right now, since the last reboot, but it wasn't always
>>> working; this is my boot environment.
>>>
>>> If I do ls /home/.zfs/snapshot/, the result is:
>>> ls: /home/.zfs/snapshot/: Not a directory
>>>
>>> If I do ls /home/.zfs, the result is:
>>> ls: snapshot: Bad file descriptor
>>> shares
>>>
>>> I have tried zpool scrub zroot and no errors were found. If I reboot the
>>> system I can get one good backup, then I start having problems. Has anyone
>>> else ever run into this? Any suggestions as to a fix?
>>>
>>> System is running FreeBSD 9.1-RELEASE-p5 #1 r253764: Mon Jul 29
>>> 15:07:35 CDT 2013; zpool is running version 28, zfs is running version 5.
>>>
>>
>>
>> I can say I've had this problem. Not certain what fixed it. I do
>> remember I decided to stop snapshotting if I couldn't access them and
>> deleted the existing snapshots. I later restarted the machine before I
>> went back for another look, and they were working.
>>
>> So my guess is a restart without existing snapshots may be the key.
>>
>> Now if only we could find out what started the issue so we can stop it
>> happening again.
>
> I had actually rebooted it last night, prior to seeing this message, and I
> do know it didn't have any snapshots this time. As I am booting from
> ZFS using boot environments, I may have had an older boot environment
> still on the system the last time it was rebooted. Backups ran great
> last night after the reboot, and I was able to kick off my pre-backup
> job and access all the snapshots today. Hopefully it doesn't come
> back, but if it does I will see if I can find anything else wrong.
>
> FYI, it didn't shut down cleanly, so in case this helps anyone find the
> issue, this is from my system logs:
>
> Aug 14 22:08:04 cblproxy1 kernel:
> Aug 14 22:08:04 cblproxy1 kernel: Fatal trap 12: page fault while in kernel mode
> Aug 14 22:08:04 cblproxy1 kernel: cpuid = 0; apic id = 00
> Aug 14 22:08:04 cblproxy1 kernel: fault virtual address  = 0xa8
> Aug 14 22:08:04 cblproxy1 kernel: fault code             = supervisor write data, page not present
> Aug 14 22:08:04 cblproxy1 kernel: instruction pointer    = 0x20:0xffffffff808b0562
> Aug 14 22:08:04 cblproxy1 kernel: stack pointer          = 0x28:0xffffff80002238f0
> Aug 14 22:08:04 cblproxy1 kernel: frame pointer          = 0x28:0xffffff8000223910
> Aug 14 22:08:04 cblproxy1 kernel: code segment           = base 0x0, limit 0xfffff, type 0x1b
> Aug 14 22:08:04 cblproxy1 kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
> Aug 14 22:08:04 cblproxy1 kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
> Aug 14 22:08:04 cblproxy1 kernel: current process        = 1 (init)
> Aug 14 22:08:04 cblproxy1 kernel: trap number            = 12
> Aug 14 22:08:04 cblproxy1 kernel: panic: page fault
> Aug 14 22:08:04 cblproxy1 kernel: cpuid = 0
> Aug 14 22:08:04 cblproxy1 kernel: KDB: stack backtrace:
> Aug 14 22:08:04 cblproxy1 kernel: #0 0xffffffff808ddaf0 at kdb_backtrace+0x60
> Aug 14 22:08:04 cblproxy1 kernel: #1 0xffffffff808a951d at panic+0x1fd
> Aug 14 22:08:04 cblproxy1 kernel: #2 0xffffffff80b81578 at trap_fatal+0x388
> Aug 14 22:08:04 cblproxy1 kernel: #3 0xffffffff80b81836 at trap_pfault+0x2a6
> Aug 14 22:08:04 cblproxy1 kernel: #4 0xffffffff80b80ea1 at trap+0x2a1
> Aug 14 22:08:04 cblproxy1 kernel: #5 0xffffffff80b6c7b3 at calltrap+0x8
> Aug 14 22:08:04 cblproxy1 kernel: #6 0xffffffff815276da at zfsctl_umount_snapshots+0x8a
> Aug 14 22:08:04 cblproxy1 kernel: #7 0xffffffff81536766 at zfs_umount+0x76
> Aug 14 22:08:04 cblproxy1 kernel: #8 0xffffffff809340bc at dounmount+0x3cc
> Aug 14 22:08:04 cblproxy1 kernel: #9 0xffffffff8093c101 at vfs_unmountall+0x71
> Aug 14 22:08:04 cblproxy1 kernel: #10 0xffffffff808a8eae at kern_reboot+0x4ee
> Aug 14 22:08:04 cblproxy1 kernel: #11 0xffffffff808a89c0 at kern_reboot+0
> Aug 14 22:08:04 cblproxy1 kernel: #12 0xffffffff80b81dab at amd64_syscall+0x29b
> Aug 14 22:08:04 cblproxy1 kernel: #13 0xffffffff80b6ca9b at Xfast_syscall+0xfb

Well, it's back: 3 of the 8 file systems I am taking snapshots of failed in last night's backups.
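In case anyone wants to reproduce this, the backup flow on these systems is essentially the following. The dataset name, snapshot suffix, and archive path are placeholders for illustration, not my exact script:

#!/bin/sh
# Rough sketch of the pre-backup / post-backup snapshot handling
# (dataset, snapshot suffix, and backup target are illustrative placeholders).
DATASET="zroot/home"            # filesystem being backed up
SNAP="${DATASET}@home--bsnap"   # snapshot name used by the backup job
MNT="/home"                     # mountpoint of the dataset

# Drop any stale snapshot left over from a previous run, then take a fresh one.
zfs destroy "$SNAP" 2>/dev/null
zfs snapshot "$SNAP"

# Back up from the snapshot directory so open files are not missed.
# This is the path that starts returning "Not a directory" when the
# problem shows up.
tar -czf /backup/home--bsnap.tar.gz -C "${MNT}/.zfs/snapshot/home--bsnap" .

# Clean up afterwards.
zfs destroy "$SNAP"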
The only thing different on this system from the 4 others I have running is that it has a second disk volume with a UFS file system. The setup is 2 disks, both set up with gpart (the commands used are roughly sketched in the P.S. below):

=>       34  83886013  da0  GPT     (40G)
         34       256    1  boot0   (128k)
        290  10485760    2  swap0   (5.0G)
   10486050  73399997    3  zroot0  (35G)

=>       34  41942973  da1  GPT     (20G)
         34  41942973    1  squid1  (20G)

I didn't want the Squid cache directory on ZFS. The system is running on an ESX 4.1 server backed by an iSCSI SAN. I have 4 other servers running on the same group of ESX servers and SAN, booting from ZFS, without this problem. Two of the other 4 are also running Squid, but they forward to this one, so they run without a local disk cache.

-- 
Thanks,
   Dean E. Weimer
   http://www.dweimer.net/
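P.S. For reference, the second (UFS) disk was laid out with nothing fancier than the following. These commands are reconstructed from memory, so the exact flags and the mountpoint are illustrative rather than exactly what I ran:

# Create a GPT scheme on the second disk and give it a single
# UFS partition labeled squid1, matching the gpart output above.
gpart create -s gpt da1
gpart add -t freebsd-ufs -l squid1 da1

# Build the filesystem with soft updates and mount it for the Squid cache
# (the mountpoint here is just an example).
newfs -U /dev/gpt/squid1
mount /dev/gpt/squid1 /cache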