From: Peter Maloney
Date: Thu, 09 Feb 2012 14:29:49 +0100
To: freebsd-fs@freebsd.org
Subject: zfs snapdir NFS hang
Message-ID: <4F33CA4D.6060400@brockmann-consult.de>

So, I have an issue where after some point (arbitrary number of snapshots? a specific snapshot?
gremlins?), exporting a directory that contains a .zfs directory, whether or not snapdir=hidden is set, and then listing that directory from the NFS client causes a total hang of the dataset and of some commands such as "zdb -d poolname". I don't know the root cause, so I don't know how to reproduce it or file a PR.

For example:

    # echo /tank/dataset -maproot=root 10.10.10.10 >> /etc/exports
    # kill -HUP `cat /var/run/mountd.pid`
    # tail /var/log/messages
    Feb 8 15:47:54 bcnas1 mountd[46760]: can't delete exports for /tank/dataset/.zfs/snapshot/replication-20120204134001: Invalid argument
    Feb 8 15:47:54 bcnas1 mountd[46760]: can't delete exports for /tank/dataset/.zfs/snapshot/replication-20120208140000: Invalid argument
    ...

(I would think the above shows that the NFS server is buggy, or not really compatible with this situation.)

    # ssh 10.10.10.10 "mount bcnas1:/tank/dataset /mountpoint ; ls /mountpoint/.zfs/snapshot"

(This command hangs, and so does anything else that uses the same dataset after this point.)

There was a point when this would not hang, but would instead show many directories; for many other snapshots (all of the ones listed in the /var/log/messages errors, and more) there would be strange binary files, or directories with the wrong files in them (it would show me a subdirectory of the correct root of the snapshot).

All of these problems happen whether I set snapdir=hidden or snapdir=visible.

I am currently running 8-STABLE from Sept. 28th. Today an identical problem happened, and I rebooted to fix it. I don't know if it was the same cause, but I would like to find out.

Can someone give me ideas on how to track down the problem, or tell me which source files in /usr/src I should open up and hack apart or add debugging to, so that I can either:

* find the root cause of the problem
* prevent NFS from exporting any .zfs directories

Or does someone know if this has been fixed in the latest 8-STABLE or 9?

The best workaround I can think of is reorganizing all my datasets so that the root directory is empty except for one directory, and then sharing only that subdirectory, which does not contain a .zfs directory (or any other child datasets). But ideally, NFS clients should be able to view snapshots to recover files.
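
A minimal sketch of that workaround layout, in case it helps to see it spelled out. It is untested against a real pool: it builds the layout under a scratch directory and writes to a stand-in file rather than /etc/exports, and the subdirectory name "data" is only a placeholder. The client address and -maproot option are the ones from the example above.

```shell
#!/bin/sh
# Sketch only: uses a scratch directory in place of a real dataset, so it is
# safe to run anywhere. The name "data" is hypothetical.
set -eu

# Stand-in for the dataset mountpoint (e.g. /tank/dataset).
root=$(mktemp -d "${TMPDIR:-/tmp}/dataset.XXXXXX")
# Stand-in for /etc/exports.
exports="$root.exports"

# The dataset root holds nothing but this one directory; on a real dataset
# the .zfs directory sits in the root, not inside "data".
mkdir "$root/data"

# Export only the subdirectory, never the dataset root itself.
echo "$root/data -maproot=root 10.10.10.10" >> "$exports"

cat "$exports"

# On the real server, mountd would then be told to reread the exports file:
#   kill -HUP `cat /var/run/mountd.pid`
```

With this layout the client mounts bcnas1:/tank/dataset/data instead of bcnas1:/tank/dataset, so it never sees .zfs, which is also why it cannot browse snapshots.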