Date:      Thu, 8 Mar 2012 11:01:32 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Peter Maloney <peter.maloney@brockmann-consult.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Deadlock (?) with ZFS, NFS
Message-ID:  <1116960020.624724.1331222492424.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4F5861B7.7010201@brockmann-consult.de>

Peter Maloney wrote:
> On 03/08/2012 07:02 AM, Garrett Wollman wrote:
> > This is a 9.0-RELEASE system with the mps driver backported from
> > 9-stable. Hourly and daily snapshots were enabled. It had been
> > working extremely well up to this point, and we were looking at
> > possibly replacing our existing NFS servers with this architecture.
> On one system, (haven't tried it lately), it will hang a single dataset
> if a Linux *client* mounts the zfs dataset and does:
> 
> cd /mount/point
> ls .zfs/snapshot
> 
> So try that and see if it reproduces the problem.
> 
> Setting snapdir=hidden doesn't prevent accessing it, only hides it from
> "ls -a" output.
> 
> The problem did not occur back when I had few or no snapshots. It also
> doesn't happen on the replicated backup server, with all the same
> software, data and snapshots.
> 
> So far, my hack solution is to mount /var/empty on top of every .zfs
> directory on the client side. Another idea is to never export a whole
> dataset from the root of it, because that is the only place that
> contains the .zfs directory, other than if you have subdatasets inside
> that one.
> 
There was a patch specifically for readdirplus that avoids doing a
VFS_VGET() when VFS_VGET() returns EOPNOTSUPP. This apparently
happens for ZFS snapshots. The patch went into head as r220507 almost
a year ago (Apr. 9, 2011), so Garrett will have it, but you might not.

Also note that if all your clients are FreeBSD and none of the mounts
specify the "rdirplus" mount option, the patch isn't relevant, since the
clients will never do a ReaddirPlus RPC.
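
As a rough way to check a FreeBSD client for this, the sketch below scans
an fstab-style file for NFS entries whose options include "rdirplus". The
function names are hypothetical, and it only sees mounts listed in the
file you pass it, not ones made by hand with mount(8):

```sh
# Hypothetical helper (not from the original message): succeeds if a
# comma-separated mount option string contains the "rdirplus" option.
has_rdirplus() {
    case ",$1," in
        *,rdirplus,*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Print "device mountpoint" for each nfs entry that requests rdirplus.
# Assumed field order: device, mount point, fstype, options, dump, pass.
list_rdirplus_mounts() {
    while read -r dev mnt fstype opts _rest; do
        # Skip blank lines and comments.
        case "$dev" in ''|\#*) continue ;; esac
        [ "$fstype" = "nfs" ] || continue
        if has_rdirplus "$opts"; then
            printf '%s %s\n' "$dev" "$mnt"
        fi
    done < "$1"
}
```

Running `list_rdirplus_mounts /etc/fstab` on each client and getting no
output would suggest that none of the listed mounts ask for ReaddirPlus.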

Although I am highly doubtful that it would be the cause of the above
hang, you should apply this patch, which went into head recently:
   http://people.freebsd.org/~rmacklem/nfsd-enoent.patch
Without it, Lookup RPCs are being done with ni_topdir uninitialized.
For NFSv4 mounts to a UFS volume, this resulted in spurious ENOENT
replies to Lookup. Since the ZFS code doesn't appear to use ni_topdir,
I really doubt it would cause a hang, but strange things can occur when
variables aren't properly initialized, and the patch should be safe to
use.
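
One possible way to try the patch on the server is sketched below. The
source tree location (/usr/src), the temporary file name, and the
assumption that the patch applies from the top of the tree with the
default strip level are all guesses, not something stated in this
message, so the dry run comes first:

```sh
# Sketch only: SRC, PATCH_FILE and the function name are assumptions,
# not taken from the original message.
PATCH_URL="http://people.freebsd.org/~rmacklem/nfsd-enoent.patch"
PATCH_FILE="${PATCH_FILE:-/tmp/nfsd-enoent.patch}"
SRC="${SRC:-/usr/src}"

apply_nfsd_patch() {
    # fetch(1) is the FreeBSD base-system downloader.
    fetch -o "$PATCH_FILE" "$PATCH_URL" || return 1
    cd "$SRC" || return 1
    # -C only checks whether the patch would apply; nothing is modified.
    patch -C < "$PATCH_FILE" || return 1
    patch < "$PATCH_FILE"
}
```

If the dry run fails, the directory or strip level assumed here is
probably wrong for your tree. After a successful apply, the kernel still
has to be rebuilt and the server rebooted for the change to take effect.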

Good luck with it. I don't know anything about ZFS, so I can't really
help much.

rick

> 
> 
> --
> 
> --------------------------------------------
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.maloney@brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> --------------------------------------------
> 


