From: George Hartzell <hartzell@alerce.com>
Date: Mon, 7 Jul 2008 10:18:52 -0700
To: freebsd-fs@freebsd.org
Subject: using zfs and unionfs together, does zfs need to be extended?
Message-ID: <18546.20476.590665.29995@almost.alerce.com>
Reply-To: hartzell@alerce.com
List-Id: Filesystems <freebsd-fs@freebsd.org>

I'd like to be able to set up a largish number of very similar jails
with a minimum of fuss, while taking advantage of ZFS's cool features.
I'd like to use unionfs to do this, but ZFS's lack of whiteout support
seems to make that impossible.
[jump to the bottom if you want to skip the setup and get to the
questions]

It seems like the most popular way to set up jails these days uses
read-only nullfs mounts of a base system, with symbolic links into a
read-write nullfs mount for each jail's specific stuff (/etc,
/usr/local, etc...). These approaches are well described in:

  http://erdgeist.org/arts/software/ezjail
  http://www.freebsd.org/doc/en/books/handbook/jails-application.html

and they work fine with zfs based storage.

It's also possible to use unionfs to layer jail-specific storage over
a base system. While this approach gives more per-jail flexibility and
avoids having to relocate various directories in the base system,
various unionfs problems seem to have pushed it out of favor. The
ongoing work of daichi@freebsd.org et al. that fixes various problems
with unionfs,

  http://people.freebsd.org/~daichi/unionfs/

makes it look as if this approach might now be safe, using something
like:

  mount -t unionfs -o below,noatime /usr/jails/base /usr/jails/www

The obvious zfs analog to this:

  mount -t unionfs -o below,noatime /tank/jails/base /tank/jails/www

fails with:

  mount_unionfs: /tank/jails/www: Operation not supported

A bit of digging suggests that the mount fails when the unionfs code
checks whether /tank/jails/www supports whiteouts. The fact that this
check is skipped when the uniondir is read-only provides a way to
superficially confirm that whiteouts are the only problem; this:

  mount -t unionfs -o ro,below,noatime /tank/jails/base /tank/jails/www

does indeed seem to lead to a working [albeit read-only] union mount.
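To make the above easy to poke at, here's a minimal sketch of a helper
that builds the unionfs mount command for a given jail. The function
name and the echo-only run() wrapper are mine (a dry run, since the
real mount needs root and the ZFS layout above); the paths and mount
options come straight from the commands in this message.

```shell
# Dry-run wrapper: print the command instead of executing it.
# Swap in  run() { "$@"; }  to actually perform the mounts as root.
run() { echo "$@"; }

# Hypothetical helper: union-mount a shared read-only base below a
# per-jail writable layer, as described above.
union_jail() {
    base=$1   # read-only lower layer, e.g. /tank/jails/base
    jail=$2   # per-jail upper layer, e.g. /tank/jails/www
    # NB: with a ZFS uniondir this fails with "Operation not supported"
    # unless mounted read-only, since ZFS lacks whiteout support.
    run mount -t unionfs -o below,noatime "$base" "$jail"
}

union_jail /tank/jails/base /tank/jails/www
```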
One can work around the problem by creating a ZFS volume, building a
UFS filesystem on it, and then using that as the uniondir, e.g.:

  zfs create -V 5G tank/jail/vol1
  newfs /dev/zvol/tank/jail/vol1
  mkdir /usr/jail/zvol-www
  mount /dev/zvol/tank/jail/vol1 /usr/jail/zvol-www/
  mount -t unionfs -o below,noatime /tank/jail/base/ /usr/jail/zvol-www

The upper layer is still [presumably, I haven't tested this yet]
snapshot-able, send-able, etc...., but this approach leaves me with a
bunch of UFS filesystems that need care and feeding (fsck, etc...).

So finally, the questions:

How hard would it be to add whiteout support to our ZFS? Is it "just"
a matter of understanding the places in the UFS code that do whiteout
things, locating the analogous places in the ZFS tree, and doing
similar things there (it seems to be a "simple" matter of
creating/destroying a whiteout vnode when necessary and checking for
it when appropriate), or is there something fundamentally harder about
it?

Has anyone already done it?

If it were doable/done cleanly, might it get committed?

Thanks,
g.
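For what it's worth, the zvol+UFS workaround above scripts up easily
per jail. This is a dry-run sketch; the function name, the vol-$name
naming convention, and the echo-only run() wrapper are my inventions,
while the pool/path layout (tank/jail, /usr/jail) and the command
sequence follow the example above.

```shell
# Dry-run wrapper: print each command rather than executing it.
# Swap in  run() { "$@"; }  to really create the zvol as root.
run() { echo "$@"; }

# Hypothetical helper: create a UFS-on-zvol upper layer for one jail
# and union it over the shared base, per the workaround above.
zvol_jail() {
    name=$1   # jail name, e.g. www
    size=$2   # zvol size, e.g. 5G
    run zfs create -V "$size" "tank/jail/vol-$name"
    run newfs "/dev/zvol/tank/jail/vol-$name"
    run mkdir -p "/usr/jail/zvol-$name"
    run mount "/dev/zvol/tank/jail/vol-$name" "/usr/jail/zvol-$name"
    # UFS supports whiteouts, so unlike a ZFS uniondir this succeeds:
    run mount -t unionfs -o below,noatime /tank/jail/base "/usr/jail/zvol-$name"
}

zvol_jail www 5G
```

The downside, of course, is exactly the one noted above: each jail now
carries a fixed-size UFS filesystem that needs its own fsck and sizing.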