From: Paul Kraus <paul@kraus-haus.org>
Subject: Re: some ZFS questions
Date: Wed, 6 Aug 2014 18:26:51 -0400
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Message-Id: <938511B1-128F-48AF-8D16-2C720B844847@kraus-haus.org>
In-Reply-To: <201408060732.s767WlPP027322@sdf.org>

On Aug 6, 2014, at 3:32, Scott Bennett wrote:

> 2) How does one start or stop a pool?

I assume your question comes from other Volume Managers that need to have a process (or kernel thread) running to manage the volumes. ZFS does not really work that way (and at the same time it does).

> From what I've read, it appears that ZFS automatically starts all
> the pools at once.

The system will keep track of which zpools were active on that system and automatically import them at boot time. ZFS records in the zpool which host last imported it, to prevent automatically importing the same pool on multiple systems at once.
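For what it is worth, the manual side of this is just two commands. A rough sketch (the pool name "tank" is only a placeholder, not anything from your setup):

    # list pools visible on the attached disks but not yet imported here
    zpool import

    # import one of them by name
    zpool import tank

    # if the pool was last imported by a different host and was never
    # cleanly exported, ZFS refuses to import it unless you force it
    zpool import -f tank

    # cleanly detach the pool from this host before pulling the drives
    zpool export tank

That recorded last-imported-host information is what the "don't import the same pool on two systems at once" protection is based on.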
> If there is a problem after a crash that causes ZFS to decide to
> run some sort of repairs without waiting for a go-ahead from a
> human, ZFS might create still more problems.

Not likely. The "repairs" you speak of consist of two different mechanisms.

1. ZFS is transactional, so if a change has been committed to the transaction log (known as a transaction group, or TXG) but not yet marked as committed, then at import time the TXG log will be played (re-played) to ensure that the data is as up to date as possible. Because ZFS is Copy on Write and changes are applied atomically, the actual data is always consistent, hence no need for an fsck-like utility.

2. If a device that makes up a zpool is missing (failed) or otherwise unavailable *and* a hot spare is available, then ZFS will start resilvering (the ZFS term for a sync-like operation) the new device to substitute for the missing (failed) device. The resilver operation is handled at a lower priority than real I/O, so it has little impact on normal operations.

> For example, if a set of identically partitioned drives has a pool
> made of one partition from each drive and another pool made from a
> different set of partitions,

Not an advised configuration, but a permitted one (yes, I have done this).

> a rebuild after a failed/corrupted drive might start on both pools
> at once, thereby hammering all of the drives mercilessly until
> something else, hardware or software, failed.

Yup, but a resilver only uses I/O bandwidth that is not already being used for production I/O. Still, yes, the drives will be seeing the maximum amount of random I/O that they can sustain.

> Having a way to allow one to complete before starting another
> would be critical in such a configuration.

Avoid such configurations.

> Also, one might need to stop a pool in order to switch hardware
> connections around.

zpool export <pool>, or zpool export -f <pool> if necessary. Yes, you can do this while a resilver is running. It will start again (depending on the specific ZFS code, maybe at the point where it left off) when the zpool is next imported.

> I see the zpool(8) command has a "reopen" command, but I don't see
> a "close" counterpart, nor a description of when a "reopen" might
> be used.

I think you are looking for the zpool import and zpool export commands here.

> 3) If a raidz2 or raidz3 loses more than one component, does one
> simply replace and rebuild all of them at once? Or is it necessary
> to rebuild them serially? In some particular order?

I do not believe that you can replace more than one device at a time, but if you issue a zpool replace command for a second device while a resilver is running, I believe it will just re-start the resilver, writing data to *both* new devices at once. Note that since you can have multiple top-level vdevs, and each vdev can be a RAIDz, this is *not* as ludicrous as it might seem at first glance. The resilver really happens within a top-level vdev.

No need to replace failed devices in any particular order, unless your specific configuration depends on it. You might have two failing devices, one much worse than the other. I would replace the device with the more serious errors first, but you may have a reason to choose otherwise.
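To make the replace step concrete, it looks something like this (pool and device names here are made up, adjust to your own layout):

    # rebuild the data that was on da3 onto the new disk da7;
    # this kicks off a resilver
    zpool replace tank da3 da7

    # watch resilver progress and overall pool health
    zpool status -v tank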
> 4) At present, I'm running 9-STABLE i386. The box has 4 GB of
> memory, but the kernel ignores a bit over 1 GB of it.

I would NOT run ZFS on a 32-bit system.

> 5) When I upgrade to amd64, the usage would continue to be
> low-intensity as defined above. Will the 4 GB be enough?

ZFS uses a memory structure called the ARC (Adaptive Replacement Cache), and it is the key to getting any kind of performance out of ZFS. It is both a write cache and a read (and read-ahead) cache. If it is not large enough (compared to the amount of data you will be writing in any 30-second period) then you will be in serious trouble. My rule of thumb is not to use ZFS on systems (real or virtual) with less than 4 GB of RAM. I have been running 9.2 on systems with 8 GB of RAM with no issues, but when I was testing 10.0 with 3 GB of RAM I occasionally had memory-related hangs (I was testing with iozone before my additional RAM arrived).

> I will not be using the "deduplication" feature at all.

Deduplication in ZFS has a very small "sweet spot", and it is highly recommended that you run the dedup test before turning dedup on to see the real effect it would have (I am not near my systems right now or I would include the specific command). Also note that 1 GB of RAM per 1 TB of raw space under dedup is effectively mandatory for a functional system.

> 6) I have a much fancier computer sitting unused that I intend to
> put into service fairly soon after getting my current disk and data
> situation resolved. The drives that would be in use for raidz
> pools I would like to attach to that system when it is ready. It
> also has 4 GB of memory, but would start out as an amd64 system and
> might well have another 2 GB or 4 GB added at some point(s), though
> not immediately. What problems/pitfalls/precautions would I need
> to have in mind and be prepared for in order to move those drives
> from the current system to that newer one?

You should be able to physically move the drives from *any* system to *any* other that supports the ZFS version and features that you are using. ZFS was even designed to handle endian differences (SPARC to Intel, for example). I would caution you to EXPORT the zpool when removing the drives and IMPORT it fresh on the new system. Technically you *can* do a `zpool import -f`, but from years of reading horror stories on the ZFS list, I *always* export / import when moving drives (if I can).

--
Paul Kraus
paul@kraus-haus.org