Date:      Wed, 6 Aug 2014 18:26:51 -0400
From:      Paul Kraus <paul@kraus-haus.org>
To:        FreeBSD Questions !!!! <freebsd-questions@freebsd.org>
Subject:   Re: some ZFS questions
Message-ID:  <938511B1-128F-48AF-8D16-2C720B844847@kraus-haus.org>
In-Reply-To: <201408060732.s767WlPP027322@sdf.org>
References:  <201408060732.s767WlPP027322@sdf.org>

On Aug 6, 2014, at 3:32, Scott Bennett <bennett@sdf.org> wrote:

> 	2) How does one start or stop a pool?

I assume your question comes from experience with other volume managers,
which need to have a process (or kernel thread) running to manage the
volumes. ZFS does not really work that way (and at the same time it does).

>  From what I've read, it
> 	appears that ZFS automatically starts all the pools at once.

The system will keep track of which zpools were active on that system
and automatically import them at boot time. ZFS records in the zpool
which host has last imported it to prevent automatically importing the
same pool on multiple systems at once.
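
On FreeBSD that bookkeeping is the pool cache file (normally
/boot/zfs/zpool.cache) plus a host identifier written into the pool
labels. If you want to poke at it, something along these lines works
(the pool name "tank" is just an example):

    zpool get cachefile tank   # "-" means the default zpool.cache is in use
    zpool import               # with no arguments, scans for pools not currently imported

Pools recorded in the cache file are the ones imported automatically at
boot.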

>  If
> 	there is a problem after a crash that causes ZFS to decide to
> 	run some sort of repairs without waiting for a go-ahead from a
> 	human, ZFS might create still more problems.

Not likely. The "repairs" you speak of consist of two different
mechanisms.

1. ZFS is transactional, so if a change has been committed to the
transaction log (known as transaction groups, or TXGs) but not yet
marked as committed, then at import time the TXG log will be replayed
to ensure that the data is as up to date as possible. Because ZFS is
copy-on-write and changes are applied atomically, the actual data is
always consistent, hence no need for an fsck-like utility.
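
(If you are curious, you can look at the pool's current TXG in the
uberblock with zdb; purely informational, and the pool name is just an
example:

    zdb -u tank    # print the active uberblock, including the current txg
)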

2. If a device that makes up a zpool is missing (failed) or otherwise
unavailable *and* a hot spare is available, then ZFS will start
resilvering (the ZFS term for a sync-like operation) the new device to
substitute for the missing (failed) device. The resilver operation is
handled at a lower priority than real I/O, so it has little impact on
operations.
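
For example, with made-up pool and device names, adding a spare and
then watching it take over looks roughly like this:

    zpool add tank spare da6   # make da6 available as a hot spare for pool tank
    zpool status tank          # shows resilver progress once the spare kicks in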

>  For example, if
> 	a set of identically partitioned drives has a pool made of one
> 	partition from each drive and another pool made from a different
> 	set of partitions,

Not an advised configuration, but a permitted one (yes, I have done
this).

> a rebuild after a failed/corrupted drive might
> 	start on both pools at once, thereby hammering all of the drives
> 	mercilessly until something else, hardware or software, failed.

Yup, but just using I/O bandwidth that is not already being used for
production I/O. But, yes, the drives will be seeing the maximum amount
of random I/O that they can sustain.

> 	Having a way to allow one to complete before starting another
> 	would be critical in such a configuration.

Avoid such configurations.

>  Also, one might need to
> 	stop a pool in order to switch hardware connections around.

zpool export <zpool name>, or zpool export -f <zpool name> if necessary.
Yes, you can do this while a resilver is running. It will start again
(depending on specific ZFS code, maybe at the point it left off) when
the zpool is next imported.
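
Roughly (the pool name is only an example):

    zpool export -f tank    # release the pool, even mid-resilver
    (re-cable or move the drives to the new controller)
    zpool import tank       # the resilver resumes, or restarts, after import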

>  I
> 	see the zpool(8) command has a "reopen" command, but I don't see
> 	a "close" counterpart, nor a description of when a "reopen" might
> 	be used.

I think you are looking for the zpool import and zpool export commands
here.

>
> 	3) If a raidz2 or raidz3 loses more than one component, does one
> 	simply replace and rebuild all of them at once?  Or is it necessary
> 	to rebuild them serially?  In some particular order?

I do not believe that you can replace more than one device at a time,
but if you issue a zpool replace <zpool name> <old device> <new device>
command while a resilver is running, I believe that it will just
re-start the resilver, writing data to *both* new devices at once. Note
that since you can have multiple top level vdevs, and each vdev can be
a RAIDz<n>, this is *not* as ludicrous as it might seem at first
glance. The resilver really happens within a top level vdev.
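
So, for a raidz2 that lost two disks, the sequence would look something
like this (all pool and device names invented for the example):

    zpool replace tank da2 da8   # start rebuilding onto the first new disk
    zpool replace tank da3 da9   # second replace; the resilver restarts covering both
    zpool status tank            # watch the single resilver pass proceed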

No need to replace failed devices in any particular order, unless your
specific configuration depends on it. You might have two failing
devices and one is much worse than the other. I would replace the
device with the more serious errors first, but you may have a reason to
choose otherwise.
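
The per-device READ / WRITE / CKSUM counters are a reasonable way to
decide which disk is in worse shape:

    zpool status -v tank    # per-device error counts, plus any files with known damage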

> 	4) At present, I'm running 9-STABLE i386.  The box has 4 GB of
> 	memory, but the kernel ignores a bit over 1 GB of it.

I would NOT run ZFS on a 32-bit system.

<snip>

> 	5) When I upgrade to amd64, the usage would continue to be low-
> 	intensity as defined above.  Will the 4 GB be enough?

ZFS uses a memory structure called the ARC (Adaptive Replacement
Cache) and it is the key to getting any kind of performance out of ZFS.
It is both a write cache and a read (and read-ahead) cache. If it is
not large enough (compared to the amount of data you will be writing in
any 30 second period of time) then you will be in serious trouble. My
rule of thumb is to not use ZFS on systems (real or virtual) with less
than 4GB RAM. I have been running 9.2 on systems with 8GB RAM with no
issues, but when I was testing 10.0 with 3GB RAM I occasionally had
memory related hangs (I was testing with iozone before my additional
RAM arrived).
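
If you do have to run with limited RAM, one knob worth knowing about on
FreeBSD is capping the ARC via a loader tunable in /boot/loader.conf
(the 2G below is only an example value, not a recommendation):

    vfs.zfs.arc_max="2G"    # cap the ARC so the rest of the system keeps some memory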

>  I will not
> 	be using the "deduplication" feature at all.

Deduplication in ZFS has a very small "sweet spot" and it is highly
recommended that you run the dedup simulation before turning dedup on
to see the real effect it has (I am not near my systems right now or I
would include the specific zfs command). Also note that 1GB RAM per 1TB
of raw space under dedup is effectively mandatory for a functional
system.
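
(Writing from memory, so double check it, but I believe the simulation
in question is zdb's dedup estimate, which reports a projected dedup
table and ratio without actually enabling dedup:

    zdb -S tank    # simulate dedup on pool "tank" and print the DDT histogram
)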

> 	6) I have a much fancier computer sitting unused that I intend to
> 	put into service fairly soon after getting my current disk and data
> 	situation resolved.  The drives that would be in use for raidz
> 	pools I would like to attach to that system when it is ready.  It
> 	also has 4 GB of memory, but would start out as an amd64 system and
> 	might well have another 2 GB or 4 GB added at some point(s), though
> 	not immediately.  What problems/pitfalls/precautions would I need
> 	to have in mind and be prepared for in order to move those drives
> 	from the current system to that newer one?

You should be able to physically move the drives from *any* system to
*any* other that supports the ZFS version and features that you are
running (using). ZFS was even designed to handle endian differences
(SPARC to Intel, for example). I would caution you to EXPORT the zpool
when removing the drives and IMPORT it fresh on the new system.
Technically you *can* do a `zpool import -f`, but from years of reading
horror stories on the ZFS list, I *always* export / import if moving
drives (if I can).
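
The sequence when moving the disks is roughly (pool name is just an
example):

    zpool export tank   # on the old box, before pulling the drives
    zpool import        # on the new box, scan attached disks and list importable pools
    zpool import tank   # import by name; add -f only if the pool was never exported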

--
Paul Kraus
paul@kraus-haus.org



