From: Paul Kraus <paul@kraus-haus.org>
Subject: Re: some ZFS questions
Date: Wed, 6 Aug 2014 18:26:51 -0400
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Message-Id: <938511B1-128F-48AF-8D16-2C720B844847@kraus-haus.org>
In-Reply-To: <201408060732.s767WlPP027322@sdf.org>

On Aug 6, 2014, at 3:32, Scott Bennett wrote:

> 2) How does one start or stop a pool?

I assume your question comes from other Volume Managers that need to have a process (or kernel thread) running to manage the volumes. ZFS does not really work that way (and at the same time it does).

> From what I've read, it appears that ZFS automatically starts all
> the pools at once.

The system will keep track of which zpools were active on that system and automatically import them at boot time. ZFS records in the zpool which host last imported it, to prevent automatically importing the same pool on multiple systems at once.
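For what it is worth, the manual side of this is just two commands. A rough sketch (the pool name "tank" is only a placeholder, not anything from your setup):

    # list pools visible on the attached disks but not yet imported here
    zpool import

    # import one of them by name
    zpool import tank

    # if the pool was last imported by a different host and was never
    # cleanly exported, ZFS refuses to import it unless you force it
    zpool import -f tank

    # cleanly detach the pool from this host before pulling the drives
    zpool export tank

That recorded last-imported-host information is what the "don't import the same pool on two systems at once" protection is based on.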
> If there is a problem after a crash that causes ZFS to decide to
> run some sort of repairs without waiting for a go-ahead from a
> human, ZFS might create still more problems.

Not likely. The "repairs" you speak of consist of two different mechanisms.

1. ZFS is transactional, so if a change has been committed to the transaction log (known as a transaction group, or TXG) but not yet marked as committed, then at import time the TXG log will be played (re-played) to ensure that the data is as up to date as possible. Because ZFS is Copy on Write and changes are applied atomically, the actual data is always consistent, hence no need for an fsck-like utility.

2. If a device that makes up a zpool is missing (failed) or otherwise unavailable *and* a hot spare is available, then ZFS will start resilvering (the ZFS term for a sync-like operation) the new device to substitute for the missing (failed) device. The resilver operation is handled at a lower priority than real I/O, so it has little impact on normal operations.

> For example, if a set of identically partitioned drives has a pool
> made of one partition from each drive and another pool made from a
> different set of partitions,

Not an advised configuration, but a permitted one (yes, I have done this).

> a rebuild after a failed/corrupted drive might start on both pools
> at once, thereby hammering all of the drives mercilessly until
> something else, hardware or software, failed.

Yup, but a resilver only uses I/O bandwidth that is not already being used for production I/O. Still, yes, the drives will be seeing the maximum amount of random I/O that they can sustain.

> Having a way to allow one to complete before starting another
> would be critical in such a configuration.

Avoid such configurations.

> Also, one might need to stop a pool in order to switch hardware
> connections around.

zpool export <pool>, or zpool export -f <pool> if necessary. Yes, you can do this while a resilver is running. It will start again (depending on the specific ZFS code, maybe at the point where it left off) when the zpool is next imported.

> I see the zpool(8) command has a "reopen" command, but I don't see
> a "close" counterpart, nor a description of when a "reopen" might
> be used.

I think you are looking for the zpool import and zpool export commands here.

> 3) If a raidz2 or raidz3 loses more than one component, does one
> simply replace and rebuild all of them at once? Or is it necessary
> to rebuild them serially? In some particular order?

I do not believe that you can replace more than one device at a time, but if you issue a zpool replace command for a second device while a resilver is running, I believe it will just re-start the resilver, writing data to *both* new devices at once. Note that since you can have multiple top-level vdevs, and each vdev can be a RAIDz, this is *not* as ludicrous as it might seem at first glance. The resilver really happens within a top-level vdev.

No need to replace failed devices in any particular order, unless your specific configuration depends on it. You might have two failing devices, one much worse than the other. I would replace the device with the more serious errors first, but you may have a reason to choose otherwise.
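To make the replace step concrete, it looks something like this (pool and device names here are made up, adjust to your own layout):

    # rebuild the data that was on da3 onto the new disk da7;
    # this kicks off a resilver
    zpool replace tank da3 da7

    # watch resilver progress and overall pool health
    zpool status -v tank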
> 4) At present, I'm running 9-STABLE i386. The box has 4 GB of
> memory, but the kernel ignores a bit over 1 GB of it.

I would NOT run ZFS on a 32-bit system.

> 5) When I upgrade to amd64, the usage would continue to be
> low-intensity as defined above. Will the 4 GB be enough?

ZFS uses a memory structure called the ARC (Adaptive Replacement Cache), and it is the key to getting any kind of performance out of ZFS. It is both a write cache and a read (and read-ahead) cache. If it is not large enough (compared to the amount of data you will be writing in any 30-second period) then you will be in serious trouble. My rule of thumb is not to use ZFS on systems (real or virtual) with less than 4 GB of RAM. I have been running 9.2 on systems with 8 GB of RAM with no issues, but when I was testing 10.0 with 3 GB of RAM I occasionally had memory-related hangs (I was testing with iozone before my additional RAM arrived).

> I will not be using the "deduplication" feature at all.

Deduplication in ZFS has a very small "sweet spot", and it is highly recommended that you run the dedup test before turning dedup on to see the real effect it would have (I am not near my systems right now or I would include the specific command). Also note that 1 GB of RAM per 1 TB of raw space under dedup is effectively mandatory for a functional system.

> 6) I have a much fancier computer sitting unused that I intend to
> put into service fairly soon after getting my current disk and data
> situation resolved. The drives that would be in use for raidz
> pools I would like to attach to that system when it is ready. It
> also has 4 GB of memory, but would start out as an amd64 system and
> might well have another 2 GB or 4 GB added at some point(s), though
> not immediately. What problems/pitfalls/precautions would I need
> to have in mind and be prepared for in order to move those drives
> from the current system to that newer one?

You should be able to physically move the drives from *any* system to *any* other that supports the ZFS version and features that you are using. ZFS was even designed to handle endian differences (SPARC to Intel, for example). I would caution you to EXPORT the zpool when removing the drives and IMPORT it fresh on the new system. Technically you *can* do a `zpool import -f`, but from years of reading horror stories on the ZFS list, I *always* export / import when moving drives (if I can).

--
Paul Kraus
paul@kraus-haus.org