Subject: Re: Restructure a ZFS Pool
From: Paul Kraus <paul@kraus-haus.org>
Date: Thu, 24 Sep 2015 10:20:04 -0400
To: Raimund Sacherer, FreeBSD Questions

On Sep 24, 2015, at 9:48, Raimund Sacherer wrote:

> Yes, I understood that it will only help prevent fragmentation in the
> future. I also read that performance is great when using async ZFS.

That is an overly general statement. I have seen zpools perform badly
with async as well as sync writes when not configured to match the
workload.

> Would it be safe to use async ZFS if I have a battery-backed hardware
> RAID controller (1024G RAM cache)? The server is an HP G8 and I have
> configured all discs as single-disk mirrors (the only way to get a
> JBOD on this RAID controller). In the event of a power outage,
> everything should be held in the RAID controller by the battery and
> written out to disk as soon as power is restored.

Turning off sync behavior violates POSIX compliance and is not a very
good idea. Also remember that async writes are cached in the ARC… so
you need power for the entire server, not just the disk caches, until
all activity has ceased _and_ all pending Transaction Groups (TXGs)
have been committed to non-volatile storage. TXGs are generally
committed every 5 seconds, but if you are under heavy write load it may
take more time than that.
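For reference, the setting being discussed is the per-dataset sync
property, and on FreeBSD the TXG commit interval is exposed as a
sysctl. A minimal sketch, assuming a hypothetical pool/dataset named
tank/backup:

    # Show how the dataset currently handles synchronous writes
    # (standard = honor what the application asks for, always, disabled)
    zfs get sync tank/backup

    # Keep the POSIX-compliant default rather than disabling sync
    zfs set sync=standard tank/backup

    # TXG commit interval in seconds (defaults to 5)
    sysctl vfs.zfs.txg.timeout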
> ... would that be a safe environment to switch ZFS to async?

No one can make that call but you. You know your environment, you know
your workload, and you know the fallout from lost writes _if_ something
goes wrong.

> If I use async, is there still the *need* for a SLOG device? I read
> that running ZFS async and using a SLOG are comparable, because both
> let the writes be ordered and that prevents fragmentation. It is not a
> critical system (e.g. downtime during the day is possible), but if
> restores need to be done I'd rather have it run as fast as possible.

If you disable sync writes (please do NOT say "use async", as that is
determined by the application code), then you are disabling the ZIL
(ZFS Intent Log), and the SLOG is a device that holds _just_ the ZIL,
separate from the data vdevs in the zpool. So, yes, disabling sync
writes means that even if there is a SLOG it will never be used.

>> Yes, but unless you can stand losing data in flight (writes that the
>> system says have been committed but have only made it to the SLOG),
>> you really want your SLOG vdev to be a mirror (at least 2 drives).

> Shouldn't this scenario be handled by ZFS (writes go to the SLOG,
> power out, power on, the SLOG is transferred to the data disks)?

Not if the single SLOG device _fails_… In the case of a power failure,
once the system comes back up ZFS will replay the TXGs on the SLOG and
you will not have lost any writes.

> I thought the only data loss would be writes which are currently in
> transit TO the SLOG at the time of the power outage?

Once again, if the application requests sync writes, the application is
not told that the write is complete _until_ it is committed to
non-volatile backing storage, in this case the ZIL/SLOG device(s). So
from the application's perspective, no writes are lost because they
were not committed when power failed. This is one of the use cases
where claiming that disabling sync behavior and relying on a UPS /
battery-backed cache is just as good as a SLOG device is misleading.
The application is asking for a sync write and it is being lied to.

> And I read somewhere that with ZFS since V28 (IIRC) if the SLOG dies
> it turns off the log and you lose the (performance) benefit of the
> SLOG, but the pool should still be operational?

There are separate versions for zpool and zfs; you are referring to
zpool version 28. Log device removal was added in zpool version 19.
`zpool upgrade -v` will tell you which versions / features your system
supports, and `zfs upgrade -v` will tell you the same thing for zfs
versions. FreeBSD 10.1 has zfs version 5 and zpool version 28 plus lots
of added features. Feature flags were a way to add features to zpools
without completely breaking compatibility.

So you can remove a failed SLOG device, and if the SLOG is mirrored you
still don't lose any data. I'm not sure what happens to a running zpool
if a single (non-mirrored) SLOG device fails.
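A rough sketch of the commands involved, assuming a hypothetical pool
named tank and devices da1/da2 (the vdev name used for removal is
whatever `zpool status` reports, e.g. mirror-1):

    # Show which zpool / zfs versions and feature flags this system supports
    zpool upgrade -v
    zfs upgrade -v

    # Add a mirrored log (SLOG) vdev to an existing pool
    zpool add tank log mirror da1 da2

    # Log vdevs can be removed again (zpool version 19 and later)
    zpool status tank
    zpool remove tank mirror-1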
>> In a zpool of this size, especially a RAIDz zpool, you really want a
>> hot spare and a notification mechanism so you can replace a failed
>> drive ASAP. The resilver time (to replace a failed drive) will be
>> limited by the performance of a _single_ drive for _random_ I/O. See
>> this post http://pk1048.com/zfs-resilver-observations/ for one of my
>> resilver operations and the performance of such.

> Thank you for this info, I'll keep it in mind and bookmark your link.

Benchmark your own zpool, if you can. Do a zpool replace on a device
and see how long it takes. That is a reasonable first approximation of
how long it will take to replace a device that has actually failed. I
tend to stick with drives no bigger than 1 TB to keep resilver times
reasonable (for me), and I add more mirror vdevs as I need capacity.

--
Paul Kraus
paul@kraus-haus.org
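A minimal sketch of the hot-spare and replace workflow described above,
again with hypothetical pool and device names:

    # Keep a hot spare in the pool so a resilver can start right away
    zpool add tank spare da6

    # Replace a device and time how long the resilver takes
    zpool replace tank da3 da4
    zpool status tank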