Date: Fri, 13 Oct 2017 17:55:39 +0100
From: "Frank Leonhardt (m)" <frank2@fjl.co.uk>
To: Kate Dawson <k4t@3msg.es>, "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Re: FreeBSD ZFS file server with SSD HDD
Message-ID: <E7BA6E8B-262E-42CE-9BB3-65847AA89498@fjl.co.uk>
In-Reply-To: <20171013134316.GG24374@apple.rat.burntout.org>
References: <20171011130512.GE24374@apple.rat.burntout.org> <DB4BCEA2-406C-4CCB-AC7F-60C0DCFD6212@fjl.co.uk> <20171013134316.GG24374@apple.rat.burntout.org>
On 13 October 2017 14:43:16 BST, Kate Dawson <k4t@3msg.es> wrote:
>On Fri, Oct 13, 2017 at 01:04:50PM +0100, Frank Leonhardt (m) wrote:
>>
>> This all matters A LOT if you're using ZFS to back a virtual HD for a
>> VM. Things like vSphere make every NFS write synchronous. Given that the
>> guest OS is probably using a file format that makes this pointless, it
>> adds insult to injury. ZFS writing every block to a CoW file will
>> fragment it all to hell and back.
>>
>> So, throwing hardware at it isn't going to solve the underlying
>> problem. You need to sort out the sync writes. If your file store is on
>> a UPS, ignore them (I comment out the code). And store your virtual HD
>> on UFS if possible.
>
>Thanks,
>
>sync is disabled on the dataset, I think it's all async. The VMs all
>have journalled file systems, and the system is UPS-backed.
>
>NFS is mounted async - I think that is the default for Debian Linux.
>
>My understanding is that ZFS will always be consistent; however, in the
>case of a catastrophic shutdown some data may not be written to disk, and
>we will just have to take our chances.
>
>A key reason for ZFS being chosen was snapshots. Otherwise I think
>GNU/Linux would have been chosen over FreeBSD and ZFS.
>
>Thanks for the detailed info on ZIL/SLOG.
>
>Regards,
>
>Kate Dawson
>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>--
>"The introduction of a coordinate system to geometry is an act of
>violence"

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
From owner-freebsd-questions@freebsd.org Fri Oct 13 17:22:44 2017
Date: Fri, 13 Oct 2017 18:18:21 +0100
From: "Frank Leonhardt (m)" <frank2@fjl.co.uk>
To: Kate Dawson <k4t@3msg.es>, "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Re: FreeBSD ZFS file server with SSD HDD
Message-ID: <267A5351-9CA8-4C43-8A3E-5B1CF6E79030@fjl.co.uk>
In-Reply-To: <20171013134316.GG24374@apple.rat.burntout.org>
References: <20171011130512.GE24374@apple.rat.burntout.org> <DB4BCEA2-406C-4CCB-AC7F-60C0DCFD6212@fjl.co.uk> <20171013134316.GG24374@apple.rat.burntout.org>
On 13 October 2017 14:43:16 BST, Kate Dawson <k4t@3msg.es> wrote:
> [quoted text trimmed]

Hi Kate,

My reply was supposed to go to questions, but Android mail trouble!

Yes, your understanding of ZFS is correct - and also misleading. The file structure will be consistent and have no write holes, but this isn't true at the application level. Suppose you have two files, and the application wants to write a corresponding entry in both. If the power fails after the first write, one file will be fully updated and the second will be fully NOT updated. It's a matter of good application design as to what happens when the power is restored. At least ZFS guarantees its structure is correct. No amount of OS cleverness or fancy hardware is going to save you from a sloppy application.

You said you were running VMs using Xen. I use Xen out of preference, but I must admit I've never checked to see whether all its writes are sync. When I want performance, I don't use a VM in the first place. But let's assume it does. If you have Windows running, it won't like any interruption to its disk activity. The backing file may be consistent, but its contents won't be. That's a hit you take from using Windows however you run it.

You mentioned you had disabled sync on the dataset. I bet you hardly noticed a difference, right? It has three modes: sync nothing, sync everything, and sync when asked. It's not made very clear, but the default is to sync only when asked, and I'd be inclined to keep it that way.

Your problem is more likely NFS. Your client asks for a sync on every write because it's lazy. This request goes all the way over the network to ZFS, which probably deals with it fairly quickly. However, the client has to wait for the confirmation to travel all the way back to its OS before it returns from the system call, only to get another block to write and start the journey again. That's POSIX for you. If application programmers only did a sync write when necessary it'd be fine - but they don't.

There is a tuneable to tell NFS to lie, if you don't care about POSIX compliance. Unfortunately I'm on a train and can't remember the detail, but it's somewhere on my blog (I think). You'll find it improves matters by around 300%.

Regards, Frank.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
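[Editor's note: the two-file scenario above can be sketched in a few lines of Python. The `append_record` helper and the ledger file names are hypothetical, not anything from the thread; the point is that even with every write fsync'd in order, a crash between the two calls leaves the application's own data inconsistent while ZFS's on-disk structure remains intact.]

```python
import os

def append_record(path, record):
    """Append one record and force it to stable storage before returning."""
    with open(path, "a") as f:
        f.write(record + "\n")
        f.flush()               # push from the stdio buffer to the kernel
        os.fsync(f.fileno())    # ask the kernel to push it to the disk

# The application wants a corresponding entry in both files.
append_record("ledger_a.txt", "txn-42")
# A power failure here leaves ledger_a.txt updated and ledger_b.txt not.
# ZFS guarantees its own structure survives; replaying or rolling back
# txn-42 on restart is entirely the application's job.
append_record("ledger_b.txt", "txn-42")
```

Note that with sync=disabled on the dataset, even the os.fsync() calls here are effectively ignored on the server side - which is exactly the trade being described.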
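[Editor's note: the three modes described above correspond to the ZFS `sync` dataset property. The dataset name below is a placeholder; these are the stock zfs(8) commands.]

```shell
# Check the current setting (the default, "standard", syncs only when asked)
zfs get sync tank/vmstore

# The three modes:
zfs set sync=standard tank/vmstore   # honour fsync/O_SYNC requests only
zfs set sync=always   tank/vmstore   # treat every write as synchronous
zfs set sync=disabled tank/vmstore   # ignore sync requests entirely
```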
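[Editor's note: the NFS cost described above is per-write latency, not bandwidth. A rough local illustration in Python - fsync after every write versus one fsync at the end; a real network round trip per write would make the gap far wider. File name and block count are arbitrary.]

```python
import os
import time

def write_blocks(path, n, sync_every_write):
    """Write n 4 KiB blocks, optionally waiting for stable storage each time."""
    with open(path, "w") as f:
        for _ in range(n):
            f.write("x" * 4096)          # one 4 KiB "NFS write"
            if sync_every_write:
                f.flush()
                os.fsync(f.fileno())     # wait for the disk on every block
        f.flush()
        os.fsync(f.fileno())             # a single sync at the end

for mode in (True, False):
    start = time.time()
    write_blocks("blocks.dat", 200, sync_every_write=mode)
    print(f"sync per write={mode}: {time.time() - start:.3f}s")
```

The same data reaches the disk either way; the difference is how many times the writer stalls waiting for confirmation - which is what the NFS client is doing on every system call.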