From owner-freebsd-fs@FreeBSD.ORG Tue Jun 11 23:21:30 2013
Date: Tue, 11 Jun 2013 17:21:24 -0600
From: "Kenneth D. Merry" <ken@kdm.org>
To: Rick Macklem
Cc: freebsd-fs@freebsd.org
Subject: Re: An order of magnitude higher IOPS needed with ZFS than UFS
Message-ID: <20130611232124.GA42577@nargothrond.kdm.org>
References: <51B79023.5020109@fsn.hu> <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <253074981.119060.1370985609747.JavaMail.root@erie.cs.uoguelph.ca>

On Tue, Jun 11, 2013 at 17:20:09 -0400, Rick Macklem wrote:
> Attila Nagy wrote:
> > Hi,
> >
> > I have two identical machines.  They have 14 disks hooked up to an
> > HP Smart Array (SA from now on) controller.  Both machines have the
> > same SA configuration and layout: the disks are organized into
> > mirror pairs (HW RAID1).
> >
> > On the first machine, these mirrors are formatted with UFS2+SU
> > (default settings); on the second machine they are used as separate
> > zpools (please don't tell me that ZFS can do the same, I know).
> > Atime is turned off; otherwise there are no other modifications
> > (zpool/zfs or sysctl parameters).  The file systems are loaded more
> > or less evenly, serving files from a few kB to a few MB.
> >
> > The machines act as NFS servers, so there is one, maybe important,
> > difference here: the UFS machine runs 8.3-RELEASE, while the ZFS one
> > runs 9.1-STABLE@r248885.  They get the same type of load, and
> > according to nfsstat and netstat, the load doesn't explain the big
> > difference seen in disk I/Os.  In fact, the UFS host seems to be
> > more loaded...
> >
> > According to gstat on the UFS machine:
> > dT: 60.001s  w: 60.000s  filter: da
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0     42     35    404    6.4      8    150  214.2   21.5| da0
> >     0     30     21    215    6.1      9    168  225.2   15.9| da1
> >     0     41     33    474    4.5      8    158  211.3   18.0| da2
> >     0     39     30    425    4.6      9    163  235.0   17.1| da3
> >     1     31     24    266    5.1      7     93  174.1   14.9| da4
> >     0     29     22    273    5.9      7     84  200.7   15.9| da5
> >     0     37     30    692    7.1      7    115  206.6   19.4| da6
> >
> > and on the ZFS one:
> > dT: 60.001s  w: 60.000s  filter: da
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0    228    201   1045   23.7     27    344   53.5   88.7| da0
> >     5    185    167    855   21.1     19    238   44.9   73.8| da1
> >    10    263    236   1298   34.9     27    454   53.3   99.9| da2
> >    10    255    235   1341   28.3     20    239   64.8   92.9| da3
> >    10    219    195    994   22.3     23    257   46.3   81.3| da4
> >    10    248    221   1213   22.4     27    264   55.8   90.2| da5
> >     9    231    213   1169   25.1     19    229   54.6   88.6| da6
> >
> > I've seen a lot of cases where ZFS required more memory and CPU (and
> > even I/O) to handle the same load, but they were nowhere near this
> > bad (often a 10x increase).
> >
> > Any ideas?
> >
> ken@ recently committed a change to the new NFS server to add file
> handle affinity support to it.  He reported that, without file handle
> affinity, ZFS's sequential read heuristic broke badly (or something
> like that; you can probably find the email thread, or maybe he will
> chime in).

That is correct.  The problem, when the I/O is sequential, is that
simultaneous requests for adjacent blocks in a file get farmed out to
different threads in the NFS server.  These can easily go down into ZFS
out of order and make the ZFS prefetch code think the file is not being
read sequentially.  It blows away the zfetch stream, and you wind up
with a lot of I/O bandwidth getting used (a lot of prefetching done and
then re-done), but not much performance.

The FHA (file handle affinity) code puts adjacent requests for a single
file into the same thread, so ZFS sees the requests in the right order.

Another change I made was to allow parallel writes to a file if the
underlying filesystem allows it.  (ZFS is currently the only filesystem
that does.)  That can help random writes.

Linux clients are more likely than FreeBSD and MacOS clients to queue a
lot of reads to the server.

> Anyhow, you could try switching the FreeBSD 9 system to use the old
> NFS server (assuming your clients are doing NFSv3 mounts) and see if
> that has a significant effect.  (For FreeBSD 9, the old server has
> file handle affinity, but the new server does not.)

If using the old NFS server helps, then the FHA code for the new server
will help as well, perhaps more, because the default FHA tuning
parameters have changed somewhat and parallel writes are now possible.

If you want to try out the FHA changes in stable/9, I just MFCed them,
change 251641.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
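
To make the file-handle-affinity scheme described in the message above
concrete, here is a minimal user-space C sketch of the idea: requests
are binned by file handle and offset range, so adjacent reads of the
same file land on the same service thread and reach the filesystem in
order.  The hash, bin size, and thread count are illustrative
assumptions; this is not the actual FreeBSD NFS server code.

    /*
     * Sketch: bin requests by (file handle hash, offset bin) so that
     * adjacent reads of one file go to the same service thread.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_THREADS 8
    #define BIN_SHIFT   22              /* group offsets into 4 MB bins */

    struct request {
        uint64_t fh_hash;               /* hash of the NFS file handle */
        uint64_t offset;                /* starting offset of the I/O */
    };

    /* Same file + nearby offsets -> same thread. */
    static unsigned int
    fha_pick_thread(const struct request *req)
    {
        uint64_t bin = req->offset >> BIN_SHIFT;

        return ((req->fh_hash ^ bin) % NUM_THREADS);
    }

    int
    main(void)
    {
        /* Two adjacent 64 KB reads of the same file... */
        struct request a = { .fh_hash = 0x1234, .offset = 0x00000 };
        struct request b = { .fh_hash = 0x1234, .offset = 0x10000 };
        /* ...and a read of a different file. */
        struct request c = { .fh_hash = 0xbeef, .offset = 0x00000 };

        printf("a -> thread %u\n", fha_pick_thread(&a));
        printf("b -> thread %u\n", fha_pick_thread(&b)); /* same as a */
        printf("c -> thread %u\n", fha_pick_thread(&c));
        return (0);
    }

The point of the sketch is only the ordering property: because a and b
hit the same thread, they reach ZFS in order and the zfetch stream is
not blown away, while requests for other files can still spread across
the remaining threads.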
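
To inspect the FHA tuning parameters mentioned above from userland, one
option is sysctlbyname(3), as in the small C sketch below.  The
vfs.nfsd.fha.* names are an assumption about the new NFS server's
sysctl tree after the MFC (the old server's knobs, if present, live
under a different prefix), so verify the actual names on a given system
before relying on them.

    /*
     * Sketch: read a few assumed FHA sysctl knobs with sysctlbyname(3).
     * Check the real names first, e.g. with "sysctl -a | grep fha".
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        const char *knobs[] = {
            "vfs.nfsd.fha.enable",          /* assumed: FHA on/off */
            "vfs.nfsd.fha.bin_shift",       /* assumed: offset bin size */
            "vfs.nfsd.fha.max_nfsds_per_fh" /* assumed: threads per file */
        };
        unsigned int i;
        int value;
        size_t len;

        for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
            len = sizeof(value);
            if (sysctlbyname(knobs[i], &value, &len, NULL, 0) == -1) {
                printf("%s: not present on this system\n", knobs[i]);
                continue;
            }
            printf("%s = %d\n", knobs[i], value);
        }
        return (0);
    }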