From owner-freebsd-fs@FreeBSD.ORG Thu Jun 27 22:16:49 2013
From: Zoltan Arnold NAGY <zoltan.arnold.nagy@gmail.com>
To: Rick Macklem
Cc: freebsd-fs@freebsd.org
Date: Fri, 28 Jun 2013 00:16:48 +0200
Subject: Re: ZFS-backed NFS export with vSphere

Right. As I said, increasing it to 1M took my throughput from 17MB/s to
76MB/s. However, the SSD can handle far more random writes than that; any
idea why I don't see the ZIL go above this value? (vSphere always uses
sync writes.)
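One quick way to see whether the SLOG device itself is the ceiling is to
take it out of the sync-write path for a moment (a rough diagnostic
sketch only - sync=disabled discards exactly the guarantees vSphere is
asking for, so it should never stay on for a real datastore):

    # how sync writes are currently handled for the pool
    zfs get sync tank

    # diagnostic only: acknowledge sync writes without waiting on the
    # SLOG; if throughput from the VM now goes well past 76MB/s, the
    # SLOG write path is the bottleneck
    zfs set sync=disabled tank

    # ...rerun the test from the VM, then restore the default...
    zfs set sync=standard tank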
Thanks,
Zoltan

On Thu, Jun 27, 2013 at 11:58 PM, Rick Macklem wrote:

> Zoltan Nagy wrote:
> > Hi list,
> >
> > I'd love to have a ZFS-backed NFS export as my VM datastore, but no
> > matter how much I tune it, the performance doesn't get anywhere near
> > Solaris 11's.
> >
> > I currently have the system set up like this:
> >
> >   pool: tank
> >  state: ONLINE
> >   scan: none requested
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         tank        ONLINE       0     0     0
> >           mirror-0  ONLINE       0     0     0
> >             da0     ONLINE       0     0     0
> >             da1     ONLINE       0     0     0
> >           mirror-1  ONLINE       0     0     0
> >             da2     ONLINE       0     0     0
> >             da3     ONLINE       0     0     0
> >         logs
> >           ada0p4    ONLINE       0     0     0
> >         spares
> >           da4       AVAIL
> >
> > ada0 is a Samsung 840 Pro SSD, which I'm using for system+ZIL.
> > The daX drives are 1TB, 7200rpm Seagate disks.
> > (From this test's perspective it makes no difference whether I use a
> > separate ZIL device or just a partition - I get roughly the same
> > numbers.)
> >
> > The first thing I noticed is that the FSINFO reply from FreeBSD is
> > advertising untunable values (I did not find them documented either
> > in the manpage or as a sysctl):
> >
> >   rtmax, rtpref, wtmax, wtpref: 64k (fbsd), 1M (solaris)
> >   dtpref: 64k (fbsd), 8k (solaris)
> >
> > After manually patching the NFS code (changing NFS_MAXBSIZE to 1M
> > instead of MAXBSIZE) to advertise the same read/write values (I
> > didn't touch dtpref), my performance went up from 17MB/s to 76MB/s.
> >
> > Is there a reason NFS_MAXBSIZE is not tunable, and why is it so low?
> >
> For exporting other file system types (UFS, ...) the buffer cache is
> used, and MAXBSIZE is the largest block you can use for the buffer
> cache. Some increase of MAXBSIZE would be nice. (I've tried 128KB
> without observing difficulties, and from what I've been told 128KB is
> the ZFS block size.)
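For anyone who wants to reproduce the numbers above, the patch was
essentially the one-liner below (a sketch: on my tree the define lives
in sys/fs/nfs/nfsport.h, but check your own sources - and given Rick's
comment about the buffer cache, treat it as a benchmarking hack, not a
fix):

    --- sys/fs/nfs/nfsport.h.orig
    +++ sys/fs/nfs/nfsport.h
    @@
    -#define	NFS_MAXBSIZE	MAXBSIZE
    +/*
    + * Benchmark hack: advertise 1MB rtmax/rtpref/wtmax/wtpref in the
    + * FSINFO reply instead of MAXBSIZE (64KB).
    + */
    +#define	NFS_MAXBSIZE	(1024 * 1024)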
> > Here's my iozone output (run on an ext4 partition created in a Linux
> > VM whose disk is backed by the NFS export from the FreeBSD box):
> >
> >   Record Size 4096 KB
> >   File size set to 2097152 KB
> >   Command line used: iozone -b results.xls -r 4m -s 2g -t 6 -i 0 -i 1 -i 2
> >   Output is in Kbytes/sec
> >   Time Resolution = 0.000001 seconds.
> >   Processor cache size set to 1024 Kbytes.
> >   Processor cache line size set to 32 bytes.
> >   File stride size set to 17 * record size.
> >   Throughput test with 6 processes
> >   Each process writes a 2097152 Kbyte file in 4096 Kbyte records
> >
> >   Children see throughput for 6 initial writers =  76820.31 KB/sec
> >   Parent sees throughput for 6 initial writers  =  74899.44 KB/sec
> >   Min throughput per process                    =  12298.62 KB/sec
> >   Max throughput per process                    =  12972.72 KB/sec
> >   Avg throughput per process                    =  12803.38 KB/sec
> >   Min xfer                                      = 1990656.00 KB
> >
> >   Children see throughput for 6 rewriters       =  76030.99 KB/sec
> >   Parent sees throughput for 6 rewriters        =  75062.91 KB/sec
> >   Min throughput per process                    =  12620.45 KB/sec
> >   Max throughput per process                    =  12762.80 KB/sec
> >   Avg throughput per process                    =  12671.83 KB/sec
> >   Min xfer                                      = 2076672.00 KB
> >
> >   Children see throughput for 6 readers         = 114221.39 KB/sec
> >   Parent sees throughput for 6 readers          = 113942.71 KB/sec
> >   Min throughput per process                    =  18920.14 KB/sec
> >   Max throughput per process                    =  19183.80 KB/sec
> >   Avg throughput per process                    =  19036.90 KB/sec
> >   Min xfer                                      = 2068480.00 KB
> >
> >   Children see throughput for 6 re-readers      = 117018.50 KB/sec
> >   Parent sees throughput for 6 re-readers       = 116917.01 KB/sec
> >   Min throughput per process                    =  19436.28 KB/sec
> >   Max throughput per process                    =  19590.40 KB/sec
> >   Avg throughput per process                    =  19503.08 KB/sec
> >   Min xfer                                      = 2080768.00 KB
> >
> >   Children see throughput for 6 random readers  = 110072.68 KB/sec
> >   Parent sees throughput for 6 random readers   = 109698.99 KB/sec
> >   Min throughput per process                    =  18260.33 KB/sec
> >   Max throughput per process                    =  18442.55 KB/sec
> >   Avg throughput per process                    =  18345.45 KB/sec
> >   Min xfer                                      = 2076672.00 KB
> >
> >   Children see throughput for 6 random writers  =  76389.71 KB/sec
> >   Parent sees throughput for 6 random writers   =  74816.45 KB/sec
> >   Min throughput per process                    =  12592.09 KB/sec
> >   Max throughput per process                    =  12843.75 KB/sec
> >   Avg throughput per process                    =  12731.62 KB/sec
> >   Min xfer                                      = 2056192.00 KB
> >
> > The other interesting thing is that the system doesn't cache the
> > data file in RAM (the box has 32G), so even for re-reads I get
> > miserable numbers. With Solaris, the re-reads happen at nearly wire
> > speed.
> >
> > Any ideas what else I could tune? While 76MB/s is much better than
> > the original 17MB/s I was seeing, it's still far from Solaris's
> > ~220MB/s...
> >
> > Thanks a lot,
> > Zoltan
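PS: to quantify the re-read caching difference, this is roughly what I
watch on the FreeBSD box while the test runs (plain sysctl counters;
names as on my 9.x system, so treat it as a quick check rather than real
analysis):

    # total ARC size vs. its ceiling - if size never grows toward
    # arc_max while the 2GB files are re-read, the server simply
    # isn't caching the data
    sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max

    # hit/miss counters - sample before and after a re-read pass; a
    # low hit rate on re-reads matches the behaviour described above
    sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses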