From owner-freebsd-fs@FreeBSD.ORG Thu Jun 27 13:13:54 2013
Date: Thu, 27 Jun 2013 15:13:53 +0200
Subject: ZFS-backed NFS export with vSphere
From: Zoltan Arnold NAGY <zoltan.arnold.nagy@gmail.com>
To: freebsd-fs@freebsd.org

Hi list,

I'd love to have a ZFS-backed NFS export as my VM datastore, but as much as
I'd like to tune it, the performance doesn't even get close to Solaris 11's.

I currently have the system set up like this:

  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da0       ONLINE       0     0     0
            da1       ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da2       ONLINE       0     0     0
            da3       ONLINE       0     0     0
        logs
          ada0p4      ONLINE       0     0     0
        spares
          da4         AVAIL

ada0 is a Samsung 840 Pro SSD, which I'm using for the system + ZIL.
The daX devices are 1TB, 7200rpm Seagate disks. (From this test's
perspective it doesn't matter whether I use a separate ZIL device or just a
partition - I get roughly the same numbers.)

The first thing I noticed is that the FSINFO reply from FreeBSD advertises
values that cannot be tuned (I did not find them documented in the manpage
or exposed as a sysctl):

    rtmax, rtpref, wtmax, wtpref: 64k (FreeBSD), 1M (Solaris)
    dtpref:                       64k (FreeBSD), 8k (Solaris)

After manually patching the NFS code (changing NFS_MAXBSIZE to 1M instead
of MAXBSIZE) to advertise the same read/write values (I didn't touch
dtpref), my throughput went up from 17MB/s to 76MB/s.

Is there a reason why NFS_MAXBSIZE is not tunable and/or why it is set so
low?
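In case anyone wants to reproduce the 17MB/s -> 76MB/s jump: the change
itself is a one-liner plus a kernel rebuild. This is only a sketch rather
than the exact diff - treat the header name (sys/fs/nfs/nfsproto.h) as a
placeholder and grep your own tree for NFS_MAXBSIZE:

--- sys/fs/nfs/nfsproto.h.orig   (path is a guess; grep for NFS_MAXBSIZE)
+++ sys/fs/nfs/nfsproto.h
-#define NFS_MAXBSIZE    MAXBSIZE        /* MAXBSIZE is 64k, which is what FSINFO advertises */
+#define NFS_MAXBSIZE    (1024 * 1024)   /* advertise 1M rtmax/wtmax */

followed by the usual buildkernel/installkernel cycle and a reboot.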
Here's my iozone output (run on an ext4 partition created in a Linux VM
whose disk is backed by the NFS datastore exported from the FreeBSD box):

        Record Size 4096 KB
        File size set to 2097152 KB
        Command line used: iozone -b results.xls -r 4m -s 2g -t 6 -i 0 -i 1 -i 2
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 6 processes
        Each process writes a 2097152 Kbyte file in 4096 Kbyte records

        Children see throughput for 6 initial writers   =   76820.31 KB/sec
        Parent sees throughput for 6 initial writers    =   74899.44 KB/sec
        Min throughput per process                      =   12298.62 KB/sec
        Max throughput per process                      =   12972.72 KB/sec
        Avg throughput per process                      =   12803.38 KB/sec
        Min xfer                                        = 1990656.00 KB

        Children see throughput for 6 rewriters         =   76030.99 KB/sec
        Parent sees throughput for 6 rewriters          =   75062.91 KB/sec
        Min throughput per process                      =   12620.45 KB/sec
        Max throughput per process                      =   12762.80 KB/sec
        Avg throughput per process                      =   12671.83 KB/sec
        Min xfer                                        = 2076672.00 KB

        Children see throughput for 6 readers           =  114221.39 KB/sec
        Parent sees throughput for 6 readers            =  113942.71 KB/sec
        Min throughput per process                      =   18920.14 KB/sec
        Max throughput per process                      =   19183.80 KB/sec
        Avg throughput per process                      =   19036.90 KB/sec
        Min xfer                                        = 2068480.00 KB

        Children see throughput for 6 re-readers        =  117018.50 KB/sec
        Parent sees throughput for 6 re-readers         =  116917.01 KB/sec
        Min throughput per process                      =   19436.28 KB/sec
        Max throughput per process                      =   19590.40 KB/sec
        Avg throughput per process                      =   19503.08 KB/sec
        Min xfer                                        = 2080768.00 KB

        Children see throughput for 6 random readers    =  110072.68 KB/sec
        Parent sees throughput for 6 random readers     =  109698.99 KB/sec
        Min throughput per process                      =   18260.33 KB/sec
        Max throughput per process                      =   18442.55 KB/sec
        Avg throughput per process                      =   18345.45 KB/sec
        Min xfer                                        = 2076672.00 KB

        Children see throughput for 6 random writers    =   76389.71 KB/sec
        Parent sees throughput for 6 random writers     =   74816.45 KB/sec
        Min throughput per process                      =   12592.09 KB/sec
        Max throughput per process                      =   12843.75 KB/sec
        Avg throughput per process                      =   12731.62 KB/sec
        Min xfer                                        = 2056192.00 KB

The other interesting thing is that the system does not cache the data file
in RAM (the box has 32G), so even for re-reads I get miserable numbers. With
Solaris, the re-reads happen at nearly wire speed.

Any ideas what else I could tune? While 76MB/s is much better than the
original 17MB/s I was seeing, it's still far from Solaris's ~220MB/s...

Thanks a lot,
Zoltan
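P.S. In case it helps with suggestions, these are the server-side knobs I
intend to double-check next - inspection commands only, no output pasted,
and the dataset name tank/vm below is just a placeholder for whatever you
export:

# how large the ARC may grow and how much it currently holds
sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size
# dataset properties that matter for an NFS datastore (sync writes, caching)
zfs get sync,recordsize,primarycache,secondarycache,compression tank/vm
# nfsd configuration set at boot
grep nfs_server /etc/rc.conf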