From: Garrett Wollman
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Subject: Some 9.3 NFS testing
Date: Wed, 22 Oct 2014 22:50:20 -0400
Message-ID: <21576.27884.76574.977691@hergotha.csail.mit.edu>

Just thought I'd share this...

I've been doing some acceptance testing on 9.3 prior to upgrading my production NFS servers.  My most recent test runs bonnie++ on 192 Ubuntu VMs in parallel, each writing to an independent directory in the same server filesystem.  It hasn't fallen over yet (it will probably take another day or so to complete), and it peaked at about 220k ops/s (but this was NFSv4, so there's no FHA and it takes at least two ops for every v3 RPC [1]).  bonnie++ is running with -D (O_DIRECT), but I'm really just using it as a load generator -- I don't care about its output.

I have this system configured for a maximum of 64 nfsd threads, and the test load has kept them pegged for the past eight hours.  Right now all of the load generators are in the "small file" part of bonnie++, so there isn't much data moving, but there are a lot of synchronous operations; it's been doing 60k ops/s for the past five hours.  The load average maxed out at about 24 early in the test and has settled around 16-20 for this part.
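(In case anyone wants to throw a similar load at a test box, the setup looks roughly like the sketch below.  The export path, mount point, and the bonnie++ arguments other than -D are illustrative, not copied from my configs:)

    # server side, /etc/rc.conf -- thread cap as described above
    nfs_server_enable="YES"
    nfs_server_flags="-u -t -n 64"
    nfsv4_server_enable="YES"
    mountd_enable="YES"

    # on each Ubuntu VM (hypothetical export and mount point names)
    mount -t nfs -o vers=4 server:/export/scratch /mnt/scratch
    mkdir -p /mnt/scratch/$(hostname)
    bonnie++ -D -d /mnt/scratch/$(hostname) -u nobody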
Here's what nfsstat -se has to say (note: not reset for this round of testing):

Server Info:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
1566655064 230074779 162549702         0 471311053 1466525587 149235773 115496945
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
       125         0         0       245       116   2032193  27485368 223929240
     Mknod    Fsstat    Fsinfo  PathConf    Commit   LookupP   SetClId SetClIdCf
         0        53       268       131  15999631         0       386       386
      Open  OpenAttr OpenDwnGr  OpenCfrm DelePurge   DeleRet     GetFH      Lock
  80924092         0         0       194         0         0  81110394         0
     LockT     LockU     Close    Verify   NVerify     PutFH  PutPubFH PutRootFH
         0         0  80578106         0         0 1203868156         0       193
     Renew RestoreFH    SaveFH   Secinfo RelLckOwn  V4Create
      1271         0        14       384         0       570
Server:
 Retfailed    Faults   Clients
         0         0       191
 OpenOwner     Opens LockOwner     Locks    Delegs
       192       154         0         0         0
Server Cache Stats:
    Inprog      Idem  Non-idem    Misses CacheSize   TCPPeak
         0         0         0 -167156883      1651    115531

I'd love to mix in some FreeBSD-generated loads, but as discussed a week or so ago, our NFS client can't handle reading directories from which files are being deleted.

FWIW, I just ran a quick "pmcstat -T" and noted the following:

PMC: [unhalted-core-cycles] Samples: 775371 (100.0%) , 3264 unresolved
Key: q => exiting...
%SAMP IMAGE      FUNCTION             CALLERS
 24.0 kernel     _mtx_lock_sleep      _vm_map_lock:22.4 ...
  4.7 kernel     Xinvlrng
  4.7 kernel     _mtx_lock_spin       pmclog_reserve
  4.2 kernel     _sx_xlock_hard       _sx_xlock
  3.8 pmcstat    _init
  2.5 kernel     bcopy                vdev_queue_io_done
  1.7 kernel     _sx_xlock
  1.6 zfs.ko     lzjb_compress        zio_compress_data
  1.4 zfs.ko     lzjb_decompress      zio_decompress
  1.2 kernel     _sx_xunlock
  1.2 kernel     ipfw_chk             ipfw_check_hook
  1.1 libc.so.7  bsearch
  1.0 zfs.ko     fletcher_4_native    zio_checksum_compute
  1.0 kernel     vm_page_splay        vm_page_find_least
  1.0 kernel     cpu_idle_mwait       sched_idletd
  1.0 kernel     free
  0.9 kernel     bzero
  0.9 kernel     cpu_search_lowest    cpu_search_lowest
  0.8 kernel     vm_map_entry_splay   vm_map_lookup_entry
  0.8 kernel     cpu_search_highest   cpu_search_highest

I doubt that this is news to anybody.

Once I get the production servers upgraded to 9.3, I'll be ready to start testing 10.1 on this same setup.

-GAWollman

[1] I did previous testing, with smaller numbers of clients, using v3, as that is what we currently require our clients to use.  I switched to v4 to try out the worst case -- after finding an OpenStack bug that was preventing me from starting more than 16 load generators at a time.
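P.S.  If anyone wants to grab a comparable profile to stare at later, the stock hwpmc workflow is roughly the following; the sampling window and file names are arbitrary, nothing here is specific to this test:

    # sample system-wide on the cycle counter for a minute,
    # then render a callgraph from the saved samples
    kldload hwpmc
    pmcstat -S unhalted-core-cycles -O /tmp/nfs.pmc sleep 60
    pmcstat -R /tmp/nfs.pmc -G /tmp/nfs-callgraph.txt

"pmcstat -T" (what I used above) shows the same data interactively; the -G output just keeps the caller chains around in a file.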