From owner-freebsd-fs@FreeBSD.ORG Tue Aug 27 23:05:24 2013
Date: Tue, 27 Aug 2013 19:05:23 -0400
From: Outback Dingo <outbackdingo@gmail.com>
To: Rick Macklem
Cc: freebsd-fs@freebsd.org
Subject: Re: NFS on ZFS pure SSD pool
In-Reply-To: <1115794974.14452056.1377644539106.JavaMail.root@uoguelph.ca>
List-Id: Filesystems
On Tue, Aug 27, 2013 at 7:02 PM, Rick Macklem wrote:
> Outback Dingo wrote:
> > On Tue, Aug 27, 2013 at 3:29 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > wrote:
> > > Eric Browning wrote:
> > > Hello, first time posting to this list. I have a new server that is
> > > not living up to the promise of SSD speeds, and NFS is maxing out
> > > the CPU. I'm new to FreeBSD, but I've been reading up on it as much
> > > as I can. I have obscured my IP addresses and hostname with x's, so
> > > just ignore that. The server has about 200 users on it, each
> > > drawing under 50Mb/s peak, sustained around 1-2Mb/s.
> > >
> > > I've followed some network tuning guides for our I350-T4 NIC and
> > > that has helped with network performance somewhat, but the server
> > > is still experiencing heavy load, pegging the CPU at 1250% on
> > > average with only 50Mb/s of traffic in/out of the machine. All of
> > > the network tuning came from
> > > https://calomel.org/freebsd_network_tuning.html since it was
> > > relevant to the same NIC that I have.
> > >
> > > Server specs:
> > > FreeBSD 9.1
> > > 16 cores, AMD x64
> > > 64GB of RAM
> > > ZFS v28 with four Intel DC S3700 drives (800GB) as a ZFS stripe
> > > Intel DC S3500 for ZIL; enabling/disabling it has made no
> > > difference
> > > Used a spare DC S3700 for the ZIL and that made no difference
> > > either.
> > > NFS v3 & v4 for Mac home folders whose Cache folder is redirected.
> > >
> > > I've tried:
> > > Compression on/off <-- no appreciable difference
> > > Deduplication on/off <-- no appreciable difference
> > > sync=disabled and sync=standard <-- no appreciable difference
> > > Setting the ARC cache to 56GB and also to 32GB <-- no difference
> > > in performance in terms of kern.
> > >
> > > I've tried to follow the FreeBSD tuning guide
> > > (https://wiki.freebsd.org/ZFSTuningGuide) to no avail either. I've
> > > read everything I can find on NFS on ZFS and nothing has helped.
> > > Where am I going wrong?
> > >
> > You could try this patch:
> > http://people.freebsd.org/~rmacklem/drc4-stable9.patch
> > - After applying the patch and booting a kernel built from the
> > patched sources, you need to increase the value of
> > vfs.nfsd.tcphighwater. (Try something like 5000 for it as a
> > starting point.)
> >
> > Can we get a brief on what this is supposed to improve upon?
>
> It was developed for and tested by wollman@ to reduce mutex lock
> contention and CPU overheads for the duplicate request cache, mainly
> for NFS over TCP. (For the CPU overheads case, it allows the cache
> to grow larger, reducing the frequency and, therefore, the overhead
> of trimming out stale entries.)
> Here is the commit message, which I think covers it:
>
> Fix several performance-related issues in the new NFS server's
> DRC for NFS over TCP.
> - Increase the size of the hash tables.
> - Create a separate mutex for each hash list of the TCP hash table.
> - Single-thread the code that deletes stale cache entries.
> - Add a tunable called vfs.nfsd.tcphighwater, which can be increased
> to allow the cache to grow larger, avoiding the overhead of frequent
> scans to delete stale cache entries.
> (The default value will result in frequent scans to delete stale
> cache entries, analogous to what the pre-patched code does.)
> - Add a tunable called vfs.nfsd.cachetcp that can be used to disable
> DRC caching for NFS over TCP, since the old NFS server didn't DRC
> cache TCP.
> It also adjusts the size of nfsrc_floodlevel dynamically, so that it
> is always greater than vfs.nfsd.tcphighwater.
>
> For UDP the algorithm remains the same as the pre-patched code, but
> the tunable vfs.nfsd.udphighwater can be used to allow the cache to
> grow larger and reduce the overhead caused by frequent scans for
> stale entries. UDP also uses a larger hash table size than the
> pre-patched code.
>
> Reported by: wollman
> Tested by: wollman (earlier version of patch)
> Submitted by: ivoras (earlier patch)
> Reviewed by: jhb (earlier version of patch)

Thanks, much appreciated

> > Although this patch is somewhat different code, it should be
> > semantically the same as r254337 in head, which is scheduled to be
> > MFC'd to stable/9 in a couple of weeks.
> >
> > rick
> >
> > > Here's /boot/loader.conf:
> > > [quote]
> > > # ZFS tuning tweaks
> > > aio_load="YES"                 # async I/O system calls
> > > autoboot_delay="10"            # boot menu delay time, in seconds
> > > vfs.zfs.arc_max="56868864000"  # reserve ~10GB of RAM for the
> > >                                # system, leave the rest for ZFS
> > > vfs.zfs.cache_flush_disable="1"
> > > #vfs.zfs.prefetch_disable="1"
> > > vfs.zfs.write_limit_override="429496728"
> > >
> > > kern.ipc.nmbclusters="264144"  # increase the number of network mbufs
> > > kern.maxfiles="65535"
> > > net.inet.tcp.syncache.hashsize="1024"   # size of the syncache hash table
> > > net.inet.tcp.syncache.bucketlimit="100" # limit the number of entries
> > >                                         # permitted in each bucket
> > > net.inet.tcp.tcbhashsize="32768"
> > >
> > > # Link aggregation loader tweaks.
> > > # see https://calomel.org/freebsd_network_tuning.html
> > > hw.igb.enable_msix="1"
> > > hw.igb.num_queues="0"
> > > hw.igb.enable_aim="1"
> > > hw.igb.max_interrupt_rate="32000"
> > > hw.igb.rxd="2048"
> > > hw.igb.txd="2048"
> > > hw.igb.rx_process_limit="4096"
> > > if_lagg_load="YES"
> > > [/quote]
> > >
> > > Here's /etc/sysctl.conf:
> > > [quote]
> > > # $FreeBSD: release/9.1.0/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
> > > #
> > > # This file is read when going to multi-user and its contents piped thru
> > > # ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
> > > #
> > >
> > > # Uncomment this to prevent users from seeing information about
> > > # processes that are being run under another UID.
> > > #security.bsd.see_other_uids=0
> > > kern.ipc.somaxconn=1024
> > > kern.maxusers=272
> > > #kern.maxvnodes=1096848  # increase this if necessary
> > > kern.ipc.maxsockbuf=8388608
> > > net.inet.tcp.mssdflt=1460
> > > net.inet.ip.forwarding=1
> > > net.inet.ip.fastforwarding=1
> > > dev.igb.2.fc=0
> > > dev.igb.3.fc=0
> > > dev.igb.4.fc=0
> > > dev.igb.5.fc=0
> > > dev.igb.2.rx_processing_limit=10000
> > > dev.igb.3.rx_processing_limit=10000
> > > dev.igb.4.rx_processing_limit=10000
> > > dev.igb.5.rx_processing_limit=10000
> > > net.inet.ip.redirect=0
> > > net.inet.icmp.bmcastecho=0     # do not respond to ICMP packets sent to the .255 broadcast IP
> > > net.inet.icmp.maskfake=0       # do not fake replies to ICMP Address Mask Request packets
> > > net.inet.icmp.maskrepl=0       # replies are not sent for ICMP address mask requests
> > > net.inet.icmp.log_redirect=0   # do not log redirected ICMP packet attempts
> > > net.inet.icmp.drop_redirect=1  # drop redirected ICMP packets
> > > net.inet.tcp.drop_synfin=1     # SYN/FIN packets get dropped on initial connection
> > > net.inet.tcp.ecn.enable=1      # explicit congestion notification (ECN); warning: some ISP routers abuse it
> > > net.inet.tcp.icmp_may_rst=0        # ICMP may not send RST, to avoid spoofed ICMP/UDP floods
> > > net.inet.tcp.maxtcptw=15000        # max number of TCP TIME_WAIT states for closing connections
> > > net.inet.tcp.msl=5000              # 5-second maximum segment life waiting for an ACK in reply to a SYN-ACK or FIN-ACK
> > > net.inet.tcp.path_mtu_discovery=0  # disable MTU discovery since most ICMP packets are dropped by others
> > > net.inet.tcp.rfc3042=0             # disable the limited transmit mechanism, which can slow burst transmissions
> > > net.inet.ip.rtexpire=60            # default 3600 secs
> > > net.inet.ip.rtminexpire=2          # default 10 secs
> > > net.inet.ip.rtmaxcache=1024        # default 128 entries
> > > [/quote]
> > >
> > > Here's /etc/rc.conf:
> > > [quote]
> > > #ifconfig_igb2="inet xxx.xx.x.xx netmask 255.255.248.0"
> > > hostname="xxxxxxxxxxxxxxxxxxx"
> > > #
> > > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> > > dumpdev="NO"
> > > #
> > > ### LACP config
> > > ifconfig_igb2="up"
> > > ifconfig_igb3="up"
> > > ifconfig_igb4="up"
> > > ifconfig_igb5="up"
> > > cloned_interfaces="lagg0"
> > > ifconfig_lagg0="laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 xxx.xx.x.xx netmask 255.255.248.0"
> > > ipv4_addrs_lagg0="xxx.xx.x.xx"
> > > defaultrouter="xxx.xx.x.xx"
> > > #
> > > ### Defaults for SSH, NTP, ZFS
> > > sshd_enable="YES"
> > > ntpd_enable="YES"
> > > zfs_enable="YES"
> > > #
> > > ## NFS server
> > > rpcbind_enable="YES"
> > > nfs_server_enable="YES"
> > > mountd_flags="-r -l"
> > > nfsd_enable="YES"
> > > mountd_enable="YES"
> > > rpc_lockd_enable="NO"
> > > rpc_statd_enable="NO"
> > > nfs_server_flags="-u -t -n 128"
> > > nfsv4_server_enable="YES"
> > > nfsuserd_enable="YES"
> > > [/quote]
> > >
> > > Thanks in advance,
> > > --
> > > Eric Browning
> > > _______________________________________________
> > > freebsd-fs@freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
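For anyone wanting to try the patch, the tunables described in the commit message above could be exercised like this once a kernel built from the patched sources is booted. This is only a sketch: 5000 is the starting point Rick suggests; the values shown for vfs.nfsd.cachetcp and vfs.nfsd.udphighwater are illustrative assumptions, not recommendations.

```shell
# Sketch: trying the DRC tunables from the patch on a patched kernel.
# 5000 is the starting point suggested above; other values are examples.

# Allow the TCP duplicate request cache to grow larger before the
# stale-entry trimming scans kick in:
sysctl vfs.nfsd.tcphighwater=5000

# Optionally disable DRC caching for NFS over TCP (assumed: 0 = off),
# matching the old NFS server's behaviour:
#sysctl vfs.nfsd.cachetcp=0

# UDP analogue of the high-water mark (value is just an example):
#sysctl vfs.nfsd.udphighwater=1000

# To persist across reboots, add the same name=value pairs to
# /etc/sysctl.conf.
```

Watching CPU load and nfsstat output while raising vfs.nfsd.tcphighwater in steps would show whether DRC trimming was the bottleneck.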