Date: Tue, 27 Aug 2013 19:05:23 -0400
From: Outback Dingo <outbackdingo@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: NFS on ZFS pure SSD pool
Message-ID: <CAKYr3zw2-0cGsUoi8bo-iYf2sLyxdSLXSjMbxwMBZWOyDx2OCQ@mail.gmail.com>
In-Reply-To: <1115794974.14452056.1377644539106.JavaMail.root@uoguelph.ca>
References: <CAKYr3zxcYKq_VUr=bhpwXLbMZQH==Et_1Ue-rsiCUSU5AZiFMg@mail.gmail.com>
 <1115794974.14452056.1377644539106.JavaMail.root@uoguelph.ca>
On Tue, Aug 27, 2013 at 7:02 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Outback Dingo wrote:
> > On Tue, Aug 27, 2013 at 3:29 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > Eric Browning wrote:
> > > Hello, first time posting to this list. I have a new server that is
> > > not living up to the promise of SSD speeds, and NFS is maxing out
> > > the CPU. I'm new to FreeBSD, but I've been reading up on it as much
> > > as I can. I have obscured my IP addresses and hostname with x's, so
> > > just ignore that. The server has about 200 users on it, each drawing
> > > under 50Mb/s peak and sustaining around 1-2Mb/s.
> > >
> > > I've followed some network tuning guides for our I350-T4 NIC, and
> > > that has helped with network performance somewhat, but the server is
> > > still experiencing heavy load, pegging the CPU at 1250% on average
> > > with only 50Mb/s of traffic in/out of the machine. All of the
> > > network tuning came from
> > > https://calomel.org/freebsd_network_tuning.html since it was
> > > relevant to the same NIC that I have.
> > >
> > > Server specs:
> > > FreeBSD 9.1
> > > 16 cores, AMD x64
> > > 64GB of RAM
> > > ZFS v28 with four Intel DC S3700 drives (800GB) as a ZFS stripe
> > > Intel DC S3500 for ZIL; enabling/disabling it has made no difference
> > > Used a spare DC S3700 for the ZIL and that made no difference either
> > > NFS v3 & v4 for Mac home folders whose Cache folder is redirected
> > >
> > > I've tried:
> > > Compression on/off <-- no appreciable difference
> > > Deduplication on/off <-- no appreciable difference
> > > sync=disabled and sync=standard <-- no appreciable difference
> > > Setting the ARC cache to 56GB and also to 32GB <-- no difference in
> > > performance in terms of kern.
> > >
> > > I've tried to follow the FreeBSD tuning guide at
> > > https://wiki.freebsd.org/ZFSTuningGuide to no avail either. I've
> > > read everything I can find on NFS on ZFS and nothing has helped.
> > > Where am I going wrong?
> > >
> > You could try this patch:
> > http://people.freebsd.org/~rmacklem/drc4-stable9.patch
> > - After applying the patch and booting a kernel built from the patched
> > sources, you need to increase the value of vfs.nfsd.tcphighwater.
> > (Try something like 5000 for it as a starting point.)
> >
> > can we get a brief on what this is supposed to improve upon ?
> >
> It was developed for and tested by wollman@ to reduce mutex lock
> contention and CPU overheads for the duplicate request cache (DRC),
> mainly for NFS over TCP. (For the CPU overheads case, it allows the
> cache to grow larger, reducing the frequency and, therefore, the
> overhead of trimming out stale entries.)
> Here is the commit message, which I think covers it:
>
> Fix several performance related issues in the new NFS server's
> DRC for NFS over TCP.
> - Increase the size of the hash tables.
> - Create a separate mutex for each hash list of the TCP hash table.
> - Single thread the code that deletes stale cache entries.
> - Add a tunable called vfs.nfsd.tcphighwater, which can be increased
>   to allow the cache to grow larger, avoiding the overhead of frequent
>   scans to delete stale cache entries.
>   (The default value will result in frequent scans to delete stale
>   cache entries, analogous to what the pre-patched code does.)
> - Add a tunable called vfs.nfsd.cachetcp that can be used to disable
>   DRC caching for NFS over TCP, since the old NFS server didn't DRC
>   cache TCP.
> It also adjusts the size of nfsrc_floodlevel dynamically, so that it
> is always greater than vfs.nfsd.tcphighwater.
>
> For UDP the algorithm remains the same as the pre-patched code, but
> the tunable vfs.nfsd.udphighwater can be used to allow the cache to
> grow larger and reduce the overhead caused by frequent scans for stale
> entries. UDP also uses a larger hash table size than the pre-patched
> code.
>
> Reported by:	wollman
> Tested by:	wollman (earlier version of patch)
> Submitted by:	ivoras (earlier patch)
> Reviewed by:	jhb (earlier version of patch)

Thanks, much appreciated
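For the archive, a minimal sketch of what actually turning the knob
looks like on a kernel built with this patch. The sysctl names are
taken from the commit message above, and 5000 is just the suggested
starting point quoted earlier, not a tested value:

[quote]
# One-off, at runtime, once the patched nfsd is up:
sysctl vfs.nfsd.tcphighwater=5000

# Or persistently, in /etc/sysctl.conf:
vfs.nfsd.tcphighwater=5000
[/quote]

If DRC trimming really is where the CPU time is going, comparing
"nfsstat -e -s" output before and after should show the server cache
growing toward the new ceiling instead of being trimmed constantly.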
> > Although this patch is somewhat different code, it should be
> > semantically the same as r254337 in head, which is scheduled to be
> > MFC'd to stable/9 in a couple of weeks.
> >
> > rick
> >
> > > Here's /boot/loader.conf:
> > > [quote]
> > > # ZFS tuning tweaks
> > > aio_load="YES"                 # async I/O system calls
> > > autoboot_delay="10"            # boot menu delay, in seconds
> > > vfs.zfs.arc_max="56868864000"  # reserves 10GB of RAM for the system,
> > >                                # leaves 56GB for ZFS
> > > vfs.zfs.cache_flush_disable="1"
> > > #vfs.zfs.prefetch_disable="1"
> > > vfs.zfs.write_limit_override="429496728"
> > >
> > > kern.ipc.nmbclusters="264144"  # increase the number of network mbufs
> > > kern.maxfiles="65535"
> > > net.inet.tcp.syncache.hashsize="1024"    # size of the syncache hash table
> > > net.inet.tcp.syncache.bucketlimit="100"  # limit the number of entries
> > >                                          # permitted in each bucket
> > > net.inet.tcp.tcbhashsize="32768"
> > >
> > > # Link aggregation loader tweaks, see:
> > > # https://calomel.org/freebsd_network_tuning.html
> > > hw.igb.enable_msix="1"
> > > hw.igb.num_queues="0"
> > > hw.igb.enable_aim="1"
> > > hw.igb.max_interrupt_rate="32000"
> > > hw.igb.rxd="2048"
> > > hw.igb.txd="2048"
> > > hw.igb.rx_process_limit="4096"
> > > if_lagg_load="YES"
> > > [/quote]
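One nit on the arc_max line above: 56868864000 bytes is about 53GiB,
not 56GB, so the comment's arithmetic (10GB reserved plus 56GB for ZFS
on a 64GB box) doesn't quite add up. A quick sanity check, assuming the
intent really was a 56GiB cap:

[quote]
# Value actually in the file:
#   56868864000 / 1024^3  ~= 52.96 GiB
# A true 56GiB cap would be:
#   echo $((56 * 1024 * 1024 * 1024))   ->  60129542144
vfs.zfs.arc_max="60129542144"
[/quote]

Either value may be fine for this workload; the point is only to make
the comment and the number agree.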
> > > Here's /etc/sysctl.conf:
> > > [quote]
> > > # $FreeBSD: release/9.1.0/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
> > > #
> > > # This file is read when going to multi-user and its contents piped thru
> > > # ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
> > > #
> > >
> > > # Uncomment this to prevent users from seeing information about
> > > # processes that are being run under another UID.
> > > #security.bsd.see_other_uids=0
> > > kern.ipc.somaxconn=1024
> > > kern.maxusers=272
> > > #kern.maxvnodes=1096848  # increase this if necessary
> > > kern.ipc.maxsockbuf=8388608
> > > net.inet.tcp.mssdflt=1460
> > > net.inet.ip.forwarding=1
> > > net.inet.ip.fastforwarding=1
> > > dev.igb.2.fc=0
> > > dev.igb.3.fc=0
> > > dev.igb.4.fc=0
> > > dev.igb.5.fc=0
> > > dev.igb.2.rx_processing_limit=10000
> > > dev.igb.3.rx_processing_limit=10000
> > > dev.igb.4.rx_processing_limit=10000
> > > dev.igb.5.rx_processing_limit=10000
> > > net.inet.ip.redirect=0
> > > net.inet.icmp.bmcastecho=0    # do not respond to ICMP sent to the .255
> > >                               # broadcast address
> > > net.inet.icmp.maskfake=0      # do not fake replies to ICMP Address Mask
> > >                               # Request packets
> > > net.inet.icmp.maskrepl=0      # do not send replies for ICMP address mask
> > > net.inet.icmp.log_redirect=0  # do not log redirected ICMP packet attempts
> > > net.inet.icmp.drop_redirect=1 # drop redirected ICMP packets
> > > net.inet.tcp.drop_synfin=1    # drop SYN/FIN packets on initial connection
> > > net.inet.tcp.ecn.enable=1     # explicit congestion notification (ECN);
> > >                               # warning: some ISP routers abuse it
> > > net.inet.tcp.icmp_may_rst=0   # ICMP may not send RST, to avoid spoofed
> > >                               # ICMP/UDP floods
> > > net.inet.tcp.maxtcptw=15000   # max number of TCP TIME_WAIT states for
> > >                               # closing connections
> > > net.inet.tcp.msl=5000         # 5 second maximum segment life, waiting
> > >                               # for an ACK in reply to a SYN-ACK or FIN-ACK
> > > net.inet.tcp.path_mtu_discovery=0  # disable MTU discovery, since most
> > >                                    # ICMP packets are dropped by others
> > > net.inet.tcp.rfc3042=0        # disable the limited transmit mechanism,
> > >                               # which can slow burst transmissions
> > > net.inet.ip.rtexpire=60       # default 3600 secs
> > > net.inet.ip.rtminexpire=2     # default 10 secs
> > > net.inet.ip.rtmaxcache=1024   # default 128 entries
> > > [/quote]
> > >
> > > Here's /etc/rc.conf:
> > > [quote]
> > > #ifconfig_igb2="inet xxx.xx.x.xx netmask 255.255.248.0"
> > > hostname="xxxxxxxxxxxxxxxxxxx"
> > > #
> > > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> > > dumpdev="NO"
> > > #
> > > ### LACP config
> > > ifconfig_igb2="up"
> > > ifconfig_igb3="up"
> > > ifconfig_igb4="up"
> > > ifconfig_igb5="up"
> > > cloned_interfaces="lagg0"
> > > ifconfig_lagg0="laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 xxx.xx.x.xx netmask 255.255.248.0"
> > > ipv4_addrs_lagg0="xxx.xx.x.xx"
> > > defaultrouter="xxx.xx.x.xx"
> > > #
> > > ### Defaults for SSH, NTP, ZFS
> > > sshd_enable="YES"
> > > ntpd_enable="YES"
> > > zfs_enable="YES"
> > > #
> > > ## NFS server
> > > rpcbind_enable="YES"
> > > nfs_server_enable="YES"
> > > mountd_flags="-r -l"
> > > nfsd_enable="YES"
> > > mountd_enable="YES"
> > > rpc_lockd_enable="NO"
> > > rpc_statd_enable="NO"
> > > nfs_server_flags="-u -t -n 128"
> > > nfsv4_server_enable="YES"
> > > nfsuserd_enable="YES"
> > > [/quote]
> > >
> > > Thanks in advance,
> > > --
> > > Eric Browning
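A couple of quick checks worth running on a box configured like the
above. These are standard base-system tools, nothing patch-specific,
so the exact counters shown will vary by release:

[quote]
# Confirm all four laggports negotiated LACP and show the
# ACTIVE,COLLECTING,DISTRIBUTING flags:
ifconfig lagg0

# Watch the new NFS server's RPC and cache counters under load; a
# server cache that is being trimmed constantly while the CPU is
# pegged is the symptom the DRC patch discussed above targets:
nfsstat -e -s

# See whether the CPU time is going to nfsd threads, interrupts,
# or something else entirely:
top -SH
[/quote]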