From: Marek Salwerowicz <marek.salwerowicz@misal.pl>
Subject: ZFS - NFS server for VMware ESXi issues
To: freebsd-fs@freebsd.org
Date: Fri, 21 Oct 2016 11:12:21 +0200
Message-ID: <930df17b-8db8-121a-a24b-b4909b8162dc@misal.pl>

Hi list,

I run the following server:

- Supermicro 6047R-E1R36L
- 32 GB RAM
- 1x Intel Xeon E5-2640 v2 @ 2.00GHz
- FreeBSD 10.1

Drive for the OS:

- HW RAID1: 2x KINGSTON SV300S37A120G

zpool:

- 18x WD RED 4TB @ raidz2
- log: mirrored Intel 730 SSDs

atime is disabled on the ZFS datasets. No NFS tuning. MTU 9000.

The box works as an NFS filer for 3 VMware ESXi (5.0, 5.1, 5.5) servers and as an iSCSI target for one VM that requires a huge amount of space, all over a 1Gbit network. The interfaces on the server are aggregated 4x1Gbit (lagg).

Current usage:

# zpool list
NAME    SIZE  ALLOC   FREE  FRAG  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
tank1    65T  27.3T  37.7T     -         -   41%  1.00x  ONLINE  -

The box had been working fine for about two years. However, about two weeks ago the NFS service became unavailable: the ESXi servers lost their NFS connection to the filer (the shares were greyed out). 'top' on the filer showed the "nfsd: server" process hung after accumulating hundreds of minutes of CPU time (I didn't capture the output).

# service nfsd restart

didn't help; the only solution was to cold-reboot the machine.
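If it hangs again, I will try to capture more state before rebooting, roughly along these lines (a sketch using base-system tools only; the nfsd PIDs come from top):

# procstat -kk $(pgrep nfsd)   # kernel stacks of the nfsd threads
# nfsstat -s                   # server-side NFS RPC counters
# zpool status -v tank1        # look for suspended or erroring vdevs
# gstat                        # per-disk I/O latency at the moment of the hang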
After the reboot I upgraded the system (it had been running 9.2-RELEASE) to 10.1-RELEASE.

Today, after two weeks of running fine, we experienced the same situation. The nfsd service was in the following state:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  984 root      128  20    0 12344K  4020K vq->vq  8 346:27   0.00% nfsd

nfsd didn't respond to "service nfsd restart", but this time the machine was able to reboot with "# reboot".

The nfsd service runs in the threaded model, with 128 threads (the default).
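(I have not set nfs_server_flags myself, so the 128 threads are whatever the system picked. If pinning or lowering the thread count is worth trying, I assume it would be something like this in /etc/rc.conf, where "-n 128" simply mirrors what top reports rather than a tuned value:)

nfs_server_enable="YES"
nfs_server_flags="-u -t -n 128"   # -u/-t serve UDP and TCP; -n sets the nfsd thread count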
Current top output:

last pid:  2535;  load averages:  0.13, 0.14, 0.15    up 0+04:29:06  11:00:24
36 processes:  1 running, 35 sleeping
CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
Mem: 5724K Active, 48M Inact, 25G Wired, 173M Buf, 5828M Free
ARC: 23G Total, 7737M MFU, 16G MRU, 16M Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free

Displaying threads as a count
  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1949 root       129  24    0 12344K  4132K rpcsvc 10   7:15   1.27% nfsd
 1025 root         1  20    0 14472K  1936K select 15   0:01   0.00% powerd
 1021 root         1  20    0 26096K 18016K select  2   0:01   0.00% ntpd
 1147 marek        1  20    0 86488K  7592K select  2   0:01   0.00% sshd
 1083 root         1  20    0 24120K  5980K select  0   0:00   0.00% sendmail
 1052 root         1  20    0 30720K  5356K nanslp  6   0:00   0.00% smartd
 1260 root         1  20    0 23576K  3576K pause   4   0:00   0.00% csh
 1948 root         1  28    0 24632K  5832K select 10   0:00   0.00% nfsd
 1144 root         1  20    0 86488K  7544K select  9   0:00   0.00% sshd
 1148 marek        1  21    0 24364K  4324K pause   1   0:00   0.00% zsh
  852 root         1  20    0 16584K  2192K select  3   0:00   0.00% rpcbind
 1926 root         1  52    0 26792K  5956K select  1   0:00   0.00% mountd
  848 root         1  20    0 14504K  2144K select 13   0:00   0.00% syslogd
 1258 root         1  25    0 50364K  3468K select  6   0:00   0.00% sudo
 2432 marek        1  20    0 86488K  7620K select 15   0:00   0.00% sshd
 1090 root         1  49    0 16596K  2344K nanslp  0   0:00   0.00% cron
 2535 root         1  20    0 21920K  4252K CPU15  15   0:00   0.00% top
 1259 root         1  28    0 47708K  2808K wait   12   0:00   0.00% su
 2433 marek        1  20    0 24364K  4272K ttyin   1   0:00   0.00% zsh
 2429 root         1  20    0 86488K  7616K select 15   0:00   0.00% sshd
  751 root         1  20    0 13164K  4548K select 13   0:00   0.00% devd
 1086 smmsp        1  20    0 24120K  5648K pause  11   0:00   0.00% sendmail
 1080 root         1  20    0 61220K  6996K select  8   0:00   0.00% sshd
 1140 root         1  52    0 14492K  2068K ttyin   9   0:00   0.00% getty
 1138 root         1  52    0 14492K  2068K ttyin  15   0:00   0.00% getty
 1141 root         1  52    0 14492K  2068K ttyin  10   0:00   0.00% getty
 1143 root         1  52    0 14492K  2068K ttyin   0   0:00   0.00% getty
 1136 root         1  52    0 14492K  2068K ttyin   6   0:00   0.00% getty
 1139 root         1  52    0 14492K  2068K ttyin   2   0:00   0.00% getty
 1142 root         1  52    0 14492K  2068K ttyin   4   0:00   0.00% getty
 1137 root         1  52    0 14492K  2068K ttyin  13   0:00   0.00% getty
  953 root         1  20    0 27556K  3468K select  5   0:00   0.00% ctld
  152 root         1  52    0 12336K  1800K pause   9   0:00   0.00% adjkerntz
  734 root         1  52    0 16708K  2044K select  1   0:00   0.00% moused
  692 root         1  52    0 16708K  2040K select  8   0:00   0.00% moused
  713 root         1  52    0 16708K  2044K select  8   0:00   0.00% moused

and with I/O output:

last pid:  2535;  load averages:  0.09, 0.12, 0.15    up 0+04:30:05  11:01:23
36 processes:  1 running, 35 sleeping
CPU:  0.1% user,  0.0% nice,  1.7% system,  0.1% interrupt, 98.2% idle
Mem: 5208K Active, 49M Inact, 25G Wired, 173M Buf, 5821M Free
ARC: 23G Total, 7736M MFU, 16G MRU, 9744K Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free

  PID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL  PERCENT COMMAND
 1949 root        360     31      0    131      0    131  100.00% nfsd
 1025 root          8      0      0      0      0      0    0.00% powerd
 1021 root          2      0      0      0      0      0    0.00% ntpd
 1147 marek         2      0      0      0      0      0    0.00% sshd
 1083 root          0      0      0      0      0      0    0.00% sendmail
 1052 root          0      0      0      0      0      0    0.00% smartd
 1260 root          0      0      0      0      0      0    0.00% csh
 1948 root          0      0      0      0      0      0    0.00% nfsd
 1144 root          0      0      0      0      0      0    0.00% sshd

My questions:

1. Since we are reaching ~30 TB of allocated space, could it be a lack of memory (going by the rule of thumb of 1 GB of RAM per 1 TB of ZFS storage)?

2. Does the NFS server need tuning in a standard 1Gbit network environment? We use lagg aggregation and accept that a single ESXi server gets at most 1Gbit of throughput. Are 128 threads too many?

3. Could the SMART tests have a side effect on I/O performance that results in the NFS hangs? I run short tests quite intensively (4 times per day) and a long test once per week (at the weekend).

Cheers,
Marek
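P.S. Regarding question 1: in the meantime I am watching the ARC against its target with the standard sysctl counters. A minimal sketch of what I check (read-only queries; I have not changed any vfs.zfs tunables):

# sysctl kstat.zfs.misc.arcstats.size    # current ARC size in bytes
# sysctl kstat.zfs.misc.arcstats.c_max   # the ARC's target maximum
# sysctl vfs.zfs.arc_max                 # the loader-tunable ARC cap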