From: Marek Salwerowicz <marek.salwerowicz@misal.pl>
Subject: ZFS - NFS server for VMware ESXi issues
To: freebsd-fs@freebsd.org
Date: Fri, 21 Oct 2016 11:12:21 +0200
Message-ID: <930df17b-8db8-121a-a24b-b4909b8162dc@misal.pl>

Hi list,

I run the following server:

- Supermicro 6047R-E1R36L
- 32 GB RAM
- 1x Intel Xeon E5-2640 v2 @ 2.00GHz
- FreeBSD 10.1

Drive for the OS:

- HW RAID1: 2x KINGSTON SV300S37A120G

zpool:

- 18x WD RED 4TB @ raidz2
- log: mirrored Intel 730 SSDs

atime is disabled on the ZFS datasets. No NFS tuning. MTU 9000.

The box works as an NFS filer for 3 VMware ESXi (5.0, 5.1, 5.5) servers and as an iSCSI target for one VM that requires a huge amount of space, all over a 1Gbit network. The interfaces on the server are aggregated 4x1Gbit (lagg).

Current usage:

# zpool list
NAME    SIZE  ALLOC   FREE  FRAG  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
tank1    65T  27.3T  37.7T     -         -   41%  1.00x  ONLINE  -

The box had been working fine for about two years. However, about two weeks ago the NFS service became unavailable: the ESXi servers lost their NFS connection to the filer (the shares were greyed out). 'top' on the filer showed the "nfsd: server" process hung after accumulating hundreds of minutes of CPU time (I didn't capture the output).

# service nfsd restart

didn't help; the only solution was to cold-reboot the machine.
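If it hangs again, I will try to capture more state before rebooting, roughly along these lines (a sketch using base-system tools only; the nfsd PIDs come from top):

# procstat -kk $(pgrep nfsd)   # kernel stacks of the nfsd threads
# nfsstat -s                   # server-side NFS RPC counters
# zpool status -v tank1        # look for suspended or erroring vdevs
# gstat                        # per-disk I/O latency at the moment of the hang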
After the reboot I upgraded the system (it had been running 9.2-RELEASE) to 10.1-RELEASE.

Today, after two weeks of running fine, we experienced the same situation. The nfsd service was in the following state:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  984 root      128  20    0 12344K  4020K vq->vq  8 346:27   0.00% nfsd

nfsd didn't respond to "service nfsd restart", but this time the machine was able to reboot with "# reboot".

The nfsd service runs in the threaded model, with 128 threads (the default).
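(I have not set nfs_server_flags myself, so the 128 threads are whatever the system picked. If pinning or lowering the thread count is worth trying, I assume it would be something like this in /etc/rc.conf, where "-n 128" simply mirrors what top reports rather than a tuned value:)

nfs_server_enable="YES"
nfs_server_flags="-u -t -n 128"   # -u/-t serve UDP and TCP; -n sets the nfsd thread count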
Current top output:

last pid:  2535;  load averages:  0.13, 0.14, 0.15    up 0+04:29:06  11:00:24
36 processes:  1 running, 35 sleeping
CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
Mem: 5724K Active, 48M Inact, 25G Wired, 173M Buf, 5828M Free
ARC: 23G Total, 7737M MFU, 16G MRU, 16M Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free

Displaying threads as a count
  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1949 root       129  24    0 12344K  4132K rpcsvc 10   7:15   1.27% nfsd
 1025 root         1  20    0 14472K  1936K select 15   0:01   0.00% powerd
 1021 root         1  20    0 26096K 18016K select  2   0:01   0.00% ntpd
 1147 marek        1  20    0 86488K  7592K select  2   0:01   0.00% sshd
 1083 root         1  20    0 24120K  5980K select  0   0:00   0.00% sendmail
 1052 root         1  20    0 30720K  5356K nanslp  6   0:00   0.00% smartd
 1260 root         1  20    0 23576K  3576K pause   4   0:00   0.00% csh
 1948 root         1  28    0 24632K  5832K select 10   0:00   0.00% nfsd
 1144 root         1  20    0 86488K  7544K select  9   0:00   0.00% sshd
 1148 marek        1  21    0 24364K  4324K pause   1   0:00   0.00% zsh
  852 root         1  20    0 16584K  2192K select  3   0:00   0.00% rpcbind
 1926 root         1  52    0 26792K  5956K select  1   0:00   0.00% mountd
  848 root         1  20    0 14504K  2144K select 13   0:00   0.00% syslogd
 1258 root         1  25    0 50364K  3468K select  6   0:00   0.00% sudo
 2432 marek        1  20    0 86488K  7620K select 15   0:00   0.00% sshd
 1090 root         1  49    0 16596K  2344K nanslp  0   0:00   0.00% cron
 2535 root         1  20    0 21920K  4252K CPU15  15   0:00   0.00% top
 1259 root         1  28    0 47708K  2808K wait   12   0:00   0.00% su
 2433 marek        1  20    0 24364K  4272K ttyin   1   0:00   0.00% zsh
 2429 root         1  20    0 86488K  7616K select 15   0:00   0.00% sshd
  751 root         1  20    0 13164K  4548K select 13   0:00   0.00% devd
 1086 smmsp        1  20    0 24120K  5648K pause  11   0:00   0.00% sendmail
 1080 root         1  20    0 61220K  6996K select  8   0:00   0.00% sshd
 1140 root         1  52    0 14492K  2068K ttyin   9   0:00   0.00% getty
 1138 root         1  52    0 14492K  2068K ttyin  15   0:00   0.00% getty
 1141 root         1  52    0 14492K  2068K ttyin  10   0:00   0.00% getty
 1143 root         1  52    0 14492K  2068K ttyin   0   0:00   0.00% getty
 1136 root         1  52    0 14492K  2068K ttyin   6   0:00   0.00% getty
 1139 root         1  52    0 14492K  2068K ttyin   2   0:00   0.00% getty
 1142 root         1  52    0 14492K  2068K ttyin   4   0:00   0.00% getty
 1137 root         1  52    0 14492K  2068K ttyin  13   0:00   0.00% getty
  953 root         1  20    0 27556K  3468K select  5   0:00   0.00% ctld
  152 root         1  52    0 12336K  1800K pause   9   0:00   0.00% adjkerntz
  734 root         1  52    0 16708K  2044K select  1   0:00   0.00% moused
  692 root         1  52    0 16708K  2040K select  8   0:00   0.00% moused
  713 root         1  52    0 16708K  2044K select  8   0:00   0.00% moused

and with I/O output:

last pid:  2535;  load averages:  0.09, 0.12, 0.15    up 0+04:30:05  11:01:23
36 processes:  1 running, 35 sleeping
CPU:  0.1% user,  0.0% nice,  1.7% system,  0.1% interrupt, 98.2% idle
Mem: 5208K Active, 49M Inact, 25G Wired, 173M Buf, 5821M Free
ARC: 23G Total, 7736M MFU, 16G MRU, 9744K Anon, 186M Header, 89M Other
Swap: 32G Total, 32G Free

  PID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL  PERCENT COMMAND
 1949 root        360     31      0    131      0    131  100.00% nfsd
 1025 root          8      0      0      0      0      0    0.00% powerd
 1021 root          2      0      0      0      0      0    0.00% ntpd
 1147 marek         2      0      0      0      0      0    0.00% sshd
 1083 root          0      0      0      0      0      0    0.00% sendmail
 1052 root          0      0      0      0      0      0    0.00% smartd
 1260 root          0      0      0      0      0      0    0.00% csh
 1948 root          0      0      0      0      0      0    0.00% nfsd
 1144 root          0      0      0      0      0      0    0.00% sshd

My questions:

1. Since we are reaching ~30 TB of allocated space, could it be a lack of memory (going by the rule of thumb of 1 GB of RAM per 1 TB of ZFS storage)?

2. Does the NFS server need tuning in a standard 1Gbit network environment? We use lagg aggregation and accept that a single ESXi server gets at most 1Gbit of throughput. Are 128 threads too many?

3. Could the SMART tests have a side effect on I/O performance that results in the NFS hangs? I run short tests quite intensively (4 times per day) and a long test once per week (at the weekend).

Cheers,
Marek
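P.S. Regarding question 1: in the meantime I am watching the ARC against its target with the standard sysctl counters. A minimal sketch of what I check (read-only queries; I have not changed any vfs.zfs tunables):

# sysctl kstat.zfs.misc.arcstats.size    # current ARC size in bytes
# sysctl kstat.zfs.misc.arcstats.c_max   # the ARC's target maximum
# sysctl vfs.zfs.arc_max                 # the loader-tunable ARC cap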