Date: Sat, 14 Mar 2020 16:36:49 +0100
From: Peter Eriksson <pen@lysator.liu.se>
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: ZFS/NFS hickups and some tools to monitor stuff...
Message-ID: <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>
The last couple of weeks I've been fighting with a severe case of NFS users complaining about slow response times from our (5) FreeBSD 11.3-RELEASE-p6 file servers. Even though our SMB (Windows) users (thankfully, since they are around 500 per server vs 50 NFS users) didn't see the same slowdown (or at least didn't complain about it), the root cause is probably ZFS-related.

We've identified a number of cases where some ZFS operation can cause a severe slowdown of NFS operations, and I've been trying to figure out what the cause is and ways to mitigate the problem…

Some operations that have caused issues:

1. Resilver (basically made NFS service useless during the week it took…), with response times for NFS operations regularly up to 10 seconds or more (vs the normal 1-10 ms).

2. Recursive deferred snapshot destruction ("zfs destroy -dr DATA@snapnam"). Especially bad together with filesystems at or near quota.

3. Rsync cloning of data into the servers. Response times of up to 15 minutes were seen… Yes, 15 minutes to do a mkdir("test-dir"). Possibly in conjunction with #1 above…

Previously #1 and #2 haven't caused that many problems, and #3 definitely hasn't. Something has changed in the last half year or so, but so far I haven't been able to figure out what.

In order to test and figure things out I've written a couple of tools that others might find useful:

A couple of DTrace scripts to watch and monitor the timing of kernel calls related to NFS, GSS & ZFS operations (the "nfsuser" script is a simple "nfs user top" tool), plus a patch that adds a sysctl that can be set to have the NFS kernel code be verbose about slow NFS operations:

- https://github.com/ptrrkssn/freebsd-stuff

With the kernel patch (nfsd-verbose-timing-11.3.patch or nfsd-verbose-timing-12.1.patch) it looks like this (alert if an op takes more than 1000 µs):

# sysctl vfs.nfsd.verbose_timing=1000

nfsrvd_dorpc(vers=4.1, uid=65534, procnum=9, repstat=0) took 2853 µs
nfsrvd_dorpc(vers=4.0, uid=1003258, procnum=17, repstat=10009) took 5433 µs
nfsrvd_dorpc(vers=4.1, uid=65534, procnum=9, repstat=0) took 2026 µs

(The DTrace scripts can do similar things, but I wanted a real in-kernel way to see whether delays are caused by something on the machine itself or by something else, such as network delays.)
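[Editorial note: as an illustration, here is a minimal userland sketch of the pattern the patch output above suggests: time an operation with a monotonic clock and report it only when the elapsed time exceeds a microsecond threshold. This is not the kernel patch itself; the threshold constant and the demo directory name are made up for the example.]

/*
 * Userland illustration of a threshold-based "verbose timing" check:
 * measure one operation and print a line only if it was slower than
 * a configurable number of microseconds.
 */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static long long
elapsed_us(const struct timespec *t0, const struct timespec *t1)
{
    return (long long)(t1->tv_sec - t0->tv_sec) * 1000000 +
        (t1->tv_nsec - t0->tv_nsec) / 1000;
}

int
main(void)
{
    const long long threshold_us = 1000;    /* analogous to vfs.nfsd.verbose_timing=1000 */
    struct timespec t0, t1;
    long long dt;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (mkdir("verbose-timing-demo", 0755) < 0)  /* the operation being timed */
        perror("mkdir");
    clock_gettime(CLOCK_MONOTONIC, &t1);

    dt = elapsed_us(&t0, &t1);
    if (dt > threshold_us)
        printf("mkdir() took %lld us (threshold %lld us)\n", dt, threshold_us);

    (void)rmdir("verbose-timing-demo");
    return 0;
}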
A very simple C program that basically loops over some simple filesystem operations and measures the time they take:

- https://github.com/ptrrkssn/pfst

Mount a number of file servers on /mnt/filur01, /mnt/filur02, /mnt/filur03 etc, then run this to have it complain if an operation takes more than 100 ms (with "-v" it will print the time each operation takes):

$ ./pfst -t100ms /mnt/filur0*
2020-03-14 10:56:36 [ 109 ms]: /mnt/filur03: mkdir("t-omnibus-615-53202") [Time limit exceeded]
2020-03-14 11:14:22 [ 536 s ]: /mnt/filur03: mkdir("t-omnibus-615-53729") [Time limit exceeded]
2020-03-14 11:31:47 [  13 m ]: /mnt/filur03: mkdir("t-omnibus-615-53965") [Time limit exceeded]
2020-03-14 11:35:44 [ 182 ms]: /mnt/filur03: mkdir("t-omnibus-615-54201") [Time limit exceeded]
2020-03-14 12:20:03 [  15 m ]: /mnt/filur03: mkdir("t-omnibus-615-55908") [Time limit exceeded]
2020-03-14 12:39:09 [  15 m ]: /mnt/filur03: mkdir("t-omnibus-615-56103") [Time limit exceeded]
2020-03-14 12:50:58 [ 466 s ]: /mnt/filur03: mkdir("t-omnibus-615-56344") [Time limit exceeded]

With "-v":

2020-03-14 16:27:48 [1349 µs]: /mnt/filur01: mkdir("t-omnibus-637-2")
2020-03-14 16:27:48 [ 327 µs]: /mnt/filur01: rmdir("t-omnibus-637-2")

It would be interesting to hear whether others are also seeing ZFS and/or NFS slowdowns during heavy writing operations (resilver, snapshot destroy, rsync)…

Our DATA pools are basically 2xRAIDZ2(4+2) of 10TB 7200rpm disks + 400GB SSDs for ZIL + 400GB SSDs for L2ARC. 256GB RAM, with ARC max set to 64GB (it used to be 128GB, but we ran into out-of-memory situations with the 500+ Samba smbd daemons competing for the RAM…).

We've tried it with and without L2ARC, and replaced the SSDs. Disabled TRIM. Not much difference. Tried tuning various sysctls, but no difference seen so far.

Annoying problem, this…

- Peter
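[Editorial note: for comparison with the pfst output above, a rough self-contained sketch in the same spirit, not the actual pfst source: loop over the mount points given on the command line, create and remove a uniquely named test directory on each, time every call, and print a timestamped line whenever it exceeds a millisecond threshold. The fixed 100 ms limit and the "t-sketch-..." naming are just examples mirroring the output shown above.]

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Run one filesystem call, time it, and report it if it was too slow. */
static void
timed_op(const char *what, const char *path, int (*op)(const char *),
    double limit_ms)
{
    struct timespec t0, t1;
    char stamp[32];
    time_t now;
    double ms;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (op(path) < 0)
        perror(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ms = (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    if (ms > limit_ms) {
        now = time(NULL);
        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", localtime(&now));
        printf("%s [%8.1f ms]: %s(\"%s\") [Time limit exceeded]\n",
            stamp, ms, what, path);
    }
}

static int
do_mkdir(const char *path)
{
    return mkdir(path, 0755);
}

int
main(int argc, char *argv[])
{
    const double limit_ms = 100.0;      /* like pfst's "-t100ms" */
    char path[1024];
    unsigned long round;
    int i;

    for (round = 0; ; round++) {        /* run forever, like a monitor */
        for (i = 1; i < argc; i++) {
            snprintf(path, sizeof(path), "%s/t-sketch-%ld-%lu",
                argv[i], (long)getpid(), round);
            timed_op("mkdir", path, do_mkdir, limit_ms);
            timed_op("rmdir", path, rmdir, limit_ms);
        }
        sleep(1);
    }
    /* not reached */
    return 0;
}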
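[Editorial note: the ARC cap mentioned above is the standard vfs.zfs.arc_max tunable; a minimal sketch of how such a cap is commonly set on FreeBSD via /boot/loader.conf. The 64 GB figure simply mirrors the value quoted above, and the right size depends on the workload.]

# /boot/loader.conf
vfs.zfs.arc_max="68719476736"   # 64 GB, so the ARC leaves RAM for the ~500 smbd processes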