Date: Sat, 14 Mar 2020 16:36:49 +0100
From: Peter Eriksson <pen@lysator.liu.se>
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: ZFS/NFS hickups and some tools to monitor stuff...
Message-ID: <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>
The last couple of weeks I've been fighting with a severe case of NFS users complaining about slow response times from our (5) FreeBSD 11.3-RELEASE-p6 file servers. Even though our SMB (Windows) users (thankfully, since they are around 500 per server vs 50 NFS users) didn't see the same slowdown (or at least didn't complain about it), the root cause is probably ZFS-related.

We've identified a number of cases where some ZFS operation can cause a severe slowdown of NFS operations, and I've been trying to figure out what the cause is and ways to mitigate the problem…

Some operations that have caused issues:

1. Resilver (basically made NFS service useless during the week it took…), with response times for NFS operations regularly up to 10 seconds or more (vs the normal 1-10 ms).

2. Recursive deferred snapshot destruction ("zfs destroy -dr DATA@snapnam"). Especially bad together with filesystems at or near quota.

3. Rsync cloning of data into the servers. Response times of up to 15 minutes were seen… Yes, 15 minutes to do a mkdir("test-dir"). Possibly in conjunction with #1 above…

Previously #1 and #2 haven't caused that many problems, and #3 definitely hasn't. Something has changed in the last half year or so, but so far I haven't been able to figure out what.

In order to test and figure things out I've written a couple of tools that others might find useful:

A couple of DTrace scripts to watch and monitor the timing of kernel calls related to NFS, GSS & ZFS operations (the "nfsuser" script is a simple "nfs user top" tool), plus a patch that adds a sysctl that can be set to have the NFS kernel code be verbose about slow NFS operations:

- https://github.com/ptrrkssn/freebsd-stuff

With the kernel patch (nfsd-verbose-timing-11.3.patch or nfsd-verbose-timing-12.1.patch) it looks like this (alert if an op takes more than 1000 µs):

# sysctl vfs.nfsd.verbose_timing=1000

nfsrvd_dorpc(vers=4.1, uid=65534, procnum=9, repstat=0) took 2853 µs
nfsrvd_dorpc(vers=4.0, uid=1003258, procnum=17, repstat=10009) took 5433 µs
nfsrvd_dorpc(vers=4.1, uid=65534, procnum=9, repstat=0) took 2026 µs

(The DTrace scripts can do similar things, but I wanted a real in-kernel way to see whether delays are caused by something on the machine itself or by something else, such as network delays.)
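[Editorial note: as an illustration, here is a minimal userland sketch of the pattern the patch output above suggests: time an operation with a monotonic clock and report it only when the elapsed time exceeds a microsecond threshold. This is not the kernel patch itself; the threshold constant and the demo directory name are made up for the example.]

/*
 * Userland illustration of a threshold-based "verbose timing" check:
 * measure one operation and print a line only if it was slower than
 * a configurable number of microseconds.
 */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static long long
elapsed_us(const struct timespec *t0, const struct timespec *t1)
{
    return (long long)(t1->tv_sec - t0->tv_sec) * 1000000 +
        (t1->tv_nsec - t0->tv_nsec) / 1000;
}

int
main(void)
{
    const long long threshold_us = 1000;    /* analogous to vfs.nfsd.verbose_timing=1000 */
    struct timespec t0, t1;
    long long dt;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (mkdir("verbose-timing-demo", 0755) < 0)  /* the operation being timed */
        perror("mkdir");
    clock_gettime(CLOCK_MONOTONIC, &t1);

    dt = elapsed_us(&t0, &t1);
    if (dt > threshold_us)
        printf("mkdir() took %lld us (threshold %lld us)\n", dt, threshold_us);

    (void)rmdir("verbose-timing-demo");
    return 0;
}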
A very simple C program that basically loops over some simple filesystem operations and measures the time they take:

- https://github.com/ptrrkssn/pfst

Mount a number of file servers on /mnt/filur01, /mnt/filur02, /mnt/filur03 etc, then run this to have it complain if an operation takes more than 100 ms (with "-v" it will print the time each operation takes):

$ ./pfst -t100ms /mnt/filur0*
2020-03-14 10:56:36 [ 109 ms]: /mnt/filur03: mkdir("t-omnibus-615-53202") [Time limit exceeded]
2020-03-14 11:14:22 [ 536 s ]: /mnt/filur03: mkdir("t-omnibus-615-53729") [Time limit exceeded]
2020-03-14 11:31:47 [  13 m ]: /mnt/filur03: mkdir("t-omnibus-615-53965") [Time limit exceeded]
2020-03-14 11:35:44 [ 182 ms]: /mnt/filur03: mkdir("t-omnibus-615-54201") [Time limit exceeded]
2020-03-14 12:20:03 [  15 m ]: /mnt/filur03: mkdir("t-omnibus-615-55908") [Time limit exceeded]
2020-03-14 12:39:09 [  15 m ]: /mnt/filur03: mkdir("t-omnibus-615-56103") [Time limit exceeded]
2020-03-14 12:50:58 [ 466 s ]: /mnt/filur03: mkdir("t-omnibus-615-56344") [Time limit exceeded]

With "-v":

2020-03-14 16:27:48 [1349 µs]: /mnt/filur01: mkdir("t-omnibus-637-2")
2020-03-14 16:27:48 [ 327 µs]: /mnt/filur01: rmdir("t-omnibus-637-2")

It would be interesting to hear whether others are also seeing ZFS and/or NFS slowdowns during heavy writing operations (resilver, snapshot destroy, rsync)…

Our DATA pools are basically 2xRAIDZ2(4+2) of 10TB 7200rpm disks + 400GB SSDs for ZIL + 400GB SSDs for L2ARC. 256GB RAM, with ARC max set to 64GB (it used to be 128GB, but we ran into out-of-memory situations with the 500+ Samba smbd daemons competing for the RAM…).

We've tried it with and without L2ARC, and replaced the SSDs. Disabled TRIM. Not much difference. Tried tuning various sysctls, but no difference seen so far.

Annoying problem, this…

- Peter
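[Editorial note: for comparison with the pfst output above, a rough self-contained sketch in the same spirit, not the actual pfst source: loop over the mount points given on the command line, create and remove a uniquely named test directory on each, time every call, and print a timestamped line whenever it exceeds a millisecond threshold. The fixed 100 ms limit and the "t-sketch-..." naming are just examples mirroring the output shown above.]

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Run one filesystem call, time it, and report it if it was too slow. */
static void
timed_op(const char *what, const char *path, int (*op)(const char *),
    double limit_ms)
{
    struct timespec t0, t1;
    char stamp[32];
    time_t now;
    double ms;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (op(path) < 0)
        perror(path);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ms = (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    if (ms > limit_ms) {
        now = time(NULL);
        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", localtime(&now));
        printf("%s [%8.1f ms]: %s(\"%s\") [Time limit exceeded]\n",
            stamp, ms, what, path);
    }
}

static int
do_mkdir(const char *path)
{
    return mkdir(path, 0755);
}

int
main(int argc, char *argv[])
{
    const double limit_ms = 100.0;      /* like pfst's "-t100ms" */
    char path[1024];
    unsigned long round;
    int i;

    for (round = 0; ; round++) {        /* run forever, like a monitor */
        for (i = 1; i < argc; i++) {
            snprintf(path, sizeof(path), "%s/t-sketch-%ld-%lu",
                argv[i], (long)getpid(), round);
            timed_op("mkdir", path, do_mkdir, limit_ms);
            timed_op("rmdir", path, rmdir, limit_ms);
        }
        sleep(1);
    }
    /* not reached */
    return 0;
}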
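[Editorial note: the ARC cap mentioned above is the standard vfs.zfs.arc_max tunable; a minimal sketch of how such a cap is commonly set on FreeBSD via /boot/loader.conf. The 64 GB figure simply mirrors the value quoted above, and the right size depends on the workload.]

# /boot/loader.conf
vfs.zfs.arc_max="68719476736"   # 64 GB, so the ARC leaves RAM for the ~500 smbd processes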