Date:      Thu, 26 Mar 2020 00:27:10 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Peter Eriksson <pen@lysator.liu.se>, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: ZFS/NFS hickups and some tools to monitor stuff...
Message-ID:  <QB1PR01MB3649E19EBBF0348CC2EB4796DDCF0@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>
References:  <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>

Peter Eriksson wrote:
> The last couple of weeks I've been fighting with a severe case of NFS users
> complaining about slow response times from our (5) FreeBSD 11.3-RELEASE-p6
> file servers. Now even though our SMB (Windows) users (thankfully, since they
> are like 500 per server vs 50 NFS users) didn't see the same slowdown (or at
> least didn't complain about it), the root cause is probably ZFS-related.
>
> We've identified a number of cases where some ZFS operation can cause severe
> slowdown of NFS operations, and I've been trying to figure out what the cause
> is and ways to mitigate the problem...
>
> Some operations that have caused issues:
>
> 1. Resilver (basically made NFS service useless during the week it took...)
>    with response times for NFS operations regularly up to 10 seconds or more
>    (vs the normal 1-10ms)
>
> 2. Snapshot recursive deferred destruction ("zfs destroy -dr DATA@snapnam").
>    Especially bad together with filesystems at or near quota.
>
> 3. Rsync cloning of data into the servers. Response times of up to 15 minutes
>    were seen... Yes, 15 minutes to do a mkdir("test-dir"). Possibly in
>    conjunction with #1 above...
>
> Previously #1 and #2 didn't cause that many problems, and #3 definitely
> didn't. Something has changed in the last half year or so, but so far I
> haven't been able to figure it out.
>
[stuff snipped]
> It would be interesting to see if others, too, are seeing ZFS and/or NFS
> slowdowns during heavy writing operations (resilver, snapshot-destroy,
> rsync)...
>
> Our DATA pools are basically 2xRAIDZ2(4+2) of 10TB 7200rpm disks + 400GB
> SSDs for ZIL + 400GB SSDs for L2ARC. 256GB RAM, configured with ARC-MAX set
> to 64GB (used to be 128GB, but we ran into out-of-memory with the 500+ Samba
> smbd daemons that would compete for the RAM...)
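
[For reference, a pool of roughly that shape could be built like the sketch
below; the device names and the exact ARC cap setting are illustrative, not
taken from the servers in question:]

    # Two 6-disk RAIDZ2 vdevs (4 data + 2 parity each), a mirrored SLOG,
    # and one L2ARC cache device (device names are hypothetical)
    zpool create DATA \
        raidz2 da0 da1 da2 da3 da4 da5 \
        raidz2 da6 da7 da8 da9 da10 da11 \
        log mirror nvd0 nvd1 \
        cache nvd2

    # Cap the ARC at 64GB via /boot/loader.conf (value in bytes)
    vfs.zfs.arc_max="68719476736"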
Since no one else has commented, I'll mention a few things.
First the disclaimer... I never use ZFS and know nothing about SSDs, so a lot
of what I'll be saying comes from discussions I've seen by others.

Now, I see you use a mirrored pair of SSDs for ZIL logging devices.
You don't mention what NFS client(s) are mounting the server, so I'm going
to assume they are Linux systems.
- I don't know how the client decides, but I have seen Linux NFS packet traces
  where the client does a lot of 4K writes with FILE_SYNC. FILE_SYNC means
  that the data and metadata related to the write must be on stable storage
  before the RPC replies NFS_OK.
  --> This means the data and metadata changes must be written to the ZIL.
As such, really slow response when a ZIL log device is being resilvered isn't
surprising to me.
For the other cases, there is a heavy write load, which "might" also be
hitting the ZIL log hard.
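
[One way to see whether the SLOG mirror really is what gets hammered during
one of these episodes is to watch per-vdev I/O while it happens; the pool
name is taken from the commands quoted above:]

    # Per-vdev I/O statistics refreshed every second; the "logs" section
    # shows how much write traffic the ZIL/SLOG devices are absorbing
    zpool iostat -v DATA 1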

What can you do about this?
- You can live dangerously and set "sync=disabled" for ZFS. This means that
  the server will reply NFS_OK to writes without needing to write to the ZIL
  log first. (I don't know enough about ZFS to know whether or not this makes
  the ZIL log no longer get used?)
  - Why do I say "live dangerously"? Because data writes could get lost when
    the NFS server reboots, while the NFS client would think the data was
    written just fine.
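
[If someone does want to experiment with this, the property can be set, and
later reverted, per dataset rather than for the whole pool, which at least
limits the exposure; the dataset name below is just an example:]

    # Disable synchronous write semantics for one exported dataset only.
    # Risky: NFS writes acknowledged as stable can be lost on a server crash.
    zfs set sync=disabled DATA/nfs-export

    # Check the current value, and revert to the inherited default later
    zfs get sync DATA/nfs-export
    zfs inherit sync DATA/nfs-export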

I'm the last guy to discuss SSDs, but they definitely have weird write
performance and can get very slow at writing, especially when they get
nearly full.
--> I have heard others recommend limiting the size of your ZIL to at most
    1/2 of the SSD's capacity, assuming the SSD is dedicated to the ZIL
    and nothing else. (I have no idea if you already do this?)
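
[One common way to do that is to partition the SSDs and give ZFS only part of
each, leaving the remainder unallocated for the drive's own wear levelling;
a rough sketch, with hypothetical device names and sizes:]

    # Use only ~half of each 400GB SSD for the log partition
    gpart create -s gpt nvd0
    gpart add -t freebsd-zfs -s 200G -l slog0 nvd0
    gpart create -s gpt nvd1
    gpart add -t freebsd-zfs -s 200G -l slog1 nvd1

    # Attach the two partitions as a mirrored log vdev
    zpool add DATA log mirror gpt/slog0 gpt/slog1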

Hopefully others will have further comments, rick

> We've tried it with and without L2ARC, and replaced the SSDs. Disabled TRIM.
> Not much difference. Tried trimming various sysctls, but no difference seen
> so far. Annoying problem, this...
>
> - Peter


