Date: Thu, 26 Mar 2020 00:27:10 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Peter Eriksson <pen@lysator.liu.se>, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: ZFS/NFS hickups and some tools to monitor stuff...
Message-ID: <QB1PR01MB3649E19EBBF0348CC2EB4796DDCF0@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>
References: <CFD0E4E5-EF2B-4789-BF14-F46AC569A191@lysator.liu.se>

Peter Eriksson wrote:
>The last couple of weeks I've been fighting with a severe case of NFS users
>complaining about slow response times from our (5) FreeBSD 11.3-RELEASE-p6 file
>servers. Even though our SMB (Windows) users (thankfully, since they are more
>like 500 per server vs 50 NFS users) didn't see the same slowdown (or at least
>didn't complain about it), the root cause is probably ZFS-related.
>
>We've identified a number of cases where some ZFS operation can cause severe
>slowdown of NFS operations, and I've been trying to figure out what the cause
>is and ways to mitigate the problem...
>
>Some operations that have caused issues:
>
>1. Resilver (basically made NFS service useless during the week it took...),
>with response times for NFS operations regularly up to 10 seconds or more
>(vs the normal 1-10ms).
>
>2. Snapshot recursive deferred destruction ("zfs destroy -dr DATA@snapnam").
>Especially bad together with filesystems at or near quota.
>
>3. Rsync cloning of data into the servers. Response times of up to 15 minutes
>were seen... Yes, 15 minutes to do a mkdir("test-dir"). Possibly in conjunction
>with #1 above...
>
>Previously #1 and #2 hadn't caused that many problems, and #3 definitely hadn't.
>Something has changed in the last half year or so, but so far I haven't been
>able to figure it out.
>
[stuff snipped]
>It would be interesting to see if others too are seeing ZFS and/or NFS slowdowns
>during heavy writing operations (resilver, snapshot destroy, rsync)...
>
>Our DATA pools are basically 2xRAIDZ2(4+2) of 10TB 7200rpm disks + 400GB SSDs
>for ZIL + 400GB SSDs for L2ARC. 256GB RAM, configured with ARC max set to 64GB
>(it used to be 128GB, but we ran into out-of-memory with the 500+ Samba smbd
>daemons that would compete for the RAM...).
>
>We've tried it with and without L2ARC, and replaced the SSDs. Disabled TRIM.
>Not much difference. Tried tuning various sysctls, but no difference seen so
>far. Annoying problem, this...
>
>- Peter

Since no one else has commented, I'll mention a few things.
First the disclaimer... I never use ZFS and know nothing about SSDs, so a lot of
what I'll be saying comes from discussions I've seen by others.

Now, I see you use a mirrored pair of SSDs for ZIL logging devices.
You don't mention what NFS client(s) are mounting the server, so I'm going
to assume they are Linux systems.
- I don't know how the client decides, but I have seen Linux NFS packet traces
  where the client does a lot of 4K writes with FILE_SYNC. FILE_SYNC means
  that the data and metadata related to the write must be on stable storage
  before the RPC replies NFS_OK.
  --> This means the data and metadata changes must be written to the ZIL.
As such, really slow response when a ZIL log device is being resilvered isn't
surprising to me.
For the other cases, there is a heavy write load, which "might" also be hitting
the ZIL log hard.

What can you do about this?
- You can live dangerously and set "sync=disabled" for ZFS (a sketch of the
  commands is below). This means that the writes will reply NFS_OK without
  needing to write to the ZIL log first. (I don't know enough about ZFS to know
  whether or not this makes the ZIL log no longer get used?)
- Why do I say "live dangerously"? Because data writes could get lost when
  the NFS server reboots and the NFS client would think the data was written
  just fine.

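For reference, "sync" is a per-dataset ZFS property, so (guessing at a dataset
name like DATA/export here; adjust for your actual layout) setting and later
reverting it would look something like:

  zfs set sync=disabled DATA/export    # stop honouring synchronous write semantics
  zfs get sync DATA/export             # verify the current setting
  zfs set sync=standard DATA/export    # put it back to the default later

This is only a sketch, and again, only worth trying if losing the last few
seconds of acknowledged writes after a server crash is acceptable to you.
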
I'm the last guy to discuss SSDs, but they definitely have weird performance
characteristics for writing and can get very slow at writing, especially when
they get nearly full.
--> I have heard others recommend limiting the size of your ZIL to at most
    1/2 of the SSD's capacity, assuming the SSD is dedicated to the ZIL
    and nothing else. (I have no idea if you already do this? A sketch of
    what that could look like is below.)

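One way I've seen people arrange this is to hand ZFS only a partition covering
part of each SSD and leave the rest unallocated, so the drive can use it as
spare area. With made-up device names (da10/da11), made-up partition labels,
and your pool name DATA, re-adding the log mirror that way would look roughly
like:

  gpart create -s gpt da10                         # da10/da11 are placeholders for your log SSDs
  gpart add -t freebsd-zfs -s 200G -l slog0 da10   # use only ~half of the 400GB SSD
  gpart create -s gpt da11
  gpart add -t freebsd-zfs -s 200G -l slog1 da11
  zpool add DATA log mirror gpt/slog0 gpt/slog1    # add the partitions back as a mirrored SLOG

That is just a sketch of what I've heard suggested, not something I've done
myself, so please sanity-check it before trying it on a production pool.
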
Hopefully others will have further comments, rick