From: Rick Macklem <rmacklem@uoguelph.ca>
To: Peter Eriksson, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: ZFS/NFS hickups and some tools to monitor stuff...
Date: Thu, 26 Mar 2020 00:27:10 +0000

Peter Eriksson wrote:
>The last couple of weeks I've been fighting with a severe case of NFS
>users complaining about slow response times from our (5) FreeBSD
>11.3-RELEASE-p6 file servers. Now even though our SMB (Windows) users
>(thankfully, since they are like 500 per server vs 50 NFS users) didn't
>see the same slowdown (or at least didn't complain about it), the root
>cause is probably ZFS-related.
>
>We've identified a number of cases where some ZFS operation can cause
>severe slowdown of NFS operations, and I've been trying to figure out
>what the cause is and ways to mitigate the problem...
>
>Some operations that have caused issues:
>
>1. Resilver (basically made NFS service useless during the week it
>   took...) with response times for NFS operations regularly up to
>   10 seconds or more (vs the normal 1-10ms)
>
>2. Snapshot recursive deferred destruction
>   ("zfs destroy -dr DATA@snapnam"). Especially bad together with
>   filesystems at or near quota.
>
>3. Rsync cloning of data into the servers. Response times up to 15
>   minutes were seen... Yes, 15 minutes to do a mkdir("test-dir").
>   Possibly in conjunction with #1 above....
>
>Previously #1 and #2 haven't caused that many problems, and #3 definitely.
>Something has changed the last half year or so but so far I haven't been
>able to figure it out.
>
[stuff snipped]
>It would be interesting to see if others too are seeing ZFS and/or NFS
>slowdowns during heavy writing operations (resilver, snapshot-destroy,
>rsync)...
>
>
>Our DATA pools are basically 2xRAIDZ2(4+2) of 10TB 7200rpm disks + 400GB
>SSDs for ZIL + 400GB SSDs for L2ARC. 256GB RAM, configured with ARC-MAX
>set to 64GB (used to be 128GB but we ran into out-of-memory with the 500+
>Samba smbd daemons that would compete for the RAM...)

Since no one else has commented, I'll mention a few things.
First the disclaimer... I never use ZFS and know nothing about SSDs, so a lot
of what I'll be saying comes from discussions I've seen by others.

Now, I see you use a mirrored pair of SSDs for ZIL logging devices.
You don't mention what NFS client(s) are mounting the server, so I'm going
to assume they are Linux systems.
- I don't know how the client decides, but I have seen NFS Linux packet traces
  where the client does a lot of 4K writes with FILE_SYNC. FILE_SYNC means
  that the data and metadata related to the write must be on stable storage
  before the RPC replies NFS_OK.
  --> This means the data and metadata changes must be written to the ZIL.
As such, really slow response when a ZIL log device is being resilvered isn't
surprising to me.
For the other cases, there is a heavy write load, which "might" also be hitting
the ZIL log hard.

What can you do about this?
- You can live dangerously and set "sync=disabled" for ZFS. This means that
  the writes will reply NFS_OK without needing to write to the ZIL log first.
  (I don't know enough about ZFS to know whether or not this makes the ZIL
  log no longer get used?)
  - Why do I say "live dangerously"?
    Because data writes could get lost when the NFS server reboots, and the
    NFS client would think the data was written just fine.

I'm the last guy to discuss SSDs, but they definitely have weird write
performance and can get very slow for writing, especially when they get
nearly full.
--> I have heard others recommend limiting the size of your ZIL to at most
    1/2 of the SSD's capacity, assuming the SSD is dedicated to the ZIL
    and nothing else. (I have no idea if you already do this?)

Hopefully others will have further comments, rick


>We've tried it with and without L2ARC, and replaced the SSD:s. Disabled
>TRIM. Not much difference. Tried trimming various sysctls but no
>difference seen so far. Annoying problem this...
>
>- Peter
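
One way to check whether any of the changes discussed above (the sync
setting, the ZIL size on the SSD) makes a difference is to sample the
mkdir latency Peter mentions from an NFS client. What follows is only a
minimal sketch in Python and is not taken from the thread: the function
names and the PROBE_DIR environment variable are hypothetical, and the
target directory is assumed to live on the NFS mount being tested.

```python
import os
import tempfile
import time
import uuid


def time_mkdir(base_dir):
    """Return the seconds taken to create one new directory under base_dir."""
    # Unique name so that repeated or concurrent probes never collide.
    path = os.path.join(base_dir, "latency-probe-" + uuid.uuid4().hex)
    start = time.perf_counter()
    os.mkdir(path)
    elapsed = time.perf_counter() - start
    os.rmdir(path)  # clean up; the removal is not part of the timing
    return elapsed


def sample_mkdir(base_dir, count=5, pause=0.0):
    """Collect count mkdir timings, sleeping pause seconds after each probe."""
    samples = []
    for _ in range(count):
        samples.append(time_mkdir(base_dir))
        time.sleep(pause)
    return samples


if __name__ == "__main__":
    # Hypothetical target: set PROBE_DIR to a directory on the NFS mount.
    # Falls back to the local temp directory, which only tests the script.
    target = os.environ.get("PROBE_DIR", tempfile.gettempdir())
    timings = sample_mkdir(target, count=5, pause=0.1)
    print("mkdir latency (ms):", ", ".join(f"{t * 1000:.2f}" for t in timings))
```

Running this before and during a resilver, snapshot destroy or rsync
would give numbers to compare against the normal 1-10ms quoted above.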